Issue 29884571: Issue 6945 - Add script to make filter list diffs

Stats: +176 lines, -3 lines

Status	File	Delta from patch set	Chunks	Stats	Comments
M	README.md	1 2 3 4 5	2 chunks	+23 lines, -0 lines	0 comments
A	abp/filters/diff_script.py	1 2 3 4 5	1 chunk	+60 lines, -0 lines	0 comments
M	abp/filters/renderer.py	1 2 3 4 5	1 chunk	+3 lines, -0 lines	0 comments
M	setup.py		1 chunk	+2 lines, -1 line	0 comments
A	tests/test_diff_script.py	1 2 3 4 5 6	1 chunk	+85 lines, -0 lines	0 comments
M	tests/test_differ.py	1 2 3	3 chunks	+3 lines, -2 lines	0 comments

Messages

Total messages: 20

rhowell

Hey Vasily, There are a couple spots that can likely be improved, but I wanted ...

Sept. 18, 2018, 4:01 p.m. (2018-09-18 16:01:54 UTC) #2

Sebastian Noack

https://codereview.adblockplus.org/29884571/diff/29884572/abp/filters/diff_script.py File abp/filters/diff_script.py (right): https://codereview.adblockplus.org/29884571/diff/29884572/abp/filters/diff_script.py#newcode52 abp/filters/diff_script.py:52: latest = open(args.latest, 'r') Please make sure all files ...

Sept. 18, 2018, 4:56 p.m. (2018-09-18 16:56:12 UTC) #3

rhowell

Thanks for the quick feedback! https://codereview.adblockplus.org/29884571/diff/29884572/abp/filters/diff_script.py File abp/filters/diff_script.py (right): https://codereview.adblockplus.org/29884571/diff/29884572/abp/filters/diff_script.py#newcode52 abp/filters/diff_script.py:52: latest = open(args.latest, 'r') ...

Sept. 18, 2018, 5:23 p.m. (2018-09-18 17:23:33 UTC) #4

Sebastian Noack

Sept. 18, 2018, 7:13 p.m. (2018-09-18 19:13:19 UTC) #5

I also noticed that in render_script.py we don't read files directly from disk
but use some abstraction layer (see sources.py). However, I am not familiar with
the purpose of this abstraction, and whether it is relevant when generating
diffs.

https://codereview.adblockplus.org/29884571/diff/29884572/abp/filters/diff_sc...
File abp/filters/diff_script.py (right):

https://codereview.adblockplus.org/29884571/diff/29884572/abp/filters/diff_sc...
abp/filters/diff_script.py:60: open(args.outfile, 'w')
On 2018/09/18 17:23:33, rhowell wrote:
> On 2018/09/18 16:56:12, Sebastian Noack wrote:
> > On 2018/09/18 16:01:54, rhowell wrote:
> > > This feels hacky..
> > 
> > Well, this is redundant. Why did you put it here?
> 
> Wow, you're right. I swore I was getting an error whenever I tried to write to
> this file, that the file didn't exist. I read through the docs for io
> (https://docs.python.org/2/library/io.html), and didn't see an option to
> create the file if it doesn't exist (like 'w+' when using `open`). Fixed.

Both open(..., 'w') and io.open(..., 'w') create the file if it doesn't exist.
'w+' (read+write) creates it as well; only 'r+' requires an existing file. In
any case, there is no need to use a read+write mode here.
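
For reference, a minimal sketch of how the common modes behave when the target
file does not exist yet. The helper name `creates_missing_file` is made up for
this illustration; it only uses io.open() on a path inside a temporary
directory:

```python
import io
import os
import shutil
import tempfile


def creates_missing_file(mode):
    """Return True if io.open() with `mode` creates a not-yet-existing file.

    Hypothetical helper for illustration only; not part of the patch.
    """
    tmpdir = tempfile.mkdtemp()
    path = os.path.join(tmpdir, 'missing.txt')
    try:
        try:
            with io.open(path, mode, encoding='utf-8'):
                pass
        except IOError:  # IOError is an alias of OSError on Python 3
            return False
        return os.path.exists(path)
    finally:
        # Clean up the temporary directory whatever happened above.
        shutil.rmtree(tmpdir)


for mode in ('w', 'w+', 'a', 'r+'):
    print(mode, creates_missing_file(mode))
```

'w', 'w+' and 'a' create the file, while 'r+' raises because it expects the
file to be there already.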

https://codereview.adblockplus.org/29884571/diff/29884576/abp/filters/diff_sc...
File abp/filters/diff_script.py (right):

https://codereview.adblockplus.org/29884571/diff/29884576/abp/filters/diff_sc...
abp/filters/diff_script.py:53: lines = render_diff(base, latest)
Iterating over a file-object yields each line including the line ending
character(s). While for most lines (all except metadata) the lines are stripped
in parse_line(), this is an implementation detail of the parser. So we might
want to pass lines without line endings here.
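
To illustrate the point, a small sketch using an in-memory file. The helper
`read_lines` and the sample text are assumptions made for this example, not
code from the patch:

```python
import io


def read_lines(fileobj, strip=False):
    """Collect lines from a file-like object.

    With strip=False the lines are returned exactly as iteration yields
    them, i.e. including the trailing line ending.
    """
    if strip:
        return [line.rstrip('\r\n') for line in fileobj]
    return list(fileobj)


sample = u'! Title: Example\n||example.com^\n'

# Iterating over the file object keeps the '\n' on every line.
print(read_lines(io.StringIO(sample)))
# Stripping before handing the lines on removes it.
print(read_lines(io.StringIO(sample), strip=True))
```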

https://codereview.adblockplus.org/29884571/diff/29884576/abp/filters/diff_sc...
abp/filters/diff_script.py:56: sys.stdout.write(line + '\n')
Does that work on Python 2 if there are any non-ascii characters?

Vasily Kuznetsov

Sept. 19, 2018, 6:40 p.m. (2018-09-19 18:40:04 UTC) #6

Hi Rosie,

I would say that the overall approach looks right to me.

I have answered your questions and added a couple of other comments.

Looking forward to seeing this in a more completed form but also feel free to
ask further questions.

Cheers,
Vasily

https://codereview.adblockplus.org/29884571/diff/29884572/tests/test_diff_scr...
File tests/test_diff_script.py (right):

https://codereview.adblockplus.org/29884571/diff/29884572/tests/test_diff_scr...
tests/test_diff_script.py:70: run_script(str(rootdir.join('base.txt')),
On 2018/09/18 17:23:33, rhowell wrote:
> Should the script be called like:
> 1) fldiff base=base.txt latest=latest.txt
> 2) fldiff -b base.txt -l latest.txt
> 3) fldiff base.txt latest.txt
> 
> I'm currently using option 3, but 1 or 2 might be better.

I think `fldiff base.txt latest.txt` is the best approach because it's similar
to the way `diff` is called.

https://codereview.adblockplus.org/29884571/diff/29884576/abp/filters/diff_sc...
File abp/filters/diff_script.py (right):

https://codereview.adblockplus.org/29884571/diff/29884576/abp/filters/diff_sc...
abp/filters/diff_script.py:33: default='-', nargs='?')
It seems that base and latest should be mandatory arguments (without defaults).

https://codereview.adblockplus.org/29884571/diff/29884576/abp/filters/diff_sc...
abp/filters/diff_script.py:46: Parse = namedtuple('Test', 'base, latest, outfile')
Are we going to use this code path? If yes, I would recommend accepting a list
of arguments and passing it to `parser.parse_args()` (it takes a parameter
that defaults to `sys.argv[1:]`) rather than emulating parsed arguments.
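
A rough sketch of what this could look like. The function name `parse_args`,
the help texts and the exact argument setup are assumptions for illustration;
only the argument names base, latest and outfile come from the review:

```python
import argparse


def parse_args(argv=None):
    """Parse fldiff-style arguments.

    `argv` is a list of strings; when it is None, argparse falls back to
    sys.argv[1:], so the same function serves the command line and tests.
    """
    parser = argparse.ArgumentParser(
        prog='fldiff', description='Render a diff between two filter lists')
    # Positional arguments without nargs/default are mandatory.
    parser.add_argument('base', help='the older filter list')
    parser.add_argument('latest', help='the newer filter list')
    # Optional positional: '-' stands for standard output.
    parser.add_argument('outfile', nargs='?', default='-',
                        help='output file (default: stdout)')
    return parser.parse_args(argv)


args = parse_args(['base.txt', 'latest.txt'])
print(args.base, args.latest, args.outfile)
```

A test can then call `parse_args([...])` with an explicit list instead of
building a namedtuple that imitates the parsed result.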

https://codereview.adblockplus.org/29884571/diff/29884576/abp/filters/diff_sc...
abp/filters/diff_script.py:53: lines = render_diff(base, latest)
On 2018/09/18 19:13:18, Sebastian Noack wrote:
> Iterating over a file-object yields each line including the line ending
> character(s). While for most lines (all except metadata) the lines are
> stripped in parse_line(), this is an implementation detail of the parser. So
> we might want to pass lines without line endings here.

Actually metadata is also stripped of the line endings -- I was worried that it
wouldn't be after the latest changes, but the regex matching seems to take care
of it.

The parser is intended to work this way (see example in README and the code in
render_script.py), so I would say that the following pattern is idiomatic:

    with open(file) as f:
        for parsed_line in parse_filterlist(f):
            # do something with the parsed line

This is a common enough use case that we would rather have this work than
require callers to strip the lines themselves.

Vasily Kuznetsov

On 2018/09/18 19:13:19, Sebastian Noack wrote: > I also noticed that in render_script.py we don't ...

Sept. 19, 2018, 6:42 p.m. (2018-09-19 18:42:04 UTC) #7

Sebastian Noack

https://codereview.adblockplus.org/29884571/diff/29884576/abp/filters/diff_script.py File abp/filters/diff_script.py (right): https://codereview.adblockplus.org/29884571/diff/29884576/abp/filters/diff_script.py#newcode53 abp/filters/diff_script.py:53: lines = render_diff(base, latest) On 2018/09/19 18:40:04, Vasily Kuznetsov ...

Sept. 20, 2018, 4:59 p.m. (2018-09-20 16:59:12 UTC) #8

https://codereview.adblockplus.org/29884571/diff/29884576/abp/filters/diff_sc...
File abp/filters/diff_script.py (right):

https://codereview.adblockplus.org/29884571/diff/29884576/abp/filters/diff_sc...
abp/filters/diff_script.py:53: lines = render_diff(base, latest)
On 2018/09/19 18:40:04, Vasily Kuznetsov wrote:
> Actually metadata is also stripped of the line endings -- I was worried that
> it wouldn't be after the latest changes, but the regex matching seems to take
> care of it.

Ah, right, that is because by default `.` doesn't match newline characters.
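
A quick way to see this with the re module. The pattern below is a made-up
stand-in for a metadata regex, not the one used in the parser:

```python
import re


def metadata_value(line, flags=0):
    """Return the value part of a '! Key: value' line, or None.

    The pattern is a hypothetical example for illustration only.
    """
    match = re.match(r'!\s*(\w+)\s*:\s*(.*)', line, flags)
    return match.group(2) if match else None


line = '! Title: Example list\n'

# By default '.' stops at the newline, so the value comes out stripped.
print(repr(metadata_value(line)))
# With re.DOTALL '.' also matches '\n' and the line ending is kept.
print(repr(metadata_value(line, re.DOTALL)))
```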

> The parser is intended to work this way (see example in README and the code in
> render_script.py), so I would say that the following pattern is idiomatic:
> 
>     with open(file) as f:
>         for parsed_line in parse_filterlist(f):
>             # do something with the parsed line
> 
> This is a common enough use case that we would rather have this work than
> require callers to strip the lines themselves.

Alright.

https://codereview.adblockplus.org/29884571/diff/29884576/abp/filters/diff_sc...
abp/filters/diff_script.py:56: sys.stdout.write(line + '\n')
On 2018/09/18 19:13:18, Sebastian Noack wrote:
> Does that work on Python 2 if there are any non-ascii characters?

For reference, the problem is that in Python 2 if no shell (or no unicode-aware
shell) is attached to stdout, sys.stdout.encoding will be None (which means
ascii), while Python 3 defaults to UTF-8 in that case.

I think the simplest way, with the least duplication and special cases, that
gives consistent behavior on Python 2 and 3 would be this:

  if args.outfile == '-':
      outfile = io.open(sys.stdout.fileno(), 'w',
                        closefd=False,
                        encoding=sys.stdout.encoding or 'utf-8')
  else:
      outfile = io.open(args.outfile, 'w', encoding='utf-8')

  with outfile:
      for line in lines:
          print(line, file=outfile)

Vasily Kuznetsov

https://codereview.adblockplus.org/29884571/diff/29884576/abp/filters/diff_script.py File abp/filters/diff_script.py (right): https://codereview.adblockplus.org/29884571/diff/29884576/abp/filters/diff_script.py#newcode56 abp/filters/diff_script.py:56: sys.stdout.write(line + '\n') On 2018/09/20 16:59:12, Sebastian Noack wrote: ...

Sept. 21, 2018, 7:48 a.m. (2018-09-21 07:48:42 UTC) #9

rhowell

Sept. 21, 2018, 8:37 a.m. (2018-09-21 08:37:09 UTC) #10

Hey, thanks for the feedback, it was very helpful. Let me know if you see any
other issues.

https://codereview.adblockplus.org/29884571/diff/29884572/tests/test_diff_scr...
File tests/test_diff_script.py (right):

https://codereview.adblockplus.org/29884571/diff/29884572/tests/test_diff_scr...
tests/test_diff_script.py:70: run_script(str(rootdir.join('base.txt')),
On 2018/09/19 18:40:03, Vasily Kuznetsov wrote:
> On 2018/09/18 17:23:33, rhowell wrote:
> > Should the script be called like:
> > 1) fldiff base=base.txt latest=latest.txt
> > 2) fldiff -b base.txt -l latest.txt
> > 3) fldiff base.txt latest.txt
> > 
> > I'm currently using option 3, but 1 or 2 might be better.
> 
> I think `fldiff base.txt latest.txt` is the best approach because it's similar
> to the way `diff` is called.

Acknowledged.

https://codereview.adblockplus.org/29884571/diff/29884576/abp/filters/diff_sc...
File abp/filters/diff_script.py (right):

https://codereview.adblockplus.org/29884571/diff/29884576/abp/filters/diff_sc...
abp/filters/diff_script.py:33: default='-', nargs='?')
On 2018/09/19 18:40:04, Vasily Kuznetsov wrote:
> It seems that base and latest should be mandatory arguments (without
> defaults).

Done.

https://codereview.adblockplus.org/29884571/diff/29884576/abp/filters/diff_sc...
abp/filters/diff_script.py:46: Parse = namedtuple('Test', 'base, latest, outfile')
On 2018/09/19 18:40:04, Vasily Kuznetsov wrote:
> Are we going to use this code path? If yes, I would recommend accepting a list
> of arguments and passing it to `parser.parse_args()` (it takes a parameter
> that defaults to `sys.argv[1:]`) rather than emulating parsed arguments.

Good point. This was helpful for me while testing, but I don't know any case
where it's useful in the real world. I'll remove it.

https://codereview.adblockplus.org/29884571/diff/29884576/abp/filters/diff_sc...
abp/filters/diff_script.py:53: lines = render_diff(base, latest)
On 2018/09/20 16:59:12, Sebastian Noack wrote:
> On 2018/09/19 18:40:04, Vasily Kuznetsov wrote:
> > Actually metadata is also stripped of the line endings -- I was worried that
> > it wouldn't be after the latest changes, but the regex matching seems to
> > take care of it.
> 
> Ah, right, that is because by default `.` doesn't match newline characters.
> 
> > The parser is intended to work this way (see example in README and the code
> > in render_script.py), so I would say that the following pattern is idiomatic:
> > 
> >     with open(file) as f:
> >         for parsed_line in parse_filterlist(f):
> >             # do something with the parsed line
> > 
> > This is a common enough use case that we would rather have this work than
> > require callers to strip the lines themselves.
> 
> Alright.

Acknowledged.

https://codereview.adblockplus.org/29884571/diff/29884576/abp/filters/diff_sc...
abp/filters/diff_script.py:56: sys.stdout.write(line + '\n')
On 2018/09/21 07:48:42, Vasily Kuznetsov wrote:
> On 2018/09/20 16:59:12, Sebastian Noack wrote:
> > On 2018/09/18 19:13:18, Sebastian Noack wrote:
> > > Does that work on Python 2 if there are any non-ascii characters?
> > 
> > For reference, the problem is that in Python 2 if no shell (or no
> > unicode-aware shell) is attached to stdout, sys.stdout.encoding will be None
> > (which means ascii), while Python 3 defaults to UTF-8 in that case.
> > 
> > I think the simplest way, with the least duplication and special cases, that
> > gives consistent behavior on Python 2 and 3 would be this:
> > 
> >   if args.outfile == '-':
> >       outfile = io.open(sys.stdout.fileno(), 'w',
> >                         closefd=False,
> >                         encoding=sys.stdout.encoding or 'utf-8')
> >   else:
> >       outfile = io.open(args.outfile, 'w', encoding='utf-8')
> > 
> >   with outfile:
> >       for line in lines:
> >           print(line, file=outfile)
> 
> This is more or less the approach that we took in TrustedNews tools for a
> similar problem (there we wanted the file to be open in binary mode regardless
> of whether it's a real file or stdout). I also think this is a good way to go.

This approach seems to be working well. Thanks, I was having some trouble with
encoding.

Sebastian Noack

https://codereview.adblockplus.org/29884571/diff/29887555/abp/filters/diff_script.py File abp/filters/diff_script.py (right): https://codereview.adblockplus.org/29884571/diff/29887555/abp/filters/diff_script.py#newcode19 abp/filters/diff_script.py:19: import argparse Nit: __future__ imports are special, so I ...

Sept. 21, 2018, 11:51 a.m. (2018-09-21 11:51:21 UTC) #11

rhowell

Hey, thanks for the feedback! I've addressed all the comments, and added a README. Let ...

Sept. 24, 2018, 10:05 p.m. (2018-09-24 22:05:22 UTC) #12

Sebastian Noack

https://codereview.adblockplus.org/29884571/diff/29890625/README.md File README.md (right): https://codereview.adblockplus.org/29884571/diff/29890625/README.md#newcode91 README.md:91: A diff allows a client running adblocking software such ...

Sept. 24, 2018, 10:21 p.m. (2018-09-24 22:21:09 UTC) #13

rhowell

Thanks for the feedback! Let me know if you see any other issues. https://codereview.adblockplus.org/29884571/diff/29890625/README.md File ...

Sept. 25, 2018, 1:53 a.m. (2018-09-25 01:53:45 UTC) #14

Sebastian Noack

https://codereview.adblockplus.org/29884571/diff/29890625/abp/filters/diff_script.py File abp/filters/diff_script.py (right): https://codereview.adblockplus.org/29884571/diff/29890625/abp/filters/diff_script.py#newcode33 abp/filters/diff_script.py:33: nargs='?') On 2018/09/25 01:53:45, rhowell wrote: > On 2018/09/24 ...

Sept. 25, 2018, 5 p.m. (2018-09-25 17:00:39 UTC) #15

rhowell

https://codereview.adblockplus.org/29884571/diff/29890631/README.md File README.md (right): https://codereview.adblockplus.org/29884571/diff/29890631/README.md#newcode111 README.md:111: The diff is generated such that the removed filters ...

Sept. 25, 2018, 11:01 p.m. (2018-09-25 23:01:46 UTC) #16

Sebastian Noack

LGTM https://codereview.adblockplus.org/29884571/diff/29884576/abp/filters/diff_script.py File abp/filters/diff_script.py (right): https://codereview.adblockplus.org/29884571/diff/29884576/abp/filters/diff_script.py#newcode56 abp/filters/diff_script.py:56: sys.stdout.write(line + '\n') On 2018/09/20 16:59:12, Sebastian Noack ...

Sept. 26, 2018, 11:55 a.m. (2018-09-26 11:55:47 UTC) #17

Vasily Kuznetsov

Hi Rosie, Everything looks pretty good to me. There's just one nit about the docstring ...

Sept. 26, 2018, 4:50 p.m. (2018-09-26 16:50:41 UTC) #18

rhowell

> > I just noticed, we might have the same issue in render_script. This should ...

Sept. 26, 2018, 6:22 p.m. (2018-09-26 18:22:57 UTC) #19

Vasily Kuznetsov

Sept. 27, 2018, 10:07 a.m. (2018-09-27 10:07:44 UTC) #20

LGTM

On 2018/09/26 18:22:57, rhowell wrote:
> > > I just noticed, we might have the same issue in render_script. This should
> > > probably be addressed in a separate change.
> 
> > Yeah, makes sense. Perhaps we could move this piece of code out and import
> > into both scripts.
> 
> I can do this, if you want. Should I make an issue? Or Noissue?

I think a noissue would do. We could extract the code from fldiff into a
function that would then be called like this:

    with open_outfile(args.outfile) as outfile:
        print(..., file=outfile)

and then use it in both scripts. Seems like the description of this change would
fit into a commit message.
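
As a rough sketch, assuming the helper keeps the name `open_outfile` and
follows the stdout handling suggested earlier in this review (the
`write_lines` wrapper and the sample lines are only there to make the example
self-contained):

```python
import io
import os
import sys
import tempfile


def open_outfile(outfile):
    """Open `outfile` for writing text; '-' means standard output.

    Sketch only: stdout is wrapped without taking ownership of its file
    descriptor (closefd=False), so leaving the `with` block does not close
    the process's stdout.
    """
    if outfile == '-':
        return io.open(sys.stdout.fileno(), 'w', closefd=False,
                       encoding=sys.stdout.encoding or 'utf-8')
    return io.open(outfile, 'w', encoding='utf-8')


def write_lines(lines, outfile='-'):
    """Write each line followed by a newline (hypothetical wrapper)."""
    with open_outfile(outfile) as out:
        for line in lines:
            out.write(line + u'\n')


# Usage: write two sample lines to a temporary file and read them back.
path = os.path.join(tempfile.mkdtemp(), 'diff.txt')
write_lines([u'line one', u'caf\u00e9'], path)
with io.open(path, encoding='utf-8') as f:
    content = f.read()
```

The objects returned by io.open() are already context managers, so the helper
can be used in a `with` statement without any extra wrapping.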
