|
|
Created:
Sept. 18, 2018, 3:57 p.m. by rhowell Modified:
Sept. 28, 2018, 1:13 a.m. Base URL:
https://hg.adblockplus.org/python-abp/ Visibility:
Public. |
DescriptionIssue 6945 - Add script to make filter list diffs
Patch Set 1 #
Total comments: 11
Patch Set 2 : Use io.open instead of open() #
Total comments: 14
Patch Set 3 : Handle non-ascii, remove extraneous test #
Total comments: 12
Patch Set 4 : Address comments on PS3, add README #
Total comments: 11
Patch Set 5 : Address comments on PS4 #
Total comments: 4
Patch Set 6 : Address comments on PS4 #
Total comments: 2
Patch Set 7 : Fix docstring #
MessagesTotal messages: 20
Hey Vasily, There are a couple spots that can likely be improved, but I wanted to get your feedback so far, and I had a couple questions (see below). Thanks! :) Rosie https://codereview.adblockplus.org/29884571/diff/29884572/abp/filters/diff_sc... File abp/filters/diff_script.py (right): https://codereview.adblockplus.org/29884571/diff/29884572/abp/filters/diff_sc... abp/filters/diff_script.py:60: open(args.outfile, 'w') This feels hacky.. https://codereview.adblockplus.org/29884571/diff/29884572/tests/test_diff_scr... File tests/test_diff_script.py (right): https://codereview.adblockplus.org/29884571/diff/29884572/tests/test_diff_scr... tests/test_diff_script.py:86: assert 'No such file or directory' in err I'm not sure if these errors are being handled properly.
https://codereview.adblockplus.org/29884571/diff/29884572/abp/filters/diff_sc... File abp/filters/diff_script.py (right): https://codereview.adblockplus.org/29884571/diff/29884572/abp/filters/diff_sc... abp/filters/diff_script.py:52: latest = open(args.latest, 'r') Please make sure all files are explicitly closed. https://codereview.adblockplus.org/29884571/diff/29884572/abp/filters/diff_sc... abp/filters/diff_script.py:52: latest = open(args.latest, 'r') Any reason you don't use io.open(..., encoding="utf-8") here but below in order to write the output? I guess, it works because parse_line() implicitly decodes the input if bytes are provided, but we might want to avoid dealing with binary data as much as possible. Another advantage of io.open() is that it behaves the same on Python 2 and 3, while the semantics of open() changed in Python 3 (to the semantics of io.open). https://codereview.adblockplus.org/29884571/diff/29884572/abp/filters/diff_sc... abp/filters/diff_script.py:60: open(args.outfile, 'w') On 2018/09/18 16:01:54, rhowell wrote: > This feels hacky.. Well, this is redundant. Why did you put it here?
Thanks for the quick feedback! https://codereview.adblockplus.org/29884571/diff/29884572/abp/filters/diff_sc... File abp/filters/diff_script.py (right): https://codereview.adblockplus.org/29884571/diff/29884572/abp/filters/diff_sc... abp/filters/diff_script.py:52: latest = open(args.latest, 'r') On 2018/09/18 16:56:12, Sebastian Noack wrote: > Please make sure all files are explicitly closed. Done. https://codereview.adblockplus.org/29884571/diff/29884572/abp/filters/diff_sc... abp/filters/diff_script.py:60: open(args.outfile, 'w') On 2018/09/18 16:56:12, Sebastian Noack wrote: > On 2018/09/18 16:01:54, rhowell wrote: > > This feels hacky.. > > Well, this is redundant. Why did you put it here? Wow, you're right. I swore I was getting an error whenever I tried to write to this file, that the file didn't exist. I read through the docs for io (https://docs.python.org/2/library/io.html), and didn't see an option to create the file if it doesn't exist (Like 'w+' when using `open`). Fixed. https://codereview.adblockplus.org/29884571/diff/29884572/tests/test_diff_scr... File tests/test_diff_script.py (right): https://codereview.adblockplus.org/29884571/diff/29884572/tests/test_diff_scr... tests/test_diff_script.py:70: run_script(str(rootdir.join('base.txt')), Should the script be called like: 1) fldiff base=base.txt latest=latest.txt 2) fldiff -b base.txt -l latest.txt 3) fldiff base.txt latest.txt I'm currently using option 3, but 1 or 2 might be better.
I also noticed that in render_script.py we don't read files directly from disk but use some abstraction layer (see sources.py). However, I am not familiar with the purpose of this abstraction, and whether it is relevant when generating diffs. https://codereview.adblockplus.org/29884571/diff/29884572/abp/filters/diff_sc... File abp/filters/diff_script.py (right): https://codereview.adblockplus.org/29884571/diff/29884572/abp/filters/diff_sc... abp/filters/diff_script.py:60: open(args.outfile, 'w') On 2018/09/18 17:23:33, rhowell wrote: > On 2018/09/18 16:56:12, Sebastian Noack wrote: > > On 2018/09/18 16:01:54, rhowell wrote: > > > This feels hacky.. > > > > Well, this is redundant. Why did you put it here? > > Wow, you're right. I swore I was getting an error whenever I tried to write to > this file, that the file didn't exist. I read through the docs for io > (https://docs.python.org/2/library/io.html), and didn't see an option to create > the file if it doesn't exist (Like 'w+' when using `open`). Fixed. Both, open(..., 'w') and io.open(..., 'w') create the file if it doesn't exist. However, 'w+' (read+write) doesn't, but there is no need to use this mode here. https://codereview.adblockplus.org/29884571/diff/29884576/abp/filters/diff_sc... File abp/filters/diff_script.py (right): https://codereview.adblockplus.org/29884571/diff/29884576/abp/filters/diff_sc... abp/filters/diff_script.py:53: lines = render_diff(base, latest) Iterating over a file-object yields each line including the line ending character(s). While for most lines (all except metadata) the lines are stripped in parse_line(), this is an implementation detail of the parser. So we might want to pass lines without line endings here. https://codereview.adblockplus.org/29884571/diff/29884576/abp/filters/diff_sc... abp/filters/diff_script.py:56: sys.stdout.write(line + '\n') Does that work on Python 2 if there are any non-ascii characters?
Hi Rosie, I would say that the overall approach looks right to me. I have answered your questions and added a couple of other comments. Looking forward to seeing this in a more completed form but also feel free to ask further questions. Cheers, Vasily https://codereview.adblockplus.org/29884571/diff/29884572/tests/test_diff_scr... File tests/test_diff_script.py (right): https://codereview.adblockplus.org/29884571/diff/29884572/tests/test_diff_scr... tests/test_diff_script.py:70: run_script(str(rootdir.join('base.txt')), On 2018/09/18 17:23:33, rhowell wrote: > Should the script be called like: > 1) fldiff base=base.txt latest=latest.txt > 2) fldiff -b base.txt -l latest.txt > 3) fldiff base.txt latest.txt > > I'm currently using option 3, but 1 or 2 might be better. I think `fldiff base.txt latest.txt` is the best approach because it's similar to the way `diff` is called. https://codereview.adblockplus.org/29884571/diff/29884576/abp/filters/diff_sc... File abp/filters/diff_script.py (right): https://codereview.adblockplus.org/29884571/diff/29884576/abp/filters/diff_sc... abp/filters/diff_script.py:33: default='-', nargs='?') It seems that base and latest should be mandatory arguments (without defaults). https://codereview.adblockplus.org/29884571/diff/29884576/abp/filters/diff_sc... abp/filters/diff_script.py:46: Parse = namedtuple('Test', 'base, latest, outfile') Are we going to use this code path? If yes, I would recommend to expect a list of arguments and then pass it to `parser.parse_args()` (it takes a parameter that defaults to `sys.args`) rather than emulating parsed arguments. https://codereview.adblockplus.org/29884571/diff/29884576/abp/filters/diff_sc... abp/filters/diff_script.py:53: lines = render_diff(base, latest) On 2018/09/18 19:13:18, Sebastian Noack wrote: > Iterating over a file-object yields each line including the line ending > character(s). While for most lines (all except metadata) the lines are stripped > in parse_line(), this is an implementation detail of the parser. So we might > want to pass lines without line endings here. Actually metadata is also stripped of the line endings -- I was worried that it wouldn't be after the latest changes, but the regex matching seems to take care of it. The parser is intended to work this way (see example in README and the code in render_script.py), so I would say that the following pattern is idiomatic: with open(file) as f: for parsed_line in parse_filterlist(f): # do something with the parsed line This is a common enough use case that we rather would like this to work instead of requiring the caller to always strip the lines by themselves.
On 2018/09/18 19:13:19, Sebastian Noack wrote: > I also noticed that in render_script.py we don't read files directly from disk > but use some abstraction layer (see sources.py). However, I am not familiar with > the purpose of this abstraction, and whether it is relevant when generating > diffs. The sources layer is needed for handling includes. It seems to be irrelevant for diff generation because we work with two local files here and don't process includes. I think reading directly from disk is the right way to go.
https://codereview.adblockplus.org/29884571/diff/29884576/abp/filters/diff_sc... File abp/filters/diff_script.py (right): https://codereview.adblockplus.org/29884571/diff/29884576/abp/filters/diff_sc... abp/filters/diff_script.py:53: lines = render_diff(base, latest) On 2018/09/19 18:40:04, Vasily Kuznetsov wrote: > Actually metadata is also stripped of the line endings -- I was worried that it > wouldn't be after the latest changes, but the regex matching seems to take care > of it. Ah, right, that is because by default . doesn't match newline characters. > The parser is intended to work this way (see example in README and the code in > render_script.py), so I would say that the following pattern is idiomatic: > > with open(file) as f: > for parsed_line in parse_filterlist(f): > # do something with the parsed line > > This is a common enough use case that we rather would like this to work instead > of requiring the caller to always strip the lines by themselves. Alright. https://codereview.adblockplus.org/29884571/diff/29884576/abp/filters/diff_sc... abp/filters/diff_script.py:56: sys.stdout.write(line + '\n') On 2018/09/18 19:13:18, Sebastian Noack wrote: > Does that work on Python 2 if there are any non-ascii characters? For reference, the problem is that in Python 2 if no shell (or no unicode-aware shell) is attached to stdout, sys.stdout.encoding will be None (which means ascii), while Python 3 defaults to UTF-8 in that case. I think the simplest way, with the least duplication and special cases, giving consitent behavior on Python 2 and 3 would be this: if args.outfile == '-': outfile = io.open(sys.stdout.fileno(), 'w', closefd=False, encoding=sys.stdout.encoding or 'utf-8') else: outfile = io.open(args.outfile, 'w', encoding='utf-8') with outfile: for line in lines: print(line, file=outfile)
https://codereview.adblockplus.org/29884571/diff/29884576/abp/filters/diff_sc... File abp/filters/diff_script.py (right): https://codereview.adblockplus.org/29884571/diff/29884576/abp/filters/diff_sc... abp/filters/diff_script.py:56: sys.stdout.write(line + '\n') On 2018/09/20 16:59:12, Sebastian Noack wrote: > On 2018/09/18 19:13:18, Sebastian Noack wrote: > > Does that work on Python 2 if there are any non-ascii characters? > > For reference, the problem is that in Python 2 if no shell (or no unicode-aware > shell) is attached to stdout, sys.stdout.encoding will be None (which means > ascii), while Python 3 defaults to UTF-8 in that case. > > I think the simplest way, with the least duplication and special cases, giving > consitent behavior on Python 2 and 3 would be this: > > if args.outfile == '-': > outfile = io.open(sys.stdout.fileno(), 'w', > closefd=False, > encoding=sys.stdout.encoding or 'utf-8') > else: > outfile = io.open(args.outfile, 'w', encoding='utf-8') > > with outfile: > for line in lines: > print(line, file=outfile) This is more or less the approach that we took in TrustedNews tools for a similar problem (there we wanted the file to be open in binary mode regardless of whether it's a real file or stdout). I also think this is a good way to go.
Hey, thanks for the feedback, it was very helpful. Let me know if you see any other issues? https://codereview.adblockplus.org/29884571/diff/29884572/tests/test_diff_scr... File tests/test_diff_script.py (right): https://codereview.adblockplus.org/29884571/diff/29884572/tests/test_diff_scr... tests/test_diff_script.py:70: run_script(str(rootdir.join('base.txt')), On 2018/09/19 18:40:03, Vasily Kuznetsov wrote: > On 2018/09/18 17:23:33, rhowell wrote: > > Should the script be called like: > > 1) fldiff base=base.txt latest=latest.txt > > 2) fldiff -b base.txt -l latest.txt > > 3) fldiff base.txt latest.txt > > > > I'm currently using option 3, but 1 or 2 might be better. > > I think `fldiff base.txt latest.txt` is the best approach because it's similar > to the way `diff` is called. Acknowledged. https://codereview.adblockplus.org/29884571/diff/29884576/abp/filters/diff_sc... File abp/filters/diff_script.py (right): https://codereview.adblockplus.org/29884571/diff/29884576/abp/filters/diff_sc... abp/filters/diff_script.py:33: default='-', nargs='?') On 2018/09/19 18:40:04, Vasily Kuznetsov wrote: > It seems that base and latest should be mandatory arguments (without defaults). Done. https://codereview.adblockplus.org/29884571/diff/29884576/abp/filters/diff_sc... abp/filters/diff_script.py:46: Parse = namedtuple('Test', 'base, latest, outfile') On 2018/09/19 18:40:04, Vasily Kuznetsov wrote: > Are we going to use this code path? If yes, I would recommend to expect a list > of arguments and then pass it to `parser.parse_args()` (it takes a parameter > that defaults to `sys.args`) rather than emulating parsed arguments. Good point. This was helpful for me while testing, but I don't know any case where it's useful in the real world. I'll remove it. https://codereview.adblockplus.org/29884571/diff/29884576/abp/filters/diff_sc... abp/filters/diff_script.py:53: lines = render_diff(base, latest) On 2018/09/20 16:59:12, Sebastian Noack wrote: > On 2018/09/19 18:40:04, Vasily Kuznetsov wrote: > > Actually metadata is also stripped of the line endings -- I was worried that > it > > wouldn't be after the latest changes, but the regex matching seems to take > care > > of it. > > Ah, right, that is because by default . doesn't match newline characters. > > > The parser is intended to work this way (see example in README and the code in > > render_script.py), so I would say that the following pattern is idiomatic: > > > > with open(file) as f: > > for parsed_line in parse_filterlist(f): > > # do something with the parsed line > > > > This is a common enough use case that we rather would like this to work > instead > > of requiring the caller to always strip the lines by themselves. > > Alright. Acknowledged. https://codereview.adblockplus.org/29884571/diff/29884576/abp/filters/diff_sc... abp/filters/diff_script.py:56: sys.stdout.write(line + '\n') On 2018/09/21 07:48:42, Vasily Kuznetsov wrote: > On 2018/09/20 16:59:12, Sebastian Noack wrote: > > On 2018/09/18 19:13:18, Sebastian Noack wrote: > > > Does that work on Python 2 if there are any non-ascii characters? > > > > For reference, the problem is that in Python 2 if no shell (or no > unicode-aware > > shell) is attached to stdout, sys.stdout.encoding will be None (which means > > ascii), while Python 3 defaults to UTF-8 in that case. > > > > I think the simplest way, with the least duplication and special cases, giving > > consitent behavior on Python 2 and 3 would be this: > > > > if args.outfile == '-': > > outfile = io.open(sys.stdout.fileno(), 'w', > > closefd=False, > > encoding=sys.stdout.encoding or 'utf-8') > > else: > > outfile = io.open(args.outfile, 'w', encoding='utf-8') > > > > with outfile: > > for line in lines: > > print(line, file=outfile) > > This is more or less the approach that we took in TrustedNews tools for a > similar problem (there we wanted the file to be open in binary mode regardless > of whether it's a real file or stdout). I also think this is a good way to go. This approach seems to be working well. Thanks, I was having some trouble with encoding.
https://codereview.adblockplus.org/29884571/diff/29887555/abp/filters/diff_sc... File abp/filters/diff_script.py (right): https://codereview.adblockplus.org/29884571/diff/29887555/abp/filters/diff_sc... abp/filters/diff_script.py:19: import argparse Nit: __future__ imports are special, so I would seperate them from corelib imports with a blank line. https://codereview.adblockplus.org/29884571/diff/29887555/abp/filters/diff_sc... abp/filters/diff_script.py:25: __all__ = ['main'] Nit: There should be blank link above, here as well. https://codereview.adblockplus.org/29884571/diff/29887555/abp/filters/diff_sc... abp/filters/diff_script.py:42: def main(args_in=None): The args_in argument seems to be unused, and I don't see such an argument in render_script.py either. https://codereview.adblockplus.org/29884571/diff/29887555/abp/filters/diff_sc... abp/filters/diff_script.py:46: with io.open(args.base, 'r') as base, io.open(args.latest, 'r') as latest: Unlike below, you don't specify an encoding when calling io.open() here. On Linux (at least) it will still work as intended as it defaults to UTF-8. But according to the documentation the default encoding depends on the platform. So we should explicitly specify encoding='utf-8' here as well. https://codereview.adblockplus.org/29884571/diff/29887555/abp/filters/diff_sc... abp/filters/diff_script.py:59: except NotFound as exc: Can we even run into this exception since we neither load the input files through the sources module nor do we render includes. https://codereview.adblockplus.org/29884571/diff/29887555/tests/test_differ.py File tests/test_differ.py (right): https://codereview.adblockplus.org/29884571/diff/29887555/tests/test_differ.p... tests/test_differ.py:17: from __future__ import unicode_literals Nit: It seems in all other files we only have one blank line in between the license disclaimer and the first import.
Hey, thanks for the feedback! I've addressed all the comments, and added a REAME. Let me know if I've missed anything? https://codereview.adblockplus.org/29884571/diff/29887555/abp/filters/diff_sc... File abp/filters/diff_script.py (right): https://codereview.adblockplus.org/29884571/diff/29887555/abp/filters/diff_sc... abp/filters/diff_script.py:19: import argparse On 2018/09/21 11:51:21, Sebastian Noack wrote: > Nit: __future__ imports are special, so I would seperate them from corelib > imports with a blank line. Done. https://codereview.adblockplus.org/29884571/diff/29887555/abp/filters/diff_sc... abp/filters/diff_script.py:25: __all__ = ['main'] On 2018/09/21 11:51:21, Sebastian Noack wrote: > Nit: There should be blank link above, here as well. Done. https://codereview.adblockplus.org/29884571/diff/29887555/abp/filters/diff_sc... abp/filters/diff_script.py:42: def main(args_in=None): On 2018/09/21 11:51:20, Sebastian Noack wrote: > The args_in argument seems to be unused, and I don't see such an argument in > render_script.py either. Oops, I should've removed this when removing the extraneous test on the last patch set. Done! https://codereview.adblockplus.org/29884571/diff/29887555/abp/filters/diff_sc... abp/filters/diff_script.py:46: with io.open(args.base, 'r') as base, io.open(args.latest, 'r') as latest: On 2018/09/21 11:51:21, Sebastian Noack wrote: > Unlike below, you don't specify an encoding when calling io.open() here. On > Linux (at least) it will still work as intended as it defaults to UTF-8. But > according to the documentation the default encoding depends on the platform. So > we should explicitly specify encoding='utf-8' here as well. Done. https://codereview.adblockplus.org/29884571/diff/29887555/abp/filters/diff_sc... abp/filters/diff_script.py:59: except NotFound as exc: On 2018/09/21 11:51:20, Sebastian Noack wrote: > Can we even run into this exception since we neither load the input files > through the sources module nor do we render includes. I guess not. Removed it. https://codereview.adblockplus.org/29884571/diff/29887555/tests/test_differ.py File tests/test_differ.py (right): https://codereview.adblockplus.org/29884571/diff/29887555/tests/test_differ.p... tests/test_differ.py:17: from __future__ import unicode_literals On 2018/09/21 11:51:21, Sebastian Noack wrote: > Nit: It seems in all other files we only have one blank line in between the > license disclaimer and the first import. Done.
https://codereview.adblockplus.org/29884571/diff/29890625/README.md File README.md (right): https://codereview.adblockplus.org/29884571/diff/29890625/README.md#newcode91 README.md:91: A diff allows a client running adblocking software such as Adblock Plus to update "ad blocking" are two words. https://codereview.adblockplus.org/29884571/diff/29890625/README.md#newcode97 README.md:97: Python-abp contains a script that produces the diff between two filterlists called ... between two versions of a filter list ... https://codereview.adblockplus.org/29884571/diff/29890625/README.md#newcode106 README.md:106: The script produces three types of lines, as specified in the [technical specification][5]: This URL is not defined at the bottom. https://codereview.adblockplus.org/29884571/diff/29890625/README.md#newcode112 README.md:112: comments, remove any filters that have been removed, and then add any filters That is not accurate. As of the current specification, the client will remove/add filters in the order the lines occur in the diff. https://codereview.adblockplus.org/29884571/diff/29890625/abp/filters/diff_sc... File abp/filters/diff_script.py (right): https://codereview.adblockplus.org/29884571/diff/29890625/abp/filters/diff_sc... abp/filters/diff_script.py:33: nargs='?') Didn't we want to make those arguments mandatory?
Thanks for the feedback! Let me know if you see any other issues. https://codereview.adblockplus.org/29884571/diff/29890625/README.md File README.md (right): https://codereview.adblockplus.org/29884571/diff/29890625/README.md#newcode91 README.md:91: A diff allows a client running adblocking software such as Adblock Plus to update On 2018/09/24 22:21:09, Sebastian Noack wrote: > "ad blocking" are two words. Done. https://codereview.adblockplus.org/29884571/diff/29890625/README.md#newcode97 README.md:97: Python-abp contains a script that produces the diff between two filterlists called On 2018/09/24 22:21:09, Sebastian Noack wrote: > ... between two versions of a filter list ... Done. https://codereview.adblockplus.org/29884571/diff/29890625/README.md#newcode106 README.md:106: The script produces three types of lines, as specified in the [technical specification][5]: On 2018/09/24 22:21:09, Sebastian Noack wrote: > This URL is not defined at the bottom. Oops! I had it in a markdown editor, but forgot to copy that line over. Done. https://codereview.adblockplus.org/29884571/diff/29890625/README.md#newcode112 README.md:112: comments, remove any filters that have been removed, and then add any filters On 2018/09/24 22:21:09, Sebastian Noack wrote: > That is not accurate. As of the current specification, the client will > remove/add filters in the order the lines occur in the diff. Ah right. I missed that change to the spec. Fixed. https://codereview.adblockplus.org/29884571/diff/29890625/abp/filters/diff_sc... File abp/filters/diff_script.py (right): https://codereview.adblockplus.org/29884571/diff/29890625/abp/filters/diff_sc... abp/filters/diff_script.py:33: nargs='?') On 2018/09/24 22:21:09, Sebastian Noack wrote: > Didn't we want to make those arguments mandatory? Positional arguments are already mandatory (or it will raise a TypeError). Vasily's suggestion was to remove the `default='-'` arguments. I did that in patch set 3.
https://codereview.adblockplus.org/29884571/diff/29890625/abp/filters/diff_sc... File abp/filters/diff_script.py (right): https://codereview.adblockplus.org/29884571/diff/29890625/abp/filters/diff_sc... abp/filters/diff_script.py:33: nargs='?') On 2018/09/25 01:53:45, rhowell wrote: > On 2018/09/24 22:21:09, Sebastian Noack wrote: > > Didn't we want to make those arguments mandatory? > > Positional arguments are already mandatory (or it will raise a TypeError). > Vasily's suggestion was to remove the `default='-'` arguments. I did that in > patch set 3. I see, the documentation seems a little misleading: One argument will be consumed from the command line if possible, and produced as a single item. If no command-line argument is present, the value from default will be produced. It doesn't specify what happens if no default is specified. I wonder what would happen if you omit the nargs argument altogether? https://codereview.adblockplus.org/29884571/diff/29890631/README.md File README.md (right): https://codereview.adblockplus.org/29884571/diff/29890631/README.md#newcode111 README.md:111: The diff is generated such that the removed filters are listed before the added Maybe this is a bit too much detail, if we refer to the spec anyway. But I don't care too much, so I leave it up to you (and Vasily), however, keep in mind if the behavior should ever change this will be an additional resource that you have to remember to update. https://codereview.adblockplus.org/29884571/diff/29890631/abp/filters/diff_sc... File abp/filters/diff_script.py (right): https://codereview.adblockplus.org/29884571/diff/29890631/abp/filters/diff_sc... abp/filters/diff_script.py:44: """Entry point for the diff rendering script (diffrender).""" diffrender? Isn't it fldiff?
https://codereview.adblockplus.org/29884571/diff/29890631/README.md File README.md (right): https://codereview.adblockplus.org/29884571/diff/29890631/README.md#newcode111 README.md:111: The diff is generated such that the removed filters are listed before the added On 2018/09/25 17:00:39, Sebastian Noack wrote: > Maybe this is a bit too much detail, if we refer to the spec anyway. But I don't > care too much, so I leave it up to you (and Vasily), however, keep in mind if > the behavior should ever change this will be an additional resource that you > have to remember to update. Yeah, I see your point. Maybe it would be better to mention this where it's done in the source code, in renderer.py. I moved it there. https://codereview.adblockplus.org/29884571/diff/29890631/abp/filters/diff_sc... File abp/filters/diff_script.py (right): https://codereview.adblockplus.org/29884571/diff/29890631/abp/filters/diff_sc... abp/filters/diff_script.py:44: """Entry point for the diff rendering script (diffrender).""" On 2018/09/25 17:00:39, Sebastian Noack wrote: > diffrender? Isn't it fldiff? Good catch! I changed the name, but missed this line. Done.
LGTM https://codereview.adblockplus.org/29884571/diff/29884576/abp/filters/diff_sc... File abp/filters/diff_script.py (right): https://codereview.adblockplus.org/29884571/diff/29884576/abp/filters/diff_sc... abp/filters/diff_script.py:56: sys.stdout.write(line + '\n') On 2018/09/20 16:59:12, Sebastian Noack wrote: > On 2018/09/18 19:13:18, Sebastian Noack wrote: > > Does that work on Python 2 if there are any non-ascii characters? > > For reference, the problem is that in Python 2 if no shell (or no unicode-aware > shell) is attached to stdout, sys.stdout.encoding will be None (which means > ascii), while Python 3 defaults to UTF-8 in that case. > > I think the simplest way, with the least duplication and special cases, giving > consitent behavior on Python 2 and 3 would be this: > > if args.outfile == '-': > outfile = io.open(sys.stdout.fileno(), 'w', > closefd=False, > encoding=sys.stdout.encoding or 'utf-8') > else: > outfile = io.open(args.outfile, 'w', encoding='utf-8') > > with outfile: > for line in lines: > print(line, file=outfile) I just noticed, we might have the same issue in render_script. This should probably be addressed in a separate change.
Hi Rosie, Everything looks pretty good to me. There's just one nit about the docstring in the test file. Cheers, Vasily https://codereview.adblockplus.org/29884571/diff/29884576/abp/filters/diff_sc... File abp/filters/diff_script.py (right): https://codereview.adblockplus.org/29884571/diff/29884576/abp/filters/diff_sc... abp/filters/diff_script.py:56: sys.stdout.write(line + '\n') On 2018/09/26 11:55:47, Sebastian Noack wrote: > On 2018/09/20 16:59:12, Sebastian Noack wrote: > > On 2018/09/18 19:13:18, Sebastian Noack wrote: > > > Does that work on Python 2 if there are any non-ascii characters? > > > > For reference, the problem is that in Python 2 if no shell (or no > unicode-aware > > shell) is attached to stdout, sys.stdout.encoding will be None (which means > > ascii), while Python 3 defaults to UTF-8 in that case. > > > > I think the simplest way, with the least duplication and special cases, giving > > consitent behavior on Python 2 and 3 would be this: > > > > if args.outfile == '-': > > outfile = io.open(sys.stdout.fileno(), 'w', > > closefd=False, > > encoding=sys.stdout.encoding or 'utf-8') > > else: > > outfile = io.open(args.outfile, 'w', encoding='utf-8') > > > > with outfile: > > for line in lines: > > print(line, file=outfile) > > I just noticed, we might have the same issue in render_script. This should > probably be addressed in a separate change. Yeah, makes sense. Perhaps we could move this piece of code out and import into both scripts. https://codereview.adblockplus.org/29884571/diff/29891686/tests/test_diff_scr... File tests/test_diff_script.py (right): https://codereview.adblockplus.org/29884571/diff/29891686/tests/test_diff_scr... tests/test_diff_script.py:16: """Functional tests for the rendering script. This docstring seems to not be quite accurate...
> > I just noticed, we might have the same issue in render_script. This should > > probably be addressed in a separate change. > Yeah, makes sense. Perhaps we could move this piece of code out and import into > both scripts. I can do this, if you want. Should I make an issue? Or Noissue? https://codereview.adblockplus.org/29884571/diff/29891686/tests/test_diff_scr... File tests/test_diff_script.py (right): https://codereview.adblockplus.org/29884571/diff/29891686/tests/test_diff_scr... tests/test_diff_script.py:16: """Functional tests for the rendering script. On 2018/09/26 16:50:41, Vasily Kuznetsov wrote: > This docstring seems to not be quite accurate... Oops! Not sure how I missed that. Done!
LGTM On 2018/09/26 18:22:57, rhowell wrote: > > > I just noticed, we might have the same issue in render_script. This should > > > probably be addressed in a separate change. > > > Yeah, makes sense. Perhaps we could move this piece of code out and import > into > > both scripts. > > I can do this, if you want. Should I make an issue? Or Noissue? I think a noissue would do. We could extract the code from fldiff into a method that would be then called like this: with open_outfile(args.outfile) as outfile: print(..., file=outfile) and then use it in both scripts. Seems like the description of this change would fit into a commit message. |