|
|
Created:
June 8, 2017, 1:39 p.m. by rosie Modified:
July 17, 2017, 3:49 p.m. Base URL:
https://hg.adblockplus.org/codingtools Visibility:
Public. |
DescriptionThis script will attempt to update the eyeo copyright information on each page of each repo listed on https://hg.adblockplus.org/
Patch Set 1 #
Total comments: 63
Patch Set 2 : Addressed comments #
Total comments: 43
Patch Set 3 : Addressed more comments #
Total comments: 19
Patch Set 4 : Clean up comments, ReadMe, formatting #
Total comments: 6
Patch Set 5 : Minor formatting fixes #
Total comments: 24
Patch Set 6 : Also test strings and non-eyeo copyright info #
Total comments: 5
Patch Set 7 : Remove the wrapper #
Total comments: 4
Patch Set 8 : Remove bash script, make Python script executable #Patch Set 9 : Fix indentation and default args #
MessagesTotal messages: 29
Hey Rosie, Thanks for the patch! It looks like the tests are not passing as of right now, so if the next patch set could be passing that would be really awesome. Overall good job as far as the approach goes, but there details which should be addressed. Cheers! Jon https://codereview.adblockplus.org/29459580/diff/29459581/.hgignore File .hgignore (right): https://codereview.adblockplus.org/29459580/diff/29459581/.hgignore#newcode9 .hgignore:9: .cache Also you should add .coverage to this ignore file https://codereview.adblockplus.org/29459580/diff/29459581/update-copyright/RE... File update-copyright/README.md (right): https://codereview.adblockplus.org/29459580/diff/29459581/update-copyright/RE... update-copyright/README.md:11: The tests can be run via [Tox](http://tox.readthedocs.org/) I think you can pass the url to the webpage to be scraped into the script so maybe this should be updated to reflect that? https://codereview.adblockplus.org/29459580/diff/29459581/update-copyright/te... File update-copyright/tests/data/tst.patch (right): https://codereview.adblockplus.org/29459580/diff/29459581/update-copyright/te... update-copyright/tests/data/tst.patch:2: # User Rose Howell <r.howell@adblockplus.org> If you wanted to you could change the username here to a placeholder, but I am leaving it up to you https://codereview.adblockplus.org/29459580/diff/29459581/update-copyright/te... File update-copyright/tests/test_update_copyright.py (right): https://codereview.adblockplus.org/29459580/diff/29459581/update-copyright/te... update-copyright/tests/test_update_copyright.py:21: os.path.join(full_data_dir + '/repo_2/')] I don't think all these constants are required, such as 'local_dir'. Also, the url_list could probably just can just be defined in the test which uses it. I would also probably change the names use 'path' rather than 'dir' since in python a directory can be a separate object than a path, which is the string that leads to the dir or file. Although I dont have the strongest opinion on this but i was thinking it may make the code more clear. https://codereview.adblockplus.org/29459580/diff/29459581/update-copyright/te... update-copyright/tests/test_update_copyright.py:50: subprocess.check_call(['mkdir', repo_2]) you can avoid the use of temporary variables here by using the tmpdir fixture to it's full potential. You can create any number of temporary directories using the tmpdir object, so calling subprocess to create these repo's is not needed , just call 'tmpdir.mkdir(repo1)' and it will create a directory and return a local path object which you can also make directories in etc, also it makes more sense to just call the typecasting on the object when you need it rather than create temporary variables that redundantly hold strings which you already have access to. https://codereview.adblockplus.org/29459580/diff/29459581/update-copyright/te... update-copyright/tests/test_update_copyright.py:67: return str(temp_dir) returning the actual path object may be preferable here since it has a join method which will cut down using os.path.join, also even though youll have to use strpath when using subprocess youo have more control over the directory without calling subprocess as much. https://codereview.adblockplus.org/29459580/diff/29459581/update-copyright/te... update-copyright/tests/test_update_copyright.py:71: assert url_list == extract_urls(hg_page) here is where you would just hand code the url list. https://codereview.adblockplus.org/29459580/diff/29459581/update-copyright/te... update-copyright/tests/test_update_copyright.py:76: text_replace(temp_repo, filename) Have you tested that this is working? it seems that you are giving a path which is not expected here. https://codereview.adblockplus.org/29459580/diff/29459581/update-copyright/te... update-copyright/tests/test_update_copyright.py:92: subprocess.check_call(['touch', 'temp/foo']) you will want to use path.join for the temp/foo here https://codereview.adblockplus.org/29459580/diff/29459581/update-copyright/te... update-copyright/tests/test_update_copyright.py:105: base_dir, 'hg_page.html')), base_dir) localhost isnt needed here, it is causing the test to fail. https://codereview.adblockplus.org/29459580/diff/29459581/update-copyright/to... File update-copyright/tox.ini (right): https://codereview.adblockplus.org/29459580/diff/29459581/update-copyright/to... update-copyright/tox.ini:2: envlist = py35 Can you please use py35 and 36? Similar to the tox.ini that is here: https://hg.adblockplus.org/codingtools/file/tip/flake8-eyeo/tox.ini https://codereview.adblockplus.org/29459580/diff/29459581/update-copyright/to... update-copyright/tox.ini:17: pytest --cov=update_copyright tests/ you can add consolidate the pytest calls into one command by adding the coverage command to the original pytest call 'pytest --cov=update_copyright tests'
Hi Rosie, Definitely on the right track but there are quite a few small things to fix. Cheers, Vasily https://codereview.adblockplus.org/29459580/diff/29459581/update-copyright/te... File update-copyright/tests/test_update_copyright.py (right): https://codereview.adblockplus.org/29459580/diff/29459581/update-copyright/te... update-copyright/tests/test_update_copyright.py:7: import pytest PEP8 suggests that the order of imports should be stdlib, third party, local. Pytest is third party, so it should be in a separate section by itself. https://codereview.adblockplus.org/29459580/diff/29459581/update-copyright/te... update-copyright/tests/test_update_copyright.py:10: from update_copyright.update_copyright import extract_urls According to PEP8, it would be ok to put all imports from update_copyright on the same line. Here it would also remove quite a bit of repetition, so it seems better to me. https://codereview.adblockplus.org/29459580/diff/29459581/update-copyright/te... update-copyright/tests/test_update_copyright.py:16: local_dir = os.path.dirname(os.path.abspath(__file__)) This is the path to the directory that contains the tests. If we also accept Jon's suggestion to use `_path` instead of `_dir` (I'm not too strong on this, but since below we also have urls, having `foo_bar_path` vs. `foo_bar_url` seems not too bad), we could call this `tests_path`. https://codereview.adblockplus.org/29459580/diff/29459581/update-copyright/te... update-copyright/tests/test_update_copyright.py:17: data_dir = os.path.join(local_dir, 'data') Then this would be `data_path`. https://codereview.adblockplus.org/29459580/diff/29459581/update-copyright/te... update-copyright/tests/test_update_copyright.py:18: full_data_dir = urllib.parse.urljoin('file://localhost', data_dir) This one is a URL of the data directory, so we can call it `data_url`. https://codereview.adblockplus.org/29459580/diff/29459581/update-copyright/te... update-copyright/tests/test_update_copyright.py:19: hg_page = os.path.join(full_data_dir, 'hg_page.html') This one would be `hg_page_url` then. https://codereview.adblockplus.org/29459580/diff/29459581/update-copyright/te... update-copyright/tests/test_update_copyright.py:26: # Returns a temporary repo containing one sample file This should be a docstring. Normally docstrings describe what a function/method/class does from a caller perspective, and comments document implementation details that are relevant for someone who would change the code that's being commented. https://codereview.adblockplus.org/29459580/diff/29459581/update-copyright/te... update-copyright/tests/test_update_copyright.py:30: subprocess.check_call(['cp', os.path.join(data_dir, 'sample_file.py'), It's better (faster) to use `shutil.copy` (see https://docs.python.org/3.5/library/shutil.html#shutil.copy) https://codereview.adblockplus.org/29459580/diff/29459581/update-copyright/te... update-copyright/tests/test_update_copyright.py:32: subprocess.check_call(['hg', 'add', os.path.join(temp_repo, You can combine this with the next call of `hg` by giving it an `-A/--addremove` option. https://codereview.adblockplus.org/29459580/diff/29459581/update-copyright/te... update-copyright/tests/test_update_copyright.py:40: def base_dir(tmpdir): As we've discussed, it would be better to refactor this fixture and the one above to not duplicate the repository creation logic. The repository creation could be moved out into a function, something like `create_repo(path)` and then it would be called once in `temp_repo` fixture and twice in `base_dir` fixture. https://codereview.adblockplus.org/29459580/diff/29459581/update-copyright/te... update-copyright/tests/test_update_copyright.py:41: # Returns a temporary directory that contains one html page and two This should also be a docstring. https://codereview.adblockplus.org/29459580/diff/29459581/update-copyright/te... update-copyright/tests/test_update_copyright.py:80: except UnicodeDecodeError: I think you shouldn't catch this. The test should fail if we can't read the file for some reason. https://codereview.adblockplus.org/29459580/diff/29459581/update-copyright/te... update-copyright/tests/test_update_copyright.py:92: subprocess.check_call(['touch', 'temp/foo']) On 2017/06/13 16:30:19, Jon Sonesen wrote: > you will want to use path.join for the temp/foo here And it's probably better to use something like `open(path, 'w').close()` to create the file instead of invoking `touch`. https://codereview.adblockplus.org/29459580/diff/29459581/update-copyright/te... update-copyright/tests/test_update_copyright.py:100: subprocess.call(['rm', '-r', 'temp/']) # cleanup It would be better to use `shutil.rmtree` instead. But there's an even better idea: instead of creating temporary files around the filesystem, use `tmpdir` fixture and create things inside of it, then you don't need to worry about cleanup. https://codereview.adblockplus.org/29459580/diff/29459581/update-copyright/te... update-copyright/tests/test_update_copyright.py:116: same = set(file1).intersection(file2) This seems unnecessarily complicated. Perhaps you don't really need `tst.patch` and you can just check if the line 'Noissue - Updated copyright year\n' is in `repo_2.patch`. Would not it be the same level of checking as now? https://codereview.adblockplus.org/29459580/diff/29459581/update-copyright/up... File update-copyright/update_copyright/update_copyright.py (right): https://codereview.adblockplus.org/29459580/diff/29459581/update-copyright/up... update-copyright/update_copyright/update_copyright.py:1: #!/usr/bin/env python3 Maybe we should move this file to `update-copyright/update_copyright.py` instead and remove `update-copyright/update_copyright/__init__.py`? https://codereview.adblockplus.org/29459580/diff/29459581/update-copyright/up... update-copyright/update_copyright/update_copyright.py:20: if repo in { This should also not be hardcoded, but it's ok for now, let's land this first and then deal with these special cases. https://codereview.adblockplus.org/29459580/diff/29459581/update-copyright/up... update-copyright/update_copyright/update_copyright.py:33: subprocess.check_call(['hg', 'up', '--rev', 'adblockbrowser', This should not be hardcoded, but it's ok for now.
Thanks for the feedback! It was very helpful. I think all comments have been addressed. Please let me know if I've missed anything! https://codereview.adblockplus.org/29459580/diff/29459581/.hgignore File .hgignore (right): https://codereview.adblockplus.org/29459580/diff/29459581/.hgignore#newcode9 .hgignore:9: .cache On 2017/06/13 16:30:19, Jon Sonesen wrote: > Also you should add .coverage to this ignore file Done. https://codereview.adblockplus.org/29459580/diff/29459581/update-copyright/RE... File update-copyright/README.md (right): https://codereview.adblockplus.org/29459580/diff/29459581/update-copyright/RE... update-copyright/README.md:11: The tests can be run via [Tox](http://tox.readthedocs.org/) On 2017/06/13 16:30:19, Jon Sonesen wrote: > I think you can pass the url to the webpage to be scraped into the script so > maybe this should be updated to reflect that? Done. https://codereview.adblockplus.org/29459580/diff/29459581/update-copyright/te... File update-copyright/tests/data/tst.patch (right): https://codereview.adblockplus.org/29459580/diff/29459581/update-copyright/te... update-copyright/tests/data/tst.patch:2: # User Rose Howell <r.howell@adblockplus.org> On 2017/06/13 16:30:19, Jon Sonesen wrote: > If you wanted to you could change the username here to a placeholder, but I am > leaving it up to you Ended up not needing this file at all, but good to know for next time. https://codereview.adblockplus.org/29459580/diff/29459581/update-copyright/te... update-copyright/tests/data/tst.patch:2: # User Rose Howell <r.howell@adblockplus.org> On 2017/06/13 16:30:19, Jon Sonesen wrote: > If you wanted to you could change the username here to a placeholder, but I am > leaving it up to you Done. https://codereview.adblockplus.org/29459580/diff/29459581/update-copyright/te... File update-copyright/tests/test_update_copyright.py (right): https://codereview.adblockplus.org/29459580/diff/29459581/update-copyright/te... update-copyright/tests/test_update_copyright.py:7: import pytest On 2017/06/19 16:25:30, Vasily Kuznetsov wrote: > PEP8 suggests that the order of imports should be stdlib, third party, local. > Pytest is third party, so it should be in a separate section by itself. Done. https://codereview.adblockplus.org/29459580/diff/29459581/update-copyright/te... update-copyright/tests/test_update_copyright.py:10: from update_copyright.update_copyright import extract_urls On 2017/06/19 16:25:30, Vasily Kuznetsov wrote: > According to PEP8, it would be ok to put all imports from update_copyright on > the same line. Here it would also remove quite a bit of repetition, so it seems > better to me. Done. https://codereview.adblockplus.org/29459580/diff/29459581/update-copyright/te... update-copyright/tests/test_update_copyright.py:16: local_dir = os.path.dirname(os.path.abspath(__file__)) On 2017/06/19 16:25:30, Vasily Kuznetsov wrote: > This is the path to the directory that contains the tests. If we also accept > Jon's suggestion to use `_path` instead of `_dir` (I'm not too strong on this, > but since below we also have urls, having `foo_bar_path` vs. `foo_bar_url` seems > not too bad), we could call this `tests_path`. Good idea. I would change it, but this variable ended up not being necessary. https://codereview.adblockplus.org/29459580/diff/29459581/update-copyright/te... update-copyright/tests/test_update_copyright.py:16: local_dir = os.path.dirname(os.path.abspath(__file__)) On 2017/06/19 16:25:30, Vasily Kuznetsov wrote: > This is the path to the directory that contains the tests. If we also accept > Jon's suggestion to use `_path` instead of `_dir` (I'm not too strong on this, > but since below we also have urls, having `foo_bar_path` vs. `foo_bar_url` seems > not too bad), we could call this `tests_path`. Done. https://codereview.adblockplus.org/29459580/diff/29459581/update-copyright/te... update-copyright/tests/test_update_copyright.py:17: data_dir = os.path.join(local_dir, 'data') On 2017/06/19 16:25:29, Vasily Kuznetsov wrote: > Then this would be `data_path`. Done. https://codereview.adblockplus.org/29459580/diff/29459581/update-copyright/te... update-copyright/tests/test_update_copyright.py:18: full_data_dir = urllib.parse.urljoin('file://localhost', data_dir) On 2017/06/19 16:25:30, Vasily Kuznetsov wrote: > This one is a URL of the data directory, so we can call it `data_url`. Done. https://codereview.adblockplus.org/29459580/diff/29459581/update-copyright/te... update-copyright/tests/test_update_copyright.py:19: hg_page = os.path.join(full_data_dir, 'hg_page.html') On 2017/06/19 16:25:29, Vasily Kuznetsov wrote: > This one would be `hg_page_url` then. Ended up not needing this variable either https://codereview.adblockplus.org/29459580/diff/29459581/update-copyright/te... update-copyright/tests/test_update_copyright.py:19: hg_page = os.path.join(full_data_dir, 'hg_page.html') On 2017/06/19 16:25:29, Vasily Kuznetsov wrote: > This one would be `hg_page_url` then. Done. https://codereview.adblockplus.org/29459580/diff/29459581/update-copyright/te... update-copyright/tests/test_update_copyright.py:21: os.path.join(full_data_dir + '/repo_2/')] On 2017/06/13 16:30:20, Jon Sonesen wrote: > I don't think all these constants are required, such as 'local_dir'. Also, the > url_list could probably just can just be defined in the test which uses it. I > would also probably change the names use 'path' rather than 'dir' since in > python a directory can be a separate object than a path, which is the string > that leads to the dir or file. Although I dont have the strongest opinion on > this but i was thinking it may make the code more clear. Done. https://codereview.adblockplus.org/29459580/diff/29459581/update-copyright/te... update-copyright/tests/test_update_copyright.py:26: # Returns a temporary repo containing one sample file On 2017/06/19 16:25:29, Vasily Kuznetsov wrote: > This should be a docstring. Normally docstrings describe what a > function/method/class does from a caller perspective, and comments document > implementation details that are relevant for someone who would change the code > that's being commented. Done. https://codereview.adblockplus.org/29459580/diff/29459581/update-copyright/te... update-copyright/tests/test_update_copyright.py:30: subprocess.check_call(['cp', os.path.join(data_dir, 'sample_file.py'), On 2017/06/19 16:25:30, Vasily Kuznetsov wrote: > It's better (faster) to use `shutil.copy` (see > https://docs.python.org/3.5/library/shutil.html#shutil.copy) Done. https://codereview.adblockplus.org/29459580/diff/29459581/update-copyright/te... update-copyright/tests/test_update_copyright.py:32: subprocess.check_call(['hg', 'add', os.path.join(temp_repo, On 2017/06/19 16:25:30, Vasily Kuznetsov wrote: > You can combine this with the next call of `hg` by giving it an `-A/--addremove` > option. Done. https://codereview.adblockplus.org/29459580/diff/29459581/update-copyright/te... update-copyright/tests/test_update_copyright.py:40: def base_dir(tmpdir): On 2017/06/19 16:25:29, Vasily Kuznetsov wrote: > As we've discussed, it would be better to refactor this fixture and the one > above to not duplicate the repository creation logic. The repository creation > could be moved out into a function, something like `create_repo(path)` and then > it would be called once in `temp_repo` fixture and twice in `base_dir` fixture. Done. https://codereview.adblockplus.org/29459580/diff/29459581/update-copyright/te... update-copyright/tests/test_update_copyright.py:41: # Returns a temporary directory that contains one html page and two On 2017/06/19 16:25:30, Vasily Kuznetsov wrote: > This should also be a docstring. Done. https://codereview.adblockplus.org/29459580/diff/29459581/update-copyright/te... update-copyright/tests/test_update_copyright.py:50: subprocess.check_call(['mkdir', repo_2]) On 2017/06/13 16:30:19, Jon Sonesen wrote: > you can avoid the use of temporary variables here by using the tmpdir fixture to > it's full potential. You can create any number of temporary directories using > the tmpdir object, so calling subprocess to create these repo's is not needed , > just call 'tmpdir.mkdir(repo1)' and it will create a directory and return a > local path object which you can also make directories in etc, also it makes more > sense to just call the typecasting on the object when you need it rather than > create temporary variables that redundantly hold strings which you already have > access to. Setting up the temporary directory base_dir like this makes it cleaner later when we call main() in test_all(). Then we just pass the path to the base directory, and the name hg_page.html into main(), and the script can find the hg page, both the repos to pull and modify, and also push the changes back to the base_dir's sub-repos. I think it's also closer to running the script on real repos if it's set up like this, and closer to being a full integration test (one repo has push access, the other does not): base_dir - hg_page.html - repo_1 - .hg - sample_file.py - repo_2 - .hg - sample_file.py https://codereview.adblockplus.org/29459580/diff/29459581/update-copyright/te... update-copyright/tests/test_update_copyright.py:67: return str(temp_dir) On 2017/06/13 16:30:20, Jon Sonesen wrote: > returning the actual path object may be preferable here since it has a join > method which will cut down using os.path.join, also even though youll have to > use strpath when using subprocess youo have more control over the directory > without calling subprocess as much. Done. https://codereview.adblockplus.org/29459580/diff/29459581/update-copyright/te... update-copyright/tests/test_update_copyright.py:71: assert url_list == extract_urls(hg_page) On 2017/06/13 16:30:19, Jon Sonesen wrote: > here is where you would just hand code the url list. Done. https://codereview.adblockplus.org/29459580/diff/29459581/update-copyright/te... update-copyright/tests/test_update_copyright.py:76: text_replace(temp_repo, filename) On 2017/06/13 16:30:19, Jon Sonesen wrote: > Have you tested that this is working? it seems that you are giving a path which > is not expected here. In the temp_repo fixture, 'sample_file.py' is copied into the temp_repo directory from the /data folder. This test seems to be working for me. Is it giving you an error? https://codereview.adblockplus.org/29459580/diff/29459581/update-copyright/te... update-copyright/tests/test_update_copyright.py:80: except UnicodeDecodeError: On 2017/06/19 16:25:30, Vasily Kuznetsov wrote: > I think you shouldn't catch this. The test should fail if we can't read the file > for some reason. Done. https://codereview.adblockplus.org/29459580/diff/29459581/update-copyright/te... update-copyright/tests/test_update_copyright.py:92: subprocess.check_call(['touch', 'temp/foo']) On 2017/06/13 16:30:19, Jon Sonesen wrote: > you will want to use path.join for the temp/foo here Done. https://codereview.adblockplus.org/29459580/diff/29459581/update-copyright/te... update-copyright/tests/test_update_copyright.py:100: subprocess.call(['rm', '-r', 'temp/']) # cleanup On 2017/06/19 16:25:30, Vasily Kuznetsov wrote: > It would be better to use `shutil.rmtree` instead. But there's an even better > idea: instead of creating temporary files around the filesystem, use `tmpdir` > fixture and create things inside of it, then you don't need to worry about > cleanup. Done. https://codereview.adblockplus.org/29459580/diff/29459581/update-copyright/te... update-copyright/tests/test_update_copyright.py:105: base_dir, 'hg_page.html')), base_dir) On 2017/06/13 16:30:20, Jon Sonesen wrote: > localhost isnt needed here, it is causing the test to fail. Looks like I should have used 'file:///' instead of 'file://localhost'. It seems to be working now on both Mint and Debian. https://codereview.adblockplus.org/29459580/diff/29459581/update-copyright/te... update-copyright/tests/test_update_copyright.py:105: base_dir, 'hg_page.html')), base_dir) On 2017/06/13 16:30:20, Jon Sonesen wrote: > localhost isnt needed here, it is causing the test to fail. Done. https://codereview.adblockplus.org/29459580/diff/29459581/update-copyright/te... update-copyright/tests/test_update_copyright.py:116: same = set(file1).intersection(file2) On 2017/06/19 16:25:29, Vasily Kuznetsov wrote: > This seems unnecessarily complicated. Perhaps you don't really need `tst.patch` > and you can just check if the line 'Noissue - Updated copyright year\n' is in > `repo_2.patch`. Would not it be the same level of checking as now? Done. https://codereview.adblockplus.org/29459580/diff/29459581/update-copyright/to... File update-copyright/tox.ini (right): https://codereview.adblockplus.org/29459580/diff/29459581/update-copyright/to... update-copyright/tox.ini:2: envlist = py35 On 2017/06/13 16:30:20, Jon Sonesen wrote: > Can you please use py35 and 36? Similar to the tox.ini that is here: > https://hg.adblockplus.org/codingtools/file/tip/flake8-eyeo/tox.ini Done. https://codereview.adblockplus.org/29459580/diff/29459581/update-copyright/to... update-copyright/tox.ini:17: pytest --cov=update_copyright tests/ On 2017/06/13 16:30:20, Jon Sonesen wrote: > you can add consolidate the pytest calls into one command by adding the coverage > command to the original pytest call > > 'pytest --cov=update_copyright tests' Done. https://codereview.adblockplus.org/29459580/diff/29459581/update-copyright/up... File update-copyright/update_copyright/update_copyright.py (right): https://codereview.adblockplus.org/29459580/diff/29459581/update-copyright/up... update-copyright/update_copyright/update_copyright.py:1: #!/usr/bin/env python3 On 2017/06/19 16:25:31, Vasily Kuznetsov wrote: > Maybe we should move this file to `update-copyright/update_copyright.py` instead > and remove `update-copyright/update_copyright/__init__.py`? Done. https://codereview.adblockplus.org/29459580/diff/29459581/update-copyright/up... update-copyright/update_copyright/update_copyright.py:20: if repo in { On 2017/06/19 16:25:31, Vasily Kuznetsov wrote: > This should also not be hardcoded, but it's ok for now, let's land this first > and then deal with these special cases. Acknowledged. https://codereview.adblockplus.org/29459580/diff/29459581/update-copyright/up... update-copyright/update_copyright/update_copyright.py:33: subprocess.check_call(['hg', 'up', '--rev', 'adblockbrowser', On 2017/06/19 16:25:31, Vasily Kuznetsov wrote: > This should not be hardcoded, but it's ok for now. Acknowledged.
Hi Rosie! Thanks for addressing the comments on the first patch set, it looks pretty good. I have quite a few comments again (see below) but they are all rather minor, so I'm hopeful about landing this soon ;) Cheers, Vasily https://codereview.adblockplus.org/29459580/diff/29472573/update-copyright/.c... File update-copyright/.cache/v/cache/lastfailed (right): https://codereview.adblockplus.org/29459580/diff/29472573/update-copyright/.c... update-copyright/.cache/v/cache/lastfailed:1: {} This file should be ignored too. Probably the best approach is to ignore all `.cache` directories. https://codereview.adblockplus.org/29459580/diff/29472573/update-copyright/te... File update-copyright/tests/test_update_copyright.py (right): https://codereview.adblockplus.org/29459580/diff/29472573/update-copyright/te... update-copyright/tests/test_update_copyright.py:49: repo_1 = temp_dir + '/repo_1' Any reason why you're not using `os.path.join` here? Perhaps that would be more portable and consistent. https://codereview.adblockplus.org/29459580/diff/29472573/update-copyright/te... update-copyright/tests/test_update_copyright.py:51: subprocess.check_call(['mkdir', repo_1]) You can also use `os.mkdir` for this -- simpler than calling an external process. Also, can this call be moved into `create_repo` function to avoid duplication here? https://codereview.adblockplus.org/29459580/diff/29472573/update-copyright/te... update-copyright/tests/test_update_copyright.py:57: subprocess.check_call(['touch', os.path.join(repo_2, '.hg/hgrc')]) Do we actually need to touch `hgrc` before we write into it? Wouldn't the next line just create the file anyway? https://codereview.adblockplus.org/29459580/diff/29472573/update-copyright/te... update-copyright/tests/test_update_copyright.py:66: urls = [os.path.join(data_url + '/repo_1/'), Seems like we're passing only one argument to `os.path.join` here. Wouldn't it do nothing in this case? https://codereview.adblockplus.org/29459580/diff/29472573/update-copyright/te... update-copyright/tests/test_update_copyright.py:85: subprocess.check_call(['hg', 'clone', temp_repo.strpath, directory]) You could also assign `temp_repo.strpath` to a variable, like you do with `directory` and reduce the duplication a bit. https://codereview.adblockplus.org/29459580/diff/29472573/update-copyright/to... File update-copyright/tox.ini (right): https://codereview.adblockplus.org/29459580/diff/29472573/update-copyright/to... update-copyright/tox.ini:2: #envlist = py{35,36} Forgot to uncomment? ;) https://codereview.adblockplus.org/29459580/diff/29472573/update-copyright/up... File update-copyright/update_copyright.py (right): https://codereview.adblockplus.org/29459580/diff/29472573/update-copyright/up... update-copyright/update_copyright.py:18: def process_repo(url, hg_upstream): Is `hg_upstream` unused in this function? It should probably be passed to `hg_commit` if I follow the logic correctly. https://codereview.adblockplus.org/29459580/diff/29472573/update-copyright/up... update-copyright/update_copyright.py:19: repo = os.path.basename(os.path.normpath(url)) Why are we applying `os.path.normpath` to a url here? This might work sometimes, but in general that function is intended for filesystem paths, not urls. I wonder if this won't break in a funny way some day. https://codereview.adblockplus.org/29459580/diff/29472573/update-copyright/up... update-copyright/update_copyright.py:37: subprocess.call(['hg', 'up', '--rev', 'master', Any idea why we're not doing `check_call` here just like in the other branch of `if`? Seems like if `hg` fails, we should also break. https://codereview.adblockplus.org/29459580/diff/29472573/update-copyright/up... update-copyright/update_copyright.py:80: if 'ssh://hg@hg.adblockplus.org/' in hg_upstream: `hg_commit` is called with the original url of the repository on line 45 of `process_repo` as a second parameter, so it seems like `hg_upstream` should already be a complete URL in it and we don't need to add `repo` to it. Or am I missing something? https://codereview.adblockplus.org/29459580/diff/29472573/update-copyright/up... update-copyright/update_copyright.py:94: dir_path = 'https://hg.adblockplus.org/' Is this used anywhere? https://codereview.adblockplus.org/29459580/diff/29472573/update-copyright/up... update-copyright/update_copyright.py:112: if (self.cell == 2) and self.recordtr is True: The parentheses around the first operand of `and` are redundant so it would be more pythonic ;) to remove them. https://codereview.adblockplus.org/29459580/diff/29472573/update-copyright/up... update-copyright/update_copyright.py:116: depr1 = re.search(r'\*DEPRECATED\*', data) I think it would make the code more readable and extensible if we do something like this instead: deprecated = (re.search(....) or re.search(....)) if not deprecated and ...: https://codereview.adblockplus.org/29459580/diff/29472573/update-copyright/up... update-copyright/update_copyright.py:124: dir_path = dirname(hg_page) + '/' Two points here: - `hg_page` is an url, could we use something other than `dirname` from `posixpath` (that's a bit of a misuse, I think)? - `dir_path` is essentially a base url, perhaps that would be a more telling name. What do you think? https://codereview.adblockplus.org/29459580/diff/29472573/update-copyright/up... update-copyright/update_copyright.py:129: my_result = [] Maybe better call this `repo_urls`? https://codereview.adblockplus.org/29459580/diff/29472573/update-copyright/up... update-copyright/update_copyright.py:130: for i in range(len(parser.result)): Why not `for url in parser.result:`? https://codereview.adblockplus.org/29459580/diff/29472573/update-copyright/up... update-copyright/update_copyright.py:137: for repo in repo_list: Could probably just be `for repo in extract_urls(hg_page)`. https://codereview.adblockplus.org/29459580/diff/29472573/update-copyright/up... update-copyright/update_copyright.py:147: default='ssh://hg@hg.adblockplus.org/', This should probably default to `None` and then if not specified, we should use the value of `--url`. Otherwise if we specify `--url` but no `--push`, the behavior will be rather unexpected. It would probably also be better to name the option `--push-url` to make it more clear.
Thanks for the helpful feedback! Please let me know if I've missed anything. Rosie https://codereview.adblockplus.org/29459580/diff/29472573/update-copyright/.c... File update-copyright/.cache/v/cache/lastfailed (right): https://codereview.adblockplus.org/29459580/diff/29472573/update-copyright/.c... update-copyright/.cache/v/cache/lastfailed:1: {} On 2017/06/27 16:18:50, Vasily Kuznetsov wrote: > This file should be ignored too. Probably the best approach is to ignore all > `.cache` directories. Done. https://codereview.adblockplus.org/29459580/diff/29472573/update-copyright/te... File update-copyright/tests/test_update_copyright.py (right): https://codereview.adblockplus.org/29459580/diff/29472573/update-copyright/te... update-copyright/tests/test_update_copyright.py:49: repo_1 = temp_dir + '/repo_1' On 2017/06/27 16:18:51, Vasily Kuznetsov wrote: > Any reason why you're not using `os.path.join` here? Perhaps that would be more > portable and consistent. repo_1 and repo_2 were only needed as strings, so this was a bit shorter, but you're right about switching to os.path.join being more portable and consistent. https://codereview.adblockplus.org/29459580/diff/29472573/update-copyright/te... update-copyright/tests/test_update_copyright.py:49: repo_1 = temp_dir + '/repo_1' On 2017/06/27 16:18:51, Vasily Kuznetsov wrote: > Any reason why you're not using `os.path.join` here? Perhaps that would be more > portable and consistent. Done. https://codereview.adblockplus.org/29459580/diff/29472573/update-copyright/te... update-copyright/tests/test_update_copyright.py:51: subprocess.check_call(['mkdir', repo_1]) On 2017/06/27 16:18:51, Vasily Kuznetsov wrote: > You can also use `os.mkdir` for this -- simpler than calling an external > process. Also, can this call be moved into `create_repo` function to avoid > duplication here? Switched it to `os.mkdir`. The `create_repo` function is used in both fixtures, once in `temp_repo`, and twice in `base_dir`. But the `temp_repo` function doesn't need both `repo_1` and `repo_2`, so I left that part in the fixture. https://codereview.adblockplus.org/29459580/diff/29472573/update-copyright/te... update-copyright/tests/test_update_copyright.py:51: subprocess.check_call(['mkdir', repo_1]) On 2017/06/27 16:18:51, Vasily Kuznetsov wrote: > You can also use `os.mkdir` for this -- simpler than calling an external > process. Also, can this call be moved into `create_repo` function to avoid > duplication here? Done. https://codereview.adblockplus.org/29459580/diff/29472573/update-copyright/te... update-copyright/tests/test_update_copyright.py:57: subprocess.check_call(['touch', os.path.join(repo_2, '.hg/hgrc')]) On 2017/06/27 16:18:51, Vasily Kuznetsov wrote: > Do we actually need to touch `hgrc` before we write into it? Wouldn't the next > line just create the file anyway? Done. https://codereview.adblockplus.org/29459580/diff/29472573/update-copyright/te... update-copyright/tests/test_update_copyright.py:66: urls = [os.path.join(data_url + '/repo_1/'), On 2017/06/27 16:18:51, Vasily Kuznetsov wrote: > Seems like we're passing only one argument to `os.path.join` here. Wouldn't it > do nothing in this case? Done. https://codereview.adblockplus.org/29459580/diff/29472573/update-copyright/te... update-copyright/tests/test_update_copyright.py:85: subprocess.check_call(['hg', 'clone', temp_repo.strpath, directory]) On 2017/06/27 16:18:51, Vasily Kuznetsov wrote: > You could also assign `temp_repo.strpath` to a variable, like you do with > `directory` and reduce the duplication a bit. Done. https://codereview.adblockplus.org/29459580/diff/29472573/update-copyright/to... File update-copyright/tox.ini (right): https://codereview.adblockplus.org/29459580/diff/29472573/update-copyright/to... update-copyright/tox.ini:2: #envlist = py{35,36} On 2017/06/27 16:18:51, Vasily Kuznetsov wrote: > Forgot to uncomment? ;) Done. https://codereview.adblockplus.org/29459580/diff/29472573/update-copyright/up... File update-copyright/update_copyright.py (right): https://codereview.adblockplus.org/29459580/diff/29472573/update-copyright/up... update-copyright/update_copyright.py:18: def process_repo(url, hg_upstream): On 2017/06/27 16:18:54, Vasily Kuznetsov wrote: > Is `hg_upstream` unused in this function? It should probably be passed to > `hg_commit` if I follow the logic correctly. Done. https://codereview.adblockplus.org/29459580/diff/29472573/update-copyright/up... update-copyright/update_copyright.py:19: repo = os.path.basename(os.path.normpath(url)) On 2017/06/27 16:18:51, Vasily Kuznetsov wrote: > Why are we applying `os.path.normpath` to a url here? This might work sometimes, > but in general that function is intended for filesystem paths, not urls. I > wonder if this won't break in a funny way some day. Done. https://codereview.adblockplus.org/29459580/diff/29472573/update-copyright/up... update-copyright/update_copyright.py:37: subprocess.call(['hg', 'up', '--rev', 'master', On 2017/06/27 16:18:54, Vasily Kuznetsov wrote: > Any idea why we're not doing `check_call` here just like in the other branch of > `if`? Seems like if `hg` fails, we should also break. This line attempts to switch to the master branch. If a branch called `master` doesn't exist, then `hg up --rev master` returns a 255, but the script can keep running whether or not there's a branch called master. https://codereview.adblockplus.org/29459580/diff/29472573/update-copyright/up... update-copyright/update_copyright.py:80: if 'ssh://hg@hg.adblockplus.org/' in hg_upstream: On 2017/06/27 16:18:52, Vasily Kuznetsov wrote: > `hg_commit` is called with the original url of the repository on line 45 of > `process_repo` as a second parameter, so it seems like `hg_upstream` should > already be a complete URL in it and we don't need to add `repo` to it. Or am I > missing something? If we're pushing to ssh://hg@hg.adblockplus.org/ servers, then we need to add the repo name on the end. Otherwise, it will push to the URL for each of the repos it finds on the hg_page, and we don't need to add the repo names to the end of those. https://codereview.adblockplus.org/29459580/diff/29472573/update-copyright/up... update-copyright/update_copyright.py:94: dir_path = 'https://hg.adblockplus.org/' On 2017/06/27 16:18:52, Vasily Kuznetsov wrote: > Is this used anywhere? Done. https://codereview.adblockplus.org/29459580/diff/29472573/update-copyright/up... update-copyright/update_copyright.py:112: if (self.cell == 2) and self.recordtr is True: On 2017/06/27 16:18:51, Vasily Kuznetsov wrote: > The parentheses around the first operand of `and` are redundant so it would be > more pythonic ;) to remove them. Done. https://codereview.adblockplus.org/29459580/diff/29472573/update-copyright/up... update-copyright/update_copyright.py:116: depr1 = re.search(r'\*DEPRECATED\*', data) On 2017/06/27 16:18:52, Vasily Kuznetsov wrote: > I think it would make the code more readable and extensible if we do something > like this instead: > > deprecated = (re.search(....) or > re.search(....)) > if not deprecated and ...: > Done. https://codereview.adblockplus.org/29459580/diff/29472573/update-copyright/up... update-copyright/update_copyright.py:124: dir_path = dirname(hg_page) + '/' On 2017/06/27 16:18:52, Vasily Kuznetsov wrote: > Two points here: > - `hg_page` is an url, could we use something other than `dirname` from > `posixpath` (that's a bit of a misuse, I think)? > - `dir_path` is essentially a base url, perhaps that would be a more telling > name. What do you think? Done. https://codereview.adblockplus.org/29459580/diff/29472573/update-copyright/up... update-copyright/update_copyright.py:129: my_result = [] On 2017/06/27 16:18:52, Vasily Kuznetsov wrote: > Maybe better call this `repo_urls`? Done. https://codereview.adblockplus.org/29459580/diff/29472573/update-copyright/up... update-copyright/update_copyright.py:130: for i in range(len(parser.result)): On 2017/06/27 16:18:53, Vasily Kuznetsov wrote: > Why not `for url in parser.result:`? Done. https://codereview.adblockplus.org/29459580/diff/29472573/update-copyright/up... update-copyright/update_copyright.py:137: for repo in repo_list: On 2017/06/27 16:18:52, Vasily Kuznetsov wrote: > Could probably just be `for repo in extract_urls(hg_page)`. Done. https://codereview.adblockplus.org/29459580/diff/29472573/update-copyright/up... update-copyright/update_copyright.py:147: default='ssh://hg@hg.adblockplus.org/', On 2017/06/27 16:18:52, Vasily Kuznetsov wrote: > This should probably default to `None` and then if not specified, we should use > the value of `--url`. Otherwise if we specify `--url` but no `--push`, the > behavior will be rather unexpected. > > It would probably also be better to name the option `--push-url` to make it more > clear. Done.
Hi Rosie, It's looking good. A few more comments but nothing big that I see now. Cheers, Vasily https://codereview.adblockplus.org/29459580/diff/29472573/update-copyright/te... File update-copyright/tests/test_update_copyright.py (right): https://codereview.adblockplus.org/29459580/diff/29472573/update-copyright/te... update-copyright/tests/test_update_copyright.py:51: subprocess.check_call(['mkdir', repo_1]) On 2017/07/03 15:33:44, rosie wrote: > On 2017/06/27 16:18:51, Vasily Kuznetsov wrote: > > You can also use `os.mkdir` for this -- simpler than calling an external > > process. Also, can this call be moved into `create_repo` function to avoid > > duplication here? > > Switched it to `os.mkdir`. The `create_repo` function is used in both fixtures, > once in `temp_repo`, and twice in `base_dir`. But the `temp_repo` function > doesn't need both `repo_1` and `repo_2`, so I left that part in the fixture. It looks like the same pattern is followed in the other fixture, first a directory is created and then `create_repo` is called on it. Creating the directory inside of `create_repo` instead would replace these three lines (creating "repo_1", "repo_2" and "tmp_repo") with just one line inside of the function. Or am I missing something? https://codereview.adblockplus.org/29459580/diff/29472573/update-copyright/up... File update-copyright/update_copyright.py (right): https://codereview.adblockplus.org/29459580/diff/29472573/update-copyright/up... update-copyright/update_copyright.py:37: subprocess.call(['hg', 'up', '--rev', 'master', On 2017/07/03 15:33:45, rosie wrote: > On 2017/06/27 16:18:54, Vasily Kuznetsov wrote: > > Any idea why we're not doing `check_call` here just like in the other branch > of > > `if`? Seems like if `hg` fails, we should also break. > > This line attempts to switch to the master branch. If a branch called `master` > doesn't exist, then `hg up --rev master` returns a 255, but the script can keep > running whether or not there's a branch called master. I see. Makes sense. https://codereview.adblockplus.org/29459580/diff/29478584/update-copyright/ru... File update-copyright/run_update_copyright.sh (right): https://codereview.adblockplus.org/29459580/diff/29478584/update-copyright/ru... update-copyright/run_update_copyright.sh:1: #!/bin/bash Could probably be /bin/sh script, but this is minor. https://codereview.adblockplus.org/29459580/diff/29478584/update-copyright/ru... update-copyright/run_update_copyright.sh:1: #!/bin/bash It would be good to mention this script in the README. https://codereview.adblockplus.org/29459580/diff/29478584/update-copyright/te... File update-copyright/tests/data/tst.patch (right): https://codereview.adblockplus.org/29459580/diff/29478584/update-copyright/te... update-copyright/tests/data/tst.patch:1: # HG changeset patch This somehow got into the review again. I think you're not using it anymore, right? https://codereview.adblockplus.org/29459580/diff/29478584/update-copyright/te... File update-copyright/tests/test_update_copyright.py (right): https://codereview.adblockplus.org/29459580/diff/29478584/update-copyright/te... update-copyright/tests/test_update_copyright.py:18: print('19: Creating repo in: ', path) This looks like a debug print. It won't be visible because pytest suppresses prints but probably still better to remove it since it's a noop normally. https://codereview.adblockplus.org/29459580/diff/29478584/update-copyright/te... update-copyright/tests/test_update_copyright.py:50: repo_1 = str(os.path.join(temp_dir, 'repo_1')) Do you need the `str(...)` here? `os.path.join` returns a string anyway, doesn't it? https://codereview.adblockplus.org/29459580/diff/29478584/update-copyright/te... update-copyright/tests/test_update_copyright.py:65: print('65: test_extract_urls') Also debug print ;) https://codereview.adblockplus.org/29459580/diff/29478584/update-copyright/up... File update-copyright/update_copyright.py (right): https://codereview.adblockplus.org/29459580/diff/29478584/update-copyright/up... update-copyright/update_copyright.py:84: if 'ssh://hg@hg.adblockplus.org/' in hg_upstream: Here we still have hardcoded logic related to hg.adblockplus.org. I think a better way could be to remove this `if` here and add an `else` branch to the `if` in lines 47-48 so that if `hg_upstream` is given by an option, we consider it a base url and append the repository name to it. Something like: if hg_upstream is None: hg_upstream = url else: hg_upstream += '/' + repo Seems like it would cover our use case and also work correctly with other possible values of the -p option. https://codereview.adblockplus.org/29459580/diff/29478584/update-copyright/up... update-copyright/update_copyright.py:127: base_url = dirname(hg_page) + '/' This will probably still break on some non-POSIX systems (like Windows), but I guess it's ok, many other things wouldn't work either and we can probably live with this script being POSIX-only. https://codereview.adblockplus.org/29459580/diff/29478584/update-copyright/up... update-copyright/update_copyright.py:152: if args.hg_url is None: Actually this is not necessary. If you make this option mandatory (check argparse documentation), argparse will signal errors for you. https://codereview.adblockplus.org/29459580/diff/29478584/update-copyright/up... update-copyright/update_copyright.py:155: hg_page = args.hg_url Do you think these intermediate variables add value? I would probably remove them (and call `main` with `args.hg_url` and `args.push_url` directly), but I don't feel very strongly about it so whatever seems better to you is cool.
https://codereview.adblockplus.org/29459580/diff/29472573/update-copyright/up... File update-copyright/update_copyright.py (right): https://codereview.adblockplus.org/29459580/diff/29472573/update-copyright/up... update-copyright/update_copyright.py:80: if 'ssh://hg@hg.adblockplus.org/' in hg_upstream: On 2017/06/27 16:18:52, Vasily Kuznetsov wrote: > `hg_commit` is called with the original url of the repository on line 45 of > `process_repo` as a second parameter, so it seems like `hg_upstream` should > already be a complete URL in it and we don't need to add `repo` to it. Or am I > missing something? If we're pushing to ssh://hg@hg.adblockplus.org/ servers, then we need to add the repo name on the end. Otherwise (not using SSH), it will push to the URL for each of the repos it finds on the hg_page, and we don't need to add the repo names to the end of those. https://codereview.adblockplus.org/29459580/diff/29478584/update-copyright/ru... File update-copyright/run_update_copyright.sh (right): https://codereview.adblockplus.org/29459580/diff/29478584/update-copyright/ru... update-copyright/run_update_copyright.sh:1: #!/bin/bash On 2017/07/03 19:25:46, Vasily Kuznetsov wrote: > Could probably be /bin/sh script, but this is minor. Done. https://codereview.adblockplus.org/29459580/diff/29478584/update-copyright/te... File update-copyright/tests/data/tst.patch (right): https://codereview.adblockplus.org/29459580/diff/29478584/update-copyright/te... update-copyright/tests/data/tst.patch:1: # HG changeset patch On 2017/07/03 19:25:46, Vasily Kuznetsov wrote: > This somehow got into the review again. I think you're not using it anymore, > right? Done. https://codereview.adblockplus.org/29459580/diff/29478584/update-copyright/te... File update-copyright/tests/test_update_copyright.py (right): https://codereview.adblockplus.org/29459580/diff/29478584/update-copyright/te... update-copyright/tests/test_update_copyright.py:18: print('19: Creating repo in: ', path) On 2017/07/03 19:25:46, Vasily Kuznetsov wrote: > This looks like a debug print. It won't be visible because pytest suppresses > prints but probably still better to remove it since it's a noop normally. Done. https://codereview.adblockplus.org/29459580/diff/29478584/update-copyright/te... update-copyright/tests/test_update_copyright.py:50: repo_1 = str(os.path.join(temp_dir, 'repo_1')) On 2017/07/03 19:25:46, Vasily Kuznetsov wrote: > Do you need the `str(...)` here? `os.path.join` returns a string anyway, doesn't > it? Done. https://codereview.adblockplus.org/29459580/diff/29478584/update-copyright/te... update-copyright/tests/test_update_copyright.py:65: print('65: test_extract_urls') On 2017/07/03 19:25:46, Vasily Kuznetsov wrote: > Also debug print ;) Done. https://codereview.adblockplus.org/29459580/diff/29478584/update-copyright/up... File update-copyright/update_copyright.py (right): https://codereview.adblockplus.org/29459580/diff/29478584/update-copyright/up... update-copyright/update_copyright.py:84: if 'ssh://hg@hg.adblockplus.org/' in hg_upstream: On 2017/07/03 19:25:47, Vasily Kuznetsov wrote: > Here we still have hardcoded logic related to http://hg.adblockplus.org. I think a > better way could be to remove this `if` here and add an `else` branch to the > `if` in lines 47-48 so that if `hg_upstream` is given by an option, we consider > it a base url and append the repository name to it. Something like: > > if hg_upstream is None: > hg_upstream = url > else: > hg_upstream += '/' + repo > > Seems like it would cover our use case and also work correctly with other > possible values of the -p option. Done. https://codereview.adblockplus.org/29459580/diff/29478584/update-copyright/up... update-copyright/update_copyright.py:127: base_url = dirname(hg_page) + '/' On 2017/07/03 19:25:47, Vasily Kuznetsov wrote: > This will probably still break on some non-POSIX systems (like Windows), but I > guess it's ok, many other things wouldn't work either and we can probably live > with this script being POSIX-only. Acknowledged. https://codereview.adblockplus.org/29459580/diff/29478584/update-copyright/up... update-copyright/update_copyright.py:152: if args.hg_url is None: On 2017/07/03 19:25:47, Vasily Kuznetsov wrote: > Actually this is not necessary. If you make this option mandatory (check > argparse documentation), argparse will signal errors for you. Done. https://codereview.adblockplus.org/29459580/diff/29478584/update-copyright/up... update-copyright/update_copyright.py:155: hg_page = args.hg_url On 2017/07/03 19:25:47, Vasily Kuznetsov wrote: > Do you think these intermediate variables add value? I would probably remove > them (and call `main` with `args.hg_url` and `args.push_url` directly), but I > don't feel very strongly about it so whatever seems better to you is cool. Done.
Hi Rosie, LGTM from my side, but feel free to fix a couple of stylistic nits that I flagged below. Jon, what do you think? Cheers, Vasily https://codereview.adblockplus.org/29459580/diff/29479603/update-copyright/RE... File update-copyright/README.md (right): https://codereview.adblockplus.org/29459580/diff/29479603/update-copyright/RE... update-copyright/README.md:1: # update-copyright # In general we don't add # at ends of headings in the markdown documents. It's not a big deal to have them here, but I'd suggest to keep things consistent unless there's a good reason to not do it. https://codereview.adblockplus.org/29459580/diff/29479603/update-copyright/RE... update-copyright/README.md:3: This script accepts a URL to a mercurial index as input. It then makes a local copy of each repo, updates the copyright information on every file to the current year, then attempts to push the updates. If a user doesn't have permission to push to a repo, the script will make a local `repo-name.patch` file to submit later. You are free to use it for other projects but please keep in mind that we make no stability guarantees whatsoever and might change functionality any time. Another mostly consistency-related point: the long lines are usually wrapped to 80 characters elsewhere. There's no strong reason, but it's a bit easier to work with shorter lines and any decent editor can do the wrapping automatically. https://codereview.adblockplus.org/29459580/diff/29479603/update-copyright/up... File update-copyright/update_copyright.py (right): https://codereview.adblockplus.org/29459580/diff/29479603/update-copyright/up... update-copyright/update_copyright.py:144: default=None, I think you don't need a default here since `-u` is a required argument.
https://codereview.adblockplus.org/29459580/diff/29479603/update-copyright/RE... File update-copyright/README.md (right): https://codereview.adblockplus.org/29459580/diff/29479603/update-copyright/RE... update-copyright/README.md:1: # update-copyright # On 2017/07/04 15:02:40, Vasily Kuznetsov wrote: > In general we don't add # at ends of headings in the markdown documents. It's > not a big deal to have them here, but I'd suggest to keep things consistent > unless there's a good reason to not do it. Done. https://codereview.adblockplus.org/29459580/diff/29479603/update-copyright/RE... update-copyright/README.md:3: This script accepts a URL to a mercurial index as input. It then makes a local copy of each repo, updates the copyright information on every file to the current year, then attempts to push the updates. If a user doesn't have permission to push to a repo, the script will make a local `repo-name.patch` file to submit later. You are free to use it for other projects but please keep in mind that we make no stability guarantees whatsoever and might change functionality any time. On 2017/07/04 15:02:40, Vasily Kuznetsov wrote: > Another mostly consistency-related point: the long lines are usually wrapped to > 80 characters elsewhere. There's no strong reason, but it's a bit easier to work > with shorter lines and any decent editor can do the wrapping automatically. Done. https://codereview.adblockplus.org/29459580/diff/29479603/update-copyright/up... File update-copyright/update_copyright.py (right): https://codereview.adblockplus.org/29459580/diff/29479603/update-copyright/up... update-copyright/update_copyright.py:144: default=None, On 2017/07/04 15:02:40, Vasily Kuznetsov wrote: > I think you don't need a default here since `-u` is a required argument. Done.
Hey Guys, Rose, thanks a bunch for addressing these things, and Vasily thanks a lot for addressing the rest of the concerns :) my apologies for not being super active on this one after the first set of comments. That being said I am happy with what Vasily came up with and your implementation as well so LGTM Cheers :)
Patch set 5 also LGTM.
Thanks for working on this. You did a great job! For the most part, it looks great, but I'm afraid that I have some additional comments. ;) Also the title of this review (which I suppose is the same as the commit message) seems inaccurate. While adding test is mostly what you did here, this patch is also adding the original update_copyright.py code. Hence a better review title (and commit message) would be something like "Issue 5250 - Added copyright update script". https://codereview.adblockplus.org/29459580/diff/29479618/update-copyright/ru... File update-copyright/run_update_copyright.sh (right): https://codereview.adblockplus.org/29459580/diff/29479618/update-copyright/ru... update-copyright/run_update_copyright.sh:1: #!/bin/sh Is there any purpose of this wrapper script, other than providing defaults? Note that you could also provide defaults using argsparse in update_copyright.py: arg_parser.add_argument('-u', '--hg-url', default='https://hg.adblockplus.org/', ...) arg_parser.add_argument('-p', '--push-url', default='ssh://hg@hg.adblockplus.org/', ...) https://codereview.adblockplus.org/29459580/diff/29479618/update-copyright/te... File update-copyright/tests/data/sample_file.py (right): https://codereview.adblockplus.org/29459580/diff/29479618/update-copyright/te... update-copyright/tests/data/sample_file.py:13: # Copyright (C) 2006-2014 eyeo GmbH Perhaps we should also test for copyright information in strings (not only in comments). This is actually a real use case which we have in adblockplusie. Also I think we should test that non-eyeo copyright information are left alone. There are more cases covered by the logic implemented, which we could address with the tests, but we have the above two cases in our actual code, and therefore it might be worth to cover them at least. https://codereview.adblockplus.org/29459580/diff/29479618/update-copyright/up... File update-copyright/update_copyright.py (right): https://codereview.adblockplus.org/29459580/diff/29479618/update-copyright/up... update-copyright/update_copyright.py:1: #!/usr/bin/env python3 I suppose we should make this script executable (i.e. the mode above should be 755 instead of 644). Note that if the file is not executable, the shebang (i.e. #!) here would be useless. Once you made the script executable, you might also want to adapt the README to instruct calling ./update_copyright.py instead of "python3 update_copyright.py", which as a side effect removes some duplication. https://codereview.adblockplus.org/29459580/diff/29479618/update-copyright/up... update-copyright/update_copyright.py:61: print("Error: Couldn't read {}{}".format(dirpath, filename)) Failing silently (in the original code) was intended here, as this script is expected to encounter many binary files (e.g. images). If we output an error message every time, this will generate a huge amount of noise. https://codereview.adblockplus.org/29459580/diff/29479618/update-copyright/up... update-copyright/update_copyright.py:118: re.search(r'(Deprecated)', data)) This regular expression seems incorrect. The parentheses annotate a group. So it won't only match substrings like "(Deprecated)" but any occurrence of "Deprecated", storing the result in the 1st group of the match (which is discarded here). If you want to match parantheses in the input string, you have to escape them in the pattern: re.search(r'\(Deprecated\)', data) Anyway, if all you want to do is to find a fixed substring, it would be much simpler (and less error-prone) to just use the in-operator (instead of regular expressions): '*DEPRECATED*' in data or '(Deprecated)' in data
https://codereview.adblockplus.org/29459580/diff/29479618/update-copyright/ru... File update-copyright/run_update_copyright.sh (right): https://codereview.adblockplus.org/29459580/diff/29479618/update-copyright/ru... update-copyright/run_update_copyright.sh:1: #!/bin/sh On 2017/07/05 15:46:52, Sebastian Noack wrote: > Is there any purpose of this wrapper script, other than providing defaults? Note > that you could also provide defaults using argsparse in update_copyright.py: > > arg_parser.add_argument('-u', '--hg-url', > default='https://hg.adblockplus.org/', > ...) > arg_parser.add_argument('-p', '--push-url', > default='ssh://hg@hg.adblockplus.org/', > ...) I suggested to have a shell script instead of defaults in the python script. The logic of dealing with the defaults in python script is somewhat unobvious (for example, think about what would happen if we run `update_copyright.py -u https://hg.foo.bar/`). Also having the script try to push stuff into our production repositories when it's run without any arguments doesn't seems like the best idea. The shell script separates these things, makes the script behavior simple and obvious and prevents accidental runs targeted at production.
https://codereview.adblockplus.org/29459580/diff/29479618/update-copyright/RE... File update-copyright/README.md (right): https://codereview.adblockplus.org/29459580/diff/29479618/update-copyright/RE... update-copyright/README.md:8: If a user doesn't have permission to push to a repo, the script will make a local `repo-name.patch` file to submit later. We always wrap code before 80 characters. While technically, this isn't code here, the same reasoning applies, i.e. avoiding vertical scrolling and improving the reading flow when looking at the raw text. https://codereview.adblockplus.org/29459580/diff/29479618/update-copyright/to... File update-copyright/tox.ini (right): https://codereview.adblockplus.org/29459580/diff/29479618/update-copyright/to... update-copyright/tox.ini:11: flake8-putty You don't seem to actually use flake8-putty here. You probably have copied this over from the tox.ini in sitescripts, buildtools or cms, where we use flake8-putty in order to ignore specific flake8 errors for specific files, in order to deal with legacy code. This isn't the case here. However, you should add pep8-naming and flake8-eyeo instead, as we use these flake8 extensions everywhere. For example see this tox.ini: https://hg.adblockplus.org/codingtools/file/d95b8d1f3104/patchconv/tox.ini#l9
https://codereview.adblockplus.org/29459580/diff/29479618/update-copyright/ru... File update-copyright/run_update_copyright.sh (right): https://codereview.adblockplus.org/29459580/diff/29479618/update-copyright/ru... update-copyright/run_update_copyright.sh:1: #!/bin/sh On 2017/07/05 17:38:05, Vasily Kuznetsov wrote: > On 2017/07/05 15:46:52, Sebastian Noack wrote: > > Is there any purpose of this wrapper script, other than providing defaults? > Note > > that you could also provide defaults using argsparse in update_copyright.py: > > > > arg_parser.add_argument('-u', '--hg-url', > > default='https://hg.adblockplus.org/', > > ...) > > arg_parser.add_argument('-p', '--push-url', > > default='ssh://hg@hg.adblockplus.org/', > > ...) > > I suggested to have a shell script instead of defaults in the python script. The > logic of dealing with the defaults in python script is somewhat unobvious (for > example, think about what would happen if we run `update_copyright.py -u > https://hg.foo.bar/%60). Also having the script try to push stuff into our > production repositories when it's run without any arguments doesn't seems like > the best idea. > > The shell script separates these things, makes the script behavior simple and > obvious and prevents accidental runs targeted at production. Why do we support these arguments anyway? I guessed that these exist to be overridden by the tests, but apparently the tests import the script and call it's internal APIs directly. And given all the other eyeo-specific logic in the script, it doesn't seem to be too useful to have just these URL configurable.
https://codereview.adblockplus.org/29459580/diff/29479618/update-copyright/ru... File update-copyright/run_update_copyright.sh (right): https://codereview.adblockplus.org/29459580/diff/29479618/update-copyright/ru... update-copyright/run_update_copyright.sh:1: #!/bin/sh On 2017/07/05 19:08:25, Sebastian Noack wrote: > On 2017/07/05 17:38:05, Vasily Kuznetsov wrote: > > On 2017/07/05 15:46:52, Sebastian Noack wrote: > > > Is there any purpose of this wrapper script, other than providing defaults? > > Note > > > that you could also provide defaults using argsparse in update_copyright.py: > > > > > > arg_parser.add_argument('-u', '--hg-url', > > > default='https://hg.adblockplus.org/', > > > ...) > > > arg_parser.add_argument('-p', '--push-url', > > > default='ssh://hg@hg.adblockplus.org/', > > > ...) > > > > I suggested to have a shell script instead of defaults in the python script. > The > > logic of dealing with the defaults in python script is somewhat unobvious (for > > example, think about what would happen if we run `update_copyright.py -u > > https://hg.foo.bar/%60). Also having the script try to push stuff into our > > production repositories when it's run without any arguments doesn't seems like > > the best idea. > > > > The shell script separates these things, makes the script behavior simple and > > obvious and prevents accidental runs targeted at production. > > Why do we support these arguments anyway? I guessed that these exist to be > overridden by the tests, but apparently the tests import the script and call > it's internal APIs directly. And given all the other eyeo-specific logic in the > script, it doesn't seem to be too useful to have just these URL configurable. We support the arguments so that the script would be useful in other contexts, because why not. The other eyeo-specific logic in the script is not great, but we can remove it later. For now it doesn't interfere with other uses so no big deal.
https://codereview.adblockplus.org/29459580/diff/29479618/update-copyright/ru... File update-copyright/run_update_copyright.sh (right): https://codereview.adblockplus.org/29459580/diff/29479618/update-copyright/ru... update-copyright/run_update_copyright.sh:1: #!/bin/sh On 2017/07/05 19:24:45, Vasily Kuznetsov wrote: > On 2017/07/05 19:08:25, Sebastian Noack wrote: > > On 2017/07/05 17:38:05, Vasily Kuznetsov wrote: > > > On 2017/07/05 15:46:52, Sebastian Noack wrote: > > > > Is there any purpose of this wrapper script, other than providing > defaults? > > > Note > > > > that you could also provide defaults using argsparse in > update_copyright.py: > > > > > > > > arg_parser.add_argument('-u', '--hg-url', > > > > default='https://hg.adblockplus.org/', > > > > ...) > > > > arg_parser.add_argument('-p', '--push-url', > > > > default='ssh://hg@hg.adblockplus.org/', > > > > ...) > > > > > > I suggested to have a shell script instead of defaults in the python script. > > The > > > logic of dealing with the defaults in python script is somewhat unobvious > (for > > > example, think about what would happen if we run `update_copyright.py -u > > > https://hg.foo.bar/%60). Also having the script try to push stuff into our > > > production repositories when it's run without any arguments doesn't seems > like > > > the best idea. > > > > > > The shell script separates these things, makes the script behavior simple > and > > > obvious and prevents accidental runs targeted at production. > > > > Why do we support these arguments anyway? I guessed that these exist to be > > overridden by the tests, but apparently the tests import the script and call > > it's internal APIs directly. And given all the other eyeo-specific logic in > the > > script, it doesn't seem to be too useful to have just these URL configurable. > > We support the arguments so that the script would be useful in other contexts, > because why not. The other eyeo-specific logic in the script is not great, but > we can remove it later. For now it doesn't interfere with other uses so no big > deal. The wrapper script makes the whole thing less self-contained and less convenient to use. You need to have the shell script and the Python script in the same directory, and then run the shell script from that directory (though this could be improved). Not to mention, that while Python is cross-platform, by using shell scripts we will require a POSIX environment. However, I agree to your concern if we'd handle the defaults in the Python script, that the behavior is inconsistent when -u is provided but not -p. In order to address this, we could make -p mandatory if -u is given. But I also see how this is slightly more complicated than using a wrapper script. But then again, with all the other hard-coded logic, there doesn't seem to be any use case for -u and -p in the first place. So why not just hard-coding these URLs? This would be simple and consistent. And if we want to make this script useful for non-eyeo environments (without having to change the code) in the future, we can still make everything configurable in the same go, later.
https://codereview.adblockplus.org/29459580/diff/29479618/update-copyright/ru... File update-copyright/run_update_copyright.sh (right): https://codereview.adblockplus.org/29459580/diff/29479618/update-copyright/ru... update-copyright/run_update_copyright.sh:1: #!/bin/sh On 2017/07/07 14:48:53, Sebastian Noack wrote: > On 2017/07/05 19:24:45, Vasily Kuznetsov wrote: > > On 2017/07/05 19:08:25, Sebastian Noack wrote: > > > On 2017/07/05 17:38:05, Vasily Kuznetsov wrote: > > > > On 2017/07/05 15:46:52, Sebastian Noack wrote: > > > > > Is there any purpose of this wrapper script, other than providing > > defaults? > > > > Note > > > > > that you could also provide defaults using argsparse in > > update_copyright.py: > > > > > > > > > > arg_parser.add_argument('-u', '--hg-url', > > > > > default='https://hg.adblockplus.org/', > > > > > ...) > > > > > arg_parser.add_argument('-p', '--push-url', > > > > > default='ssh://hg@hg.adblockplus.org/', > > > > > ...) > > > > > > > > I suggested to have a shell script instead of defaults in the python > script. > > > The > > > > logic of dealing with the defaults in python script is somewhat unobvious > > (for > > > > example, think about what would happen if we run `update_copyright.py -u > > > > https://hg.foo.bar/%60). Also having the script try to push stuff into our > > > > production repositories when it's run without any arguments doesn't seems > > like > > > > the best idea. > > > > > > > > The shell script separates these things, makes the script behavior simple > > and > > > > obvious and prevents accidental runs targeted at production. > > > > > > Why do we support these arguments anyway? I guessed that these exist to be > > > overridden by the tests, but apparently the tests import the script and call > > > it's internal APIs directly. And given all the other eyeo-specific logic in > > the > > > script, it doesn't seem to be too useful to have just these URL > configurable. > > > > We support the arguments so that the script would be useful in other contexts, > > because why not. The other eyeo-specific logic in the script is not great, but > > we can remove it later. For now it doesn't interfere with other uses so no big > > deal. > > The wrapper script makes the whole thing less self-contained and less convenient > to use. You need to have the shell script and the Python script in the same > directory, and then run the shell script from that directory (though this could > be improved). Not to mention, that while Python is cross-platform, by using > shell scripts we will require a POSIX environment. > > However, I agree to your concern if we'd handle the defaults in the Python > script, that the behavior is inconsistent when -u is provided but not -p. In > order to address this, we could make -p mandatory if -u is given. But I also see > how this is slightly more complicated than using a wrapper script. > > But then again, with all the other hard-coded logic, there doesn't seem to be > any use case for -u and -p in the first place. So why not just hard-coding these > URLs? This would be simple and consistent. And if we want to make this script > useful for non-eyeo environments (without having to change the code) in the > future, we can still make everything configurable in the same go, later. I think we've been discussing this more than the issue deserves. While in principle I think that it's much better to have utilities and API calls, that do one simple thing without mixing levels of abstraction (sort of Unix-philosophy-like approach) and we would definitely benefit from doing more of it, especially in sitescripts repository, this particular script is probably glue code enough that the efforts to separate it into generic and eyeo-specific parts are misplaced. So how about this: we remove the shell script but leave the options with eyeo defaults and make -p mandatory if -u is provided. The options could be useful for testing and if -p is mandatory, things should not get too confusing for the users.
https://codereview.adblockplus.org/29459580/diff/29479618/update-copyright/RE... File update-copyright/README.md (right): https://codereview.adblockplus.org/29459580/diff/29479618/update-copyright/RE... update-copyright/README.md:8: If a user doesn't have permission to push to a repo, the script will make a local `repo-name.patch` file to submit later. On 2017/07/05 18:55:21, Sebastian Noack wrote: > We always wrap code before 80 characters. While technically, this isn't code > here, the same reasoning applies, i.e. avoiding vertical scrolling and improving > the reading flow when looking at the raw text. Done. https://codereview.adblockplus.org/29459580/diff/29479618/update-copyright/ru... File update-copyright/run_update_copyright.sh (right): https://codereview.adblockplus.org/29459580/diff/29479618/update-copyright/ru... update-copyright/run_update_copyright.sh:1: #!/bin/sh On 2017/07/07 14:48:53, Sebastian Noack wrote: > On 2017/07/05 19:24:45, Vasily Kuznetsov wrote: > > On 2017/07/05 19:08:25, Sebastian Noack wrote: > > > On 2017/07/05 17:38:05, Vasily Kuznetsov wrote: > > > > On 2017/07/05 15:46:52, Sebastian Noack wrote: > > > > > Is there any purpose of this wrapper script, other than providing > > defaults? > > > > Note > > > > > that you could also provide defaults using argsparse in > > update_copyright.py: > > > > > > > > > > arg_parser.add_argument('-u', '--hg-url', > > > > > default='https://hg.adblockplus.org/', > > > > > ...) > > > > > arg_parser.add_argument('-p', '--push-url', > > > > > default='ssh://hg@hg.adblockplus.org/', > > > > > ...) > > > > > > > > I suggested to have a shell script instead of defaults in the python > script. > > > The > > > > logic of dealing with the defaults in python script is somewhat unobvious > > (for > > > > example, think about what would happen if we run `update_copyright.py -u > > > > https://hg.foo.bar/%60). Also having the script try to push stuff into our > > > > production repositories when it's run without any arguments doesn't seems > > like > > > > the best idea. > > > > > > > > The shell script separates these things, makes the script behavior simple > > and > > > > obvious and prevents accidental runs targeted at production. > > > > > > Why do we support these arguments anyway? I guessed that these exist to be > > > overridden by the tests, but apparently the tests import the script and call > > > it's internal APIs directly. And given all the other eyeo-specific logic in > > the > > > script, it doesn't seem to be too useful to have just these URL > configurable. > > > > We support the arguments so that the script would be useful in other contexts, > > because why not. The other eyeo-specific logic in the script is not great, but > > we can remove it later. For now it doesn't interfere with other uses so no big > > deal. > > The wrapper script makes the whole thing less self-contained and less convenient > to use. You need to have the shell script and the Python script in the same > directory, and then run the shell script from that directory (though this could > be improved). Not to mention, that while Python is cross-platform, by using > shell scripts we will require a POSIX environment. > > However, I agree to your concern if we'd handle the defaults in the Python > script, that the behavior is inconsistent when -u is provided but not -p. In > order to address this, we could make -p mandatory if -u is given. But I also see > how this is slightly more complicated than using a wrapper script. > > But then again, with all the other hard-coded logic, there doesn't seem to be > any use case for -u and -p in the first place. So why not just hard-coding these > URLs? This would be simple and consistent. And if we want to make this script > useful for non-eyeo environments (without having to change the code) in the > future, we can still make everything configurable in the same go, later. I don't have a strong opinion either way. Seems like this will only be used once per year, so I'd go for whichever is more straight-forward and easier to use. https://codereview.adblockplus.org/29459580/diff/29479618/update-copyright/te... File update-copyright/tests/data/sample_file.py (right): https://codereview.adblockplus.org/29459580/diff/29479618/update-copyright/te... update-copyright/tests/data/sample_file.py:13: # Copyright (C) 2006-2014 eyeo GmbH On 2017/07/05 15:46:53, Sebastian Noack wrote: > Perhaps we should also test for copyright information in strings (not only in > comments). This is actually a real use case which we have in adblockplusie. > Also I think we should test that non-eyeo copyright information are left alone. > > There are more cases covered by the logic implemented, which we could address > with the tests, but we have the above two cases in our actual code, and > therefore it might be worth to cover them at least. Done. https://codereview.adblockplus.org/29459580/diff/29479618/update-copyright/to... File update-copyright/tox.ini (right): https://codereview.adblockplus.org/29459580/diff/29479618/update-copyright/to... update-copyright/tox.ini:11: flake8-putty On 2017/07/05 18:55:21, Sebastian Noack wrote: > You don't seem to actually use flake8-putty here. You probably have copied this > over from the tox.ini in sitescripts, buildtools or cms, where we use > flake8-putty in order to ignore specific flake8 errors for specific files, in > order to deal with legacy code. This isn't the case here. However, you should > add pep8-naming and flake8-eyeo instead, as we use these flake8 extensions > everywhere. For example see this tox.ini: > https://hg.adblockplus.org/codingtools/file/d95b8d1f3104/patchconv/tox.ini#l9 Done. https://codereview.adblockplus.org/29459580/diff/29479618/update-copyright/up... File update-copyright/update_copyright.py (right): https://codereview.adblockplus.org/29459580/diff/29479618/update-copyright/up... update-copyright/update_copyright.py:1: #!/usr/bin/env python3 On 2017/07/05 15:46:53, Sebastian Noack wrote: > I suppose we should make this script executable (i.e. the mode above should be > 755 instead of 644). Note that if the file is not executable, the shebang (i.e. > #!) here would be useless. > > Once you made the script executable, you might also want to adapt the README to > instruct calling ./update_copyright.py instead of "python3 update_copyright.py", > which as a side effect removes some duplication. Done. https://codereview.adblockplus.org/29459580/diff/29479618/update-copyright/up... update-copyright/update_copyright.py:61: print("Error: Couldn't read {}{}".format(dirpath, filename)) On 2017/07/05 15:46:53, Sebastian Noack wrote: > Failing silently (in the original code) was intended here, as this script is > expected to encounter many binary files (e.g. images). If we output an error > message every time, this will generate a huge amount of noise. Done. https://codereview.adblockplus.org/29459580/diff/29479618/update-copyright/up... update-copyright/update_copyright.py:118: re.search(r'(Deprecated)', data)) On 2017/07/05 15:46:53, Sebastian Noack wrote: > This regular expression seems incorrect. The parentheses annotate a group. So it > won't only match substrings like "(Deprecated)" but any occurrence of > "Deprecated", storing the result in the 1st group of the match (which is > discarded here). If you want to match parantheses in the input string, you have > to escape them in the pattern: > > re.search(r'\(Deprecated\)', data) > > Anyway, if all you want to do is to find a fixed substring, it would be much > simpler (and less error-prone) to just use the in-operator (instead of regular > expressions): > > '*DEPRECATED*' in data or '(Deprecated)' in data Done.
https://codereview.adblockplus.org/29459580/diff/29479618/update-copyright/ru... File update-copyright/run_update_copyright.sh (right): https://codereview.adblockplus.org/29459580/diff/29479618/update-copyright/ru... update-copyright/run_update_copyright.sh:1: #!/bin/sh On 2017/07/07 15:55:26, Vasily Kuznetsov wrote: > So how about this: we remove the shell script but leave the options with eyeo > defaults and make -p mandatory if -u is provided. The options could be useful > for testing and if -p is mandatory, things should not get too confusing for the > users. Fine with me. https://codereview.adblockplus.org/29459580/diff/29482675/update-copyright/up... File update-copyright/update_copyright.py (right): https://codereview.adblockplus.org/29459580/diff/29482675/update-copyright/up... update-copyright/update_copyright.py:117: len(self.current_url) > 2): The indentation is off by one character here. Example from PEP-8 (https://www.python.org/dev/peps/pep-0008/#indentation) for hanging indentation: if (this_is_one_thing and that_is_another_thing): # Since both conditions are true, we can frobnicate. do_something() Alternatively, you could revert to use a "deprecated" variable again, in order to avoid wrapping here.
https://codereview.adblockplus.org/29459580/diff/29479618/update-copyright/RE... File update-copyright/README.md (right): https://codereview.adblockplus.org/29459580/diff/29479618/update-copyright/RE... update-copyright/README.md:8: If a user doesn't have permission to push to a repo, the script will make a local `repo-name.patch` file to submit later. On 2017/07/07 15:55:47, rosie wrote: > On 2017/07/05 18:55:21, Sebastian Noack wrote: > > We always wrap code before 80 characters. While technically, this isn't code > > here, the same reasoning applies, i.e. avoiding vertical scrolling and > improving > > the reading flow when looking at the raw text. > > Done. There are three places where the raw text is over 80 characters in the new patch. Two are links that would otherwise be less than 80 characters. The third is an example command. Are these ok, or should they also be wrapped? https://codereview.adblockplus.org/29459580/diff/29479618/update-copyright/ru... File update-copyright/run_update_copyright.sh (right): https://codereview.adblockplus.org/29459580/diff/29479618/update-copyright/ru... update-copyright/run_update_copyright.sh:1: #!/bin/sh On 2017/07/07 16:05:27, Sebastian Noack wrote: > On 2017/07/07 15:55:26, Vasily Kuznetsov wrote: > > So how about this: we remove the shell script but leave the options with eyeo > > defaults and make -p mandatory if -u is provided. The options could be useful > > for testing and if -p is mandatory, things should not get too confusing for > the > > users. > > Fine with me. Done.
https://codereview.adblockplus.org/29459580/diff/29479618/update-copyright/RE... File update-copyright/README.md (right): https://codereview.adblockplus.org/29459580/diff/29479618/update-copyright/RE... update-copyright/README.md:8: If a user doesn't have permission to push to a repo, the script will make a local `repo-name.patch` file to submit later. On 2017/07/07 16:44:00, rosie wrote: > On 2017/07/07 15:55:47, rosie wrote: > > On 2017/07/05 18:55:21, Sebastian Noack wrote: > > > We always wrap code before 80 characters. While technically, this isn't code > > > here, the same reasoning applies, i.e. avoiding vertical scrolling and > > improving > > > the reading flow when looking at the raw text. > > > > Done. > > There are three places where the raw text is over 80 characters in the new > patch. Two are links that would otherwise be less than 80 characters. The third > is an example command. Are these ok, or should they also be wrapped? I guess that is fine. https://codereview.adblockplus.org/29459580/diff/29482675/update-copyright/up... File update-copyright/update_copyright.py (right): https://codereview.adblockplus.org/29459580/diff/29482675/update-copyright/up... update-copyright/update_copyright.py:117: len(self.current_url) > 2): On 2017/07/07 16:05:27, Sebastian Noack wrote: > The indentation is off by one character here. Example from PEP-8 > (https://www.python.org/dev/peps/pep-0008/#indentation) for hanging indentation: > > if (this_is_one_thing and > that_is_another_thing): > # Since both conditions are true, we can frobnicate. > do_something() > > Alternatively, you could revert to use a "deprecated" variable again, in order > to avoid wrapping here. It seems you missed this comment. https://codereview.adblockplus.org/29459580/diff/29482684/update-copyright/up... File update-copyright/update_copyright.py (right): https://codereview.adblockplus.org/29459580/diff/29482684/update-copyright/up... update-copyright/update_copyright.py:149: 'ssh://hg@hg.adblockplus.org/'): Perhaps it would be worth to handle the defaults manually, in order to avoid duplication. It also seems to make the logic easier to follow (for me at least). arg_parser.add_argument('-u', '--hg-url', help='specify which Mercurial URL site to scrape') arg_parser.add_argument('-p', '--push-url', help='specify where to push the repository') args = arg_parser.parse_args() if args.hg_url and not args.push_url: arg_parser.error('If -u is provided, -p is mandatory') main(args.hg_url or 'https://hg.adblockplus.org/', args.push_url or 'ssh://hg@hg.adblockplus.org/') What do you think?
https://codereview.adblockplus.org/29459580/diff/29479618/update-copyright/RE... File update-copyright/README.md (right): https://codereview.adblockplus.org/29459580/diff/29479618/update-copyright/RE... update-copyright/README.md:8: If a user doesn't have permission to push to a repo, the script will make a local `repo-name.patch` file to submit later. On 2017/07/07 16:58:29, Sebastian Noack wrote: > On 2017/07/07 16:44:00, rosie wrote: > > On 2017/07/07 15:55:47, rosie wrote: > > > On 2017/07/05 18:55:21, Sebastian Noack wrote: > > > > We always wrap code before 80 characters. While technically, this isn't > code > > > > here, the same reasoning applies, i.e. avoiding vertical scrolling and > > > improving > > > > the reading flow when looking at the raw text. > > > > > > Done. > > > > There are three places where the raw text is over 80 characters in the new > > patch. Two are links that would otherwise be less than 80 characters. The > third > > is an example command. Are these ok, or should they also be wrapped? > > I guess that is fine. Done.
On 2017/07/05 15:46:54, Sebastian Noack wrote: > Also the title of this review (which I suppose is the same as the commit > message) seems inaccurate. While adding test is mostly what you did here, this > patch is also adding the original update_copyright.py code. Hence a better > review title (and commit message) would be something like "Issue 5250 - Added > copyright update script". What is about this comment? https://codereview.adblockplus.org/29459580/diff/29482675/update-copyright/up... File update-copyright/update_copyright.py (right): https://codereview.adblockplus.org/29459580/diff/29482675/update-copyright/up... update-copyright/update_copyright.py:117: len(self.current_url) > 2): On 2017/07/07 16:58:29, Sebastian Noack wrote: > On 2017/07/07 16:05:27, Sebastian Noack wrote: > > The indentation is off by one character here. Example from PEP-8 > > (https://www.python.org/dev/peps/pep-0008/#indentation) for hanging > indentation: > > > > if (this_is_one_thing and > > that_is_another_thing): > > # Since both conditions are true, we can frobnicate. > > do_something() > > > > Alternatively, you could revert to use a "deprecated" variable again, in order > > to avoid wrapping here. > > It seems you missed this comment. It seems this comment still hasn't been addressed. https://codereview.adblockplus.org/29459580/diff/29482684/update-copyright/up... File update-copyright/update_copyright.py (right): https://codereview.adblockplus.org/29459580/diff/29482684/update-copyright/up... update-copyright/update_copyright.py:149: 'ssh://hg@hg.adblockplus.org/'): On 2017/07/07 16:58:31, Sebastian Noack wrote: > Perhaps it would be worth to handle the defaults manually, in order to avoid > duplication. It also seems to make the logic easier to follow (for me at least). > > arg_parser.add_argument('-u', '--hg-url', > help='specify which Mercurial URL site to scrape') > arg_parser.add_argument('-p', '--push-url', > help='specify where to push the repository') > args = arg_parser.parse_args() > if args.hg_url and not args.push_url: > arg_parser.error('If -u is provided, -p is mandatory') > main(args.hg_url or 'https://hg.adblockplus.org/', > args.push_url or 'ssh://hg@hg.adblockplus.org/') > > What do you think? What is about this comment?
https://codereview.adblockplus.org/29459580/diff/29482675/update-copyright/up... File update-copyright/update_copyright.py (right): https://codereview.adblockplus.org/29459580/diff/29482675/update-copyright/up... update-copyright/update_copyright.py:117: len(self.current_url) > 2): On 2017/07/17 10:19:53, Sebastian Noack wrote: > On 2017/07/07 16:58:29, Sebastian Noack wrote: > > On 2017/07/07 16:05:27, Sebastian Noack wrote: > > > The indentation is off by one character here. Example from PEP-8 > > > (https://www.python.org/dev/peps/pep-0008/#indentation) for hanging > > indentation: > > > > > > if (this_is_one_thing and > > > that_is_another_thing): > > > # Since both conditions are true, we can frobnicate. > > > do_something() > > > > > > Alternatively, you could revert to use a "deprecated" variable again, in > order > > > to avoid wrapping here. > > > > It seems you missed this comment. > > It seems this comment still hasn't been addressed. I was getting a flake8 error when using the proper indentation. Instead, I added some extra indentation on the conditional continuation line, as recommended here: https://github.com/PyCQA/pycodestyle/issues/386 Hope that's ok? https://codereview.adblockplus.org/29459580/diff/29482675/update-copyright/up... update-copyright/update_copyright.py:117: len(self.current_url) > 2): On 2017/07/07 16:05:27, Sebastian Noack wrote: > The indentation is off by one character here. Example from PEP-8 > (https://www.python.org/dev/peps/pep-0008/#indentation) for hanging indentation: > > if (this_is_one_thing and > that_is_another_thing): > # Since both conditions are true, we can frobnicate. > do_something() > > Alternatively, you could revert to use a "deprecated" variable again, in order > to avoid wrapping here. Done. https://codereview.adblockplus.org/29459580/diff/29482684/update-copyright/up... File update-copyright/update_copyright.py (right): https://codereview.adblockplus.org/29459580/diff/29482684/update-copyright/up... update-copyright/update_copyright.py:149: 'ssh://hg@hg.adblockplus.org/'): On 2017/07/17 10:19:54, Sebastian Noack wrote: > On 2017/07/07 16:58:31, Sebastian Noack wrote: > > Perhaps it would be worth to handle the defaults manually, in order to avoid > > duplication. It also seems to make the logic easier to follow (for me at > least). > > > > arg_parser.add_argument('-u', '--hg-url', > > help='specify which Mercurial URL site to scrape') > > arg_parser.add_argument('-p', '--push-url', > > help='specify where to push the repository') > > args = arg_parser.parse_args() > > if args.hg_url and not args.push_url: > > arg_parser.error('If -u is provided, -p is mandatory') > > main(args.hg_url or 'https://hg.adblockplus.org/', > > args.push_url or 'ssh://hg@hg.adblockplus.org/') > > > > What do you think? > > What is about this comment? Yeah, makes sense. I like this way better. https://codereview.adblockplus.org/29459580/diff/29482684/update-copyright/up... update-copyright/update_copyright.py:149: 'ssh://hg@hg.adblockplus.org/'): On 2017/07/07 16:58:31, Sebastian Noack wrote: > Perhaps it would be worth to handle the defaults manually, in order to avoid > duplication. It also seems to make the logic easier to follow (for me at least). > > arg_parser.add_argument('-u', '--hg-url', > help='specify which Mercurial URL site to scrape') > arg_parser.add_argument('-p', '--push-url', > help='specify where to push the repository') > args = arg_parser.parse_args() > if args.hg_url and not args.push_url: > arg_parser.error('If -u is provided, -p is mandatory') > main(args.hg_url or 'https://hg.adblockplus.org/', > args.push_url or 'ssh://hg@hg.adblockplus.org/') > > What do you think? Done.
LGTM!
LGTM again for PS9
LGTM again for PS9 |