|
|
Created:
June 20, 2013, 12:06 p.m. by Thomas Greiner Modified:
June 25, 2013, 12:21 p.m. Visibility:
Public. |
DescriptionAdded script for copying external filterlists into existing repositories
Patch Set 1 #
Total comments: 26
Patch Set 2 : #
Total comments: 3
MessagesTotal messages: 6
Not sure whether it's worth it or not as of right now but if there are many external filterlists it would be better to only create one temporary directory per repository instead of one per commit. Also didn't have any actual examples at hand for .sitescripts.example which is why I used the fictional "easyexample" filterlist.
Changed reviewer for this issue.
http://codereview.adblockplus.org/11003016/diff/1/.sitescripts.example File .sitescripts.example (right): http://codereview.adblockplus.org/11003016/diff/1/.sitescripts.example#newcod... .sitescripts.example:126: easyexample_target=easylist:easyexample I would rather keep it more general - this is syncing external files to some repository, not really related to filter lists as such. In other word: [externalFiles] example_source=http://mxr.mozilla.org/mozilla-central/source/netwerk/dns/effective_tld_names.dat?raw=1 example_targetrepository=%(root)s/hg/adblockpluschrome example_targetfile=third-party/publicSuffixList.txt http://codereview.adblockplus.org/11003016/diff/1/sitescripts/extensions/bin/... File sitescripts/extensions/bin/updateExternalFilterlists.py (right): http://codereview.adblockplus.org/11003016/diff/1/sitescripts/extensions/bin/... sitescripts/extensions/bin/updateExternalFilterlists.py:1: # coding: utf-8 Probably better to have that script under management/bin http://codereview.adblockplus.org/11003016/diff/1/sitescripts/extensions/bin/... sitescripts/extensions/bin/updateExternalFilterlists.py:42: tempdir = mkdtemp(prefix='adblockplus') A more generic specific prefix might be better here, maybe 'external'. Sometimes these temp directories will stay around and it is good to know which script created them. http://codereview.adblockplus.org/11003016/diff/1/sitescripts/extensions/bin/... sitescripts/extensions/bin/updateExternalFilterlists.py:44: try: This try statement should immediately follow mkdtemp - if resolveRepositoryPath throws then the temp directory won't be removed. http://codereview.adblockplus.org/11003016/diff/1/sitescripts/extensions/bin/... sitescripts/extensions/bin/updateExternalFilterlists.py:52: filename = name + '.txt' The settings should really specify the full file name - the setting name is merely a key to keep the related settings together. http://codereview.adblockplus.org/11003016/diff/1/sitescripts/extensions/bin/... sitescripts/extensions/bin/updateExternalFilterlists.py:57: for line in str(data).splitlines(): Why split lines and write/decode each line separately? Please use file.write() to write out all the data at once. http://codereview.adblockplus.org/11003016/diff/1/sitescripts/extensions/bin/... sitescripts/extensions/bin/updateExternalFilterlists.py:60: print >>file, line.strip().decode('iso-8859-1') The source file encoding should always be UTF-8, definitely not ISO-8859-1. http://codereview.adblockplus.org/11003016/diff/1/sitescripts/extensions/bin/... sitescripts/extensions/bin/updateExternalFilterlists.py:61: file.close(); No semicolon please. http://codereview.adblockplus.org/11003016/diff/1/sitescripts/extensions/bin/... sitescripts/extensions/bin/updateExternalFilterlists.py:65: subprocess.Popen(['hg', 'add', '-R', tempdir], stdout=subprocess.PIPE).communicate() No need to add the file explicitly, use hg commit -A http://codereview.adblockplus.org/11003016/diff/1/sitescripts/extensions/bin/... sitescripts/extensions/bin/updateExternalFilterlists.py:68: subprocess.Popen(['hg', 'push', '-R', tempdir], stdout=subprocess.PIPE).communicate() Please use -q command line parameter for all Mercurial calls - we will run this from cron, there should be no output unless something is wrong. http://codereview.adblockplus.org/11003016/diff/1/sitescripts/extensions/bin/... sitescripts/extensions/bin/updateExternalFilterlists.py:74: for option in get_config().options('externalFilterlists'): Better: for option, value in get_config().items('externalFiles'): Then you won't need to get the value separately later. http://codereview.adblockplus.org/11003016/diff/1/sitescripts/extensions/bin/... sitescripts/extensions/bin/updateExternalFilterlists.py:86: if isinstance(result[name][setting], list): What is that check for? I think we want exactly one value for each setting. And this will never be true from what I can tell. http://codereview.adblockplus.org/11003016/diff/1/sitescripts/extensions/bin/... sitescripts/extensions/bin/updateExternalFilterlists.py:98: repositories[re.sub(r'_repository$', '', option)] = value This script shouldn't really combine settings from two different sections - see my proposal in .sitescripts.example.
Applied all suggested changes. Please note that the Python file in this review is still under extensions/bin and it is still called updateExternalFilterlists.py to make it easier to review. It will be moved and renamed to management/bin/updateExternalFiles.py before the commit. http://codereview.adblockplus.org/11003016/diff/1/.sitescripts.example File .sitescripts.example (right): http://codereview.adblockplus.org/11003016/diff/1/.sitescripts.example#newcod... .sitescripts.example:126: easyexample_target=easylist:easyexample On 2013/06/24 12:35:10, Wladimir Palant wrote: > I would rather keep it more general - this is syncing external files to some > repository, not really related to filter lists as such. In other word: > > [externalFiles] > example_source=http://mxr.mozilla.org/mozilla-central/source/netwerk/dns/effective_tld_names.dat?raw=1 > example_targetrepository=%(root)s/hg/adblockpluschrome > example_targetfile=third-party/publicSuffixList.txt Done. http://codereview.adblockplus.org/11003016/diff/1/sitescripts/extensions/bin/... File sitescripts/extensions/bin/updateExternalFilterlists.py (right): http://codereview.adblockplus.org/11003016/diff/1/sitescripts/extensions/bin/... sitescripts/extensions/bin/updateExternalFilterlists.py:1: # coding: utf-8 On 2013/06/24 12:35:10, Wladimir Palant wrote: > Probably better to have that script under management/bin Done. http://codereview.adblockplus.org/11003016/diff/1/sitescripts/extensions/bin/... sitescripts/extensions/bin/updateExternalFilterlists.py:42: tempdir = mkdtemp(prefix='adblockplus') On 2013/06/24 12:35:10, Wladimir Palant wrote: > A more generic specific prefix might be better here, maybe 'external'. Sometimes > these temp directories will stay around and it is good to know which script > created them. Done. http://codereview.adblockplus.org/11003016/diff/1/sitescripts/extensions/bin/... sitescripts/extensions/bin/updateExternalFilterlists.py:44: try: On 2013/06/24 12:35:10, Wladimir Palant wrote: > This try statement should immediately follow mkdtemp - if resolveRepositoryPath > throws then the temp directory won't be removed. Done. http://codereview.adblockplus.org/11003016/diff/1/sitescripts/extensions/bin/... sitescripts/extensions/bin/updateExternalFilterlists.py:52: filename = name + '.txt' On 2013/06/24 12:35:10, Wladimir Palant wrote: > The settings should really specify the full file name - the setting name is > merely a key to keep the related settings together. Done. http://codereview.adblockplus.org/11003016/diff/1/sitescripts/extensions/bin/... sitescripts/extensions/bin/updateExternalFilterlists.py:57: for line in str(data).splitlines(): On 2013/06/24 12:35:10, Wladimir Palant wrote: > Why split lines and write/decode each line separately? Please use file.write() > to write out all the data at once. Done. http://codereview.adblockplus.org/11003016/diff/1/sitescripts/extensions/bin/... sitescripts/extensions/bin/updateExternalFilterlists.py:60: print >>file, line.strip().decode('iso-8859-1') On 2013/06/24 12:35:10, Wladimir Palant wrote: > The source file encoding should always be UTF-8, definitely not ISO-8859-1. Done. http://codereview.adblockplus.org/11003016/diff/1/sitescripts/extensions/bin/... sitescripts/extensions/bin/updateExternalFilterlists.py:61: file.close(); On 2013/06/24 12:35:10, Wladimir Palant wrote: > No semicolon please. Done. :) http://codereview.adblockplus.org/11003016/diff/1/sitescripts/extensions/bin/... sitescripts/extensions/bin/updateExternalFilterlists.py:65: subprocess.Popen(['hg', 'add', '-R', tempdir], stdout=subprocess.PIPE).communicate() On 2013/06/24 12:35:10, Wladimir Palant wrote: > No need to add the file explicitly, use hg commit -A Done. http://codereview.adblockplus.org/11003016/diff/1/sitescripts/extensions/bin/... sitescripts/extensions/bin/updateExternalFilterlists.py:68: subprocess.Popen(['hg', 'push', '-R', tempdir], stdout=subprocess.PIPE).communicate() On 2013/06/24 12:35:10, Wladimir Palant wrote: > Please use -q command line parameter for all Mercurial calls - we will run this > from cron, there should be no output unless something is wrong. Done. http://codereview.adblockplus.org/11003016/diff/1/sitescripts/extensions/bin/... sitescripts/extensions/bin/updateExternalFilterlists.py:74: for option in get_config().options('externalFilterlists'): On 2013/06/24 12:35:10, Wladimir Palant wrote: > Better: > > for option, value in get_config().items('externalFiles'): > > Then you won't need to get the value separately later. Done. http://codereview.adblockplus.org/11003016/diff/1/sitescripts/extensions/bin/... sitescripts/extensions/bin/updateExternalFilterlists.py:86: if isinstance(result[name][setting], list): On 2013/06/24 12:35:10, Wladimir Palant wrote: > What is that check for? I think we want exactly one value for each setting. And > this will never be true from what I can tell. Done. http://codereview.adblockplus.org/11003016/diff/1/sitescripts/extensions/bin/... sitescripts/extensions/bin/updateExternalFilterlists.py:98: repositories[re.sub(r'_repository$', '', option)] = value On 2013/06/24 12:35:10, Wladimir Palant wrote: > This script shouldn't really combine settings from two different sections - see > my proposal in .sitescripts.example. Done.
LGTM, with a nit. http://codereview.adblockplus.org/11003016/diff/7002/sitescripts/extensions/b... File sitescripts/extensions/bin/updateExternalFilterlists.py (right): http://codereview.adblockplus.org/11003016/diff/7002/sitescripts/extensions/b... sitescripts/extensions/bin/updateExternalFilterlists.py:18: import os, subprocess, codecs, urllib, re Seems like the re module is not being used here.
LGTM with the issues below fixed (and the one noted by Felix). http://codereview.adblockplus.org/11003016/diff/7002/sitescripts/extensions/b... File sitescripts/extensions/bin/updateExternalFilterlists.py (right): http://codereview.adblockplus.org/11003016/diff/7002/sitescripts/extensions/b... sitescripts/extensions/bin/updateExternalFilterlists.py:25: for name, setting in settings.iteritems(): Nit: for setting in settings.itervalues() - the key isn't being used. http://codereview.adblockplus.org/11003016/diff/7002/sitescripts/extensions/b... sitescripts/extensions/bin/updateExternalFilterlists.py:42: data = urllib.urlopen(setting['source']).read() You need to add .decode('utf-8') at the end - as is this is being interpreted as ASCII data and will cause issues if the file has non-ASCII content. |