|
|
Created:
March 14, 2016, 1:23 p.m. by Vasily Kuznetsov Modified:
March 15, 2016, 11:14 p.m. Reviewers:
Sebastian Noack Visibility:
Public. |
DescriptionIssue 3774 - Support multiple mirrors for the Malware Domains List
Patch Set 1 #
Total comments: 12
Patch Set 2 : Addressed review comments #
Total comments: 6
Patch Set 3 : Addressed review comments 2 #
MessagesTotal messages: 7
So in the end there's a hardcoded list of mirrors in the script but it will also read it from the config if it's there. After we put the list into the config, we could remove it from the script.
https://codereview.adblockplus.org/29338216/diff/29338217/sitescripts/subscri... File sitescripts/subscriptions/bin/updateMalwareDomainsList.py (right): https://codereview.adblockplus.org/29338216/diff/29338217/sitescripts/subscri... sitescripts/subscriptions/bin/updateMalwareDomainsList.py:33: # malwaredomains_mirrors= Example configuration should be documented in .sitescripts.ini.example rather than inside the code. https://codereview.adblockplus.org/29338216/diff/29338217/sitescripts/subscri... sitescripts/subscriptions/bin/updateMalwareDomainsList.py:37: default_mirrors_list = [ I'd rather wait until sitescripts.ini got updated on the server before pushing this. In particular, if you plan on removing that hard-coded list anyway. https://codereview.adblockplus.org/29338216/diff/29338217/sitescripts/subscri... sitescripts/subscriptions/bin/updateMalwareDomainsList.py:43: malwaredomains_path = '/files/justdomains.zip' Nit: Please use upper case for constant-like variables. https://codereview.adblockplus.org/29338216/diff/29338217/sitescripts/subscri... sitescripts/subscriptions/bin/updateMalwareDomainsList.py:51: print >>sys.stderr, '{}: {}'.format(mirror, err) Any output we generate will result into an email sent to the admins, when this scripts runs by cron on the server. And individual mirrors occasionally failing is expected, hence nothing somebody on our end needs to look into. So we should only generate output if all mirrors in the list fail.
Will fix it today. https://codereview.adblockplus.org/29338216/diff/29338217/sitescripts/subscri... File sitescripts/subscriptions/bin/updateMalwareDomainsList.py (right): https://codereview.adblockplus.org/29338216/diff/29338217/sitescripts/subscri... sitescripts/subscriptions/bin/updateMalwareDomainsList.py:33: # malwaredomains_mirrors= On 2016/03/14 13:36:07, Sebastian Noack wrote: > Example configuration should be documented in .sitescripts.ini.example rather > than inside the code. Acknowledged. https://codereview.adblockplus.org/29338216/diff/29338217/sitescripts/subscri... sitescripts/subscriptions/bin/updateMalwareDomainsList.py:37: default_mirrors_list = [ On 2016/03/14 13:36:07, Sebastian Noack wrote: > I'd rather wait until sitescripts.ini got updated on the server before pushing > this. In particular, if you plan on removing that hard-coded list anyway. I'm cool with this, but I'm not very sure how to update the config, that's why I was thinking that a two-step process would allow us to get it running quicker. https://codereview.adblockplus.org/29338216/diff/29338217/sitescripts/subscri... sitescripts/subscriptions/bin/updateMalwareDomainsList.py:43: malwaredomains_path = '/files/justdomains.zip' On 2016/03/14 13:36:07, Sebastian Noack wrote: > Nit: Please use upper case for constant-like variables. Acknowledged. https://codereview.adblockplus.org/29338216/diff/29338217/sitescripts/subscri... sitescripts/subscriptions/bin/updateMalwareDomainsList.py:51: print >>sys.stderr, '{}: {}'.format(mirror, err) On 2016/03/14 13:36:07, Sebastian Noack wrote: > Any output we generate will result into an email sent to the admins, when this > scripts runs by cron on the server. And individual mirrors occasionally failing > is expected, hence nothing somebody on our end needs to look into. So we should > only generate output if all mirrors in the list fail. Acknowledged.
I addressed your comments. I uploaded the patch vs the original submission, hopefully it's the right way to go. https://codereview.adblockplus.org/29338216/diff/29338217/sitescripts/subscri... File sitescripts/subscriptions/bin/updateMalwareDomainsList.py (right): https://codereview.adblockplus.org/29338216/diff/29338217/sitescripts/subscri... sitescripts/subscriptions/bin/updateMalwareDomainsList.py:33: # malwaredomains_mirrors= On 2016/03/14 13:36:07, Sebastian Noack wrote: > Example configuration should be documented in .sitescripts.ini.example rather > than inside the code. Done. https://codereview.adblockplus.org/29338216/diff/29338217/sitescripts/subscri... sitescripts/subscriptions/bin/updateMalwareDomainsList.py:43: malwaredomains_path = '/files/justdomains.zip' On 2016/03/14 13:36:07, Sebastian Noack wrote: > Nit: Please use upper case for constant-like variables. Done. https://codereview.adblockplus.org/29338216/diff/29338217/sitescripts/subscri... sitescripts/subscriptions/bin/updateMalwareDomainsList.py:51: print >>sys.stderr, '{}: {}'.format(mirror, err) On 2016/03/14 13:36:07, Sebastian Noack wrote: > Any output we generate will result into an email sent to the admins, when this > scripts runs by cron on the server. And individual mirrors occasionally failing > is expected, hence nothing somebody on our end needs to look into. So we should > only generate output if all mirrors in the list fail. Done.
https://codereview.adblockplus.org/29338216/diff/29338217/sitescripts/subscri... File sitescripts/subscriptions/bin/updateMalwareDomainsList.py (right): https://codereview.adblockplus.org/29338216/diff/29338217/sitescripts/subscri... sitescripts/subscriptions/bin/updateMalwareDomainsList.py:37: default_mirrors_list = [ On 2016/03/14 14:18:11, Vasily Kuznetsov wrote: > On 2016/03/14 13:36:07, Sebastian Noack wrote: > > I'd rather wait until sitescripts.ini got updated on the server before pushing > > this. In particular, if you plan on removing that hard-coded list anyway. > > I'm cool with this, but I'm not very sure how to update the config, that's why I > was thinking that a two-step process would allow us to get it running quicker. Matze, Felix or Wladimir can change the config manually on the server. So that shouldn't be a big deal. Feel free to go already asking them to already do so. Filing a follow-up issue to get rid of the hard-coded defaults on the other hand seems to be more hassle. https://codereview.adblockplus.org/29338216/diff/29338246/sitescripts/subscri... File sitescripts/subscriptions/bin/updateMalwareDomainsList.py (left): https://codereview.adblockplus.org/29338216/diff/29338246/sitescripts/subscri... sitescripts/subscriptions/bin/updateMalwareDomainsList.py:60: mirrors_list = filter(None, [mirror.strip() for mirror in mirrors.split()]) strip() and filter() is redundant. If you use split() without arguments this is equivalent to re.split(r"\s+", s.strip()) https://codereview.adblockplus.org/29338216/diff/29338246/sitescripts/subscri... sitescripts/subscriptions/bin/updateMalwareDomainsList.py:80: sys.exit(1) You can just pass the error message to sys.exit() and Python will terminate with exit code 1, printing that message to stderr. https://codereview.adblockplus.org/29338216/diff/29338246/sitescripts/subscri... File sitescripts/subscriptions/bin/updateMalwareDomainsList.py (right): https://codereview.adblockplus.org/29338216/diff/29338246/sitescripts/subscri... sitescripts/subscriptions/bin/updateMalwareDomainsList.py:68: data = try_mirror(mirror, MALWAREDOMAINS_PATH) Since MALWAREDOMAINS_PATH is a global anyway, there is IMO no reason to pass it around.
https://codereview.adblockplus.org/29338216/diff/29338246/sitescripts/subscri... File sitescripts/subscriptions/bin/updateMalwareDomainsList.py (left): https://codereview.adblockplus.org/29338216/diff/29338246/sitescripts/subscri... sitescripts/subscriptions/bin/updateMalwareDomainsList.py:60: mirrors_list = filter(None, [mirror.strip() for mirror in mirrors.split()]) On 2016/03/14 15:40:12, Sebastian Noack wrote: > strip() and filter() is redundant. If you use split() without arguments this is > equivalent to re.split(r"\s+", s.strip()) Yeah, strip is definitely redundant. And I was thinking that something like ' foo bar '.split() would produce ['', 'foo', 'bar', ''] but you're right that it doesn't. Fixed. https://codereview.adblockplus.org/29338216/diff/29338246/sitescripts/subscri... sitescripts/subscriptions/bin/updateMalwareDomainsList.py:80: sys.exit(1) On 2016/03/14 15:40:12, Sebastian Noack wrote: > You can just pass the error message to sys.exit() and Python will terminate with > exit code 1, printing that message to stderr. Oh, I didn't know this. Done. https://codereview.adblockplus.org/29338216/diff/29338246/sitescripts/subscri... File sitescripts/subscriptions/bin/updateMalwareDomainsList.py (right): https://codereview.adblockplus.org/29338216/diff/29338246/sitescripts/subscri... sitescripts/subscriptions/bin/updateMalwareDomainsList.py:68: data = try_mirror(mirror, MALWAREDOMAINS_PATH) On 2016/03/14 15:40:12, Sebastian Noack wrote: > Since MALWAREDOMAINS_PATH is a global anyway, there is IMO no reason to pass it > around. Yeah, I was thinking about it. I guess try_mirror is never intended to be used anywhere other than this module, so it makes sense to just remove its second argument.
LGTM. But please make sure that DEFAULT_MIRRORS_LIST will be removed once it is redundant with the config. |