|
|
Created:
Jan. 10, 2017, 6:08 p.m. by Vasily Kuznetsov Modified:
Jan. 23, 2017, 9:28 a.m. Visibility:
Public. |
DescriptionFixes 4784 - Improve error reporting in updateMalwareDomainsList and add tests
Repository: https://hg.adblockplus.org/sitescripts
Base revision: e4f054a8b28b
Patch Set 1 #
Total comments: 12
Patch Set 2 : Close streams more carefully #
Total comments: 4
Patch Set 3 : Improve the error handling flow in try_mirror #
MessagesTotal messages: 17
https://codereview.adblockplus.org/29370984/diff/29370985/sitescripts/subscri... File sitescripts/subscriptions/bin/updateMalwareDomainsList.py (left): https://codereview.adblockplus.org/29370984/diff/29370985/sitescripts/subscri... sitescripts/subscriptions/bin/updateMalwareDomainsList.py:69: zip = zipfile.ZipFile(StringIO(data), 'r') These three lines are an unrelated change. Flake8 started complaining about A302 here (not sure how it passed before) so I fixed it.
bump
On 2017/01/16 18:38:19, Vasily Kuznetsov wrote: > bump As discussed on IRC, I leave the first review round up to Jon, and will do a final review myself. That is how we have onboarded other reviewers before. Also please don't post comments like "bump", but ping the reviewers on IRC instead, if you feel the review isn't moving forward quick enough.
Looks pretty ok, mostly see the comment about using the exception attribute reason as a part of the error message. Also the test seems to just be ensuring the domains are read correctly since the urllib is mocked? https://codereview.adblockplus.org/29370984/diff/29370985/sitescripts/subscri... File sitescripts/subscriptions/bin/updateMalwareDomainsList.py (right): https://codereview.adblockplus.org/29370984/diff/29370985/sitescripts/subscri... sitescripts/subscriptions/bin/updateMalwareDomainsList.py:44: error_message = 'Failed to fetch {}: {}'.format(url, exc) Perhaps it is more useful to you the reason attribute here rather than the string representation of the exception as per the urllib2 docs for URLErrors
Thanks for the comment, Jon. After looking at it in more detail I think that I would like to keep it as is (see below). Let me know if you think it can still be improved. https://codereview.adblockplus.org/29370984/diff/29370985/sitescripts/subscri... File sitescripts/subscriptions/bin/updateMalwareDomainsList.py (right): https://codereview.adblockplus.org/29370984/diff/29370985/sitescripts/subscri... sitescripts/subscriptions/bin/updateMalwareDomainsList.py:44: error_message = 'Failed to fetch {}: {}'.format(url, exc) On 2017/01/18 17:04:48, Jon Sonesen wrote: > Perhaps it is more useful to you the reason attribute here rather than the > string representation of the exception as per the urllib2 docs for URLErrors We might also catch subclasses of URLError here. Basically it means HTTPError, which also has error code in .code attribute. I would like to include the error code into error_message and just including exc.reason would not do it. The __str__ methods of URLError and HTTPError do include all relevant information, so with my approach I don't need to worry about what is the exact exception class. The disadvantage here is that those two __str__ methods are not consistent in terms of how the output is formatted, but I think we can live with it. So all in all I'd prefer to keep this line the way it is if that's ok.
On 2017/01/18 18:46:35, Vasily Kuznetsov wrote: > Thanks for the comment, Jon. After looking at it in more detail I think that I > would like to keep it as is (see below). Let me know if you think it can still > be improved. > > https://codereview.adblockplus.org/29370984/diff/29370985/sitescripts/subscri... > File sitescripts/subscriptions/bin/updateMalwareDomainsList.py (right): > > https://codereview.adblockplus.org/29370984/diff/29370985/sitescripts/subscri... > sitescripts/subscriptions/bin/updateMalwareDomainsList.py:44: error_message = > 'Failed to fetch {}: {}'.format(url, exc) > On 2017/01/18 17:04:48, Jon Sonesen wrote: > > Perhaps it is more useful to you the reason attribute here rather than the > > string representation of the exception as per the urllib2 docs for URLErrors > > We might also catch subclasses of URLError here. Basically it means HTTPError, > which also has error code in .code attribute. I would like to include the error > code into error_message and just including exc.reason would not do it. The > __str__ methods of URLError and HTTPError do include all relevant information, > so with my approach I don't need to worry about what is the exact exception > class. The disadvantage here is that those two __str__ methods are not > consistent in terms of how the output is formatted, but I think we can live with > it. So all in all I'd prefer to keep this line the way it is if that's ok. Hey, thanks for the response. Also, I agree with you here let's just leave it as it is.
https://codereview.adblockplus.org/29370984/diff/29370985/sitescripts/subscri... File sitescripts/subscriptions/bin/updateMalwareDomainsList.py (right): https://codereview.adblockplus.org/29370984/diff/29370985/sitescripts/subscri... sitescripts/subscriptions/bin/updateMalwareDomainsList.py:40: url = mirror + MALWAREDOMAINS_PATH This line doesn't seem to belong in to the try-block (there is no way that it can potentially throw a URLError. https://codereview.adblockplus.org/29370984/diff/29370985/sitescripts/subscri... sitescripts/subscriptions/bin/updateMalwareDomainsList.py:41: response = urllib2.urlopen(url) I think we should explicitly close the response. https://codereview.adblockplus.org/29370984/diff/29370985/sitescripts/subscri... sitescripts/subscriptions/bin/updateMalwareDomainsList.py:43: except urllib2.URLError as exc: Note that HTTPError (a subclass of URLError) is also a file-like object that should be closed.
https://codereview.adblockplus.org/29370984/diff/29370985/sitescripts/subscri... File sitescripts/subscriptions/bin/updateMalwareDomainsList.py (right): https://codereview.adblockplus.org/29370984/diff/29370985/sitescripts/subscri... sitescripts/subscriptions/bin/updateMalwareDomainsList.py:41: response = urllib2.urlopen(url) On 2017/01/19 09:01:29, Sebastian Noack wrote: > I think we should explicitly close the response. Could this be resolved by using a 'with' statement to open it. This way it closes even in the event of an exception?
On 2017/01/19 09:31:24, Jon Sonesen wrote: > https://codereview.adblockplus.org/29370984/diff/29370985/sitescripts/subscri... > File sitescripts/subscriptions/bin/updateMalwareDomainsList.py (right): > > https://codereview.adblockplus.org/29370984/diff/29370985/sitescripts/subscri... > sitescripts/subscriptions/bin/updateMalwareDomainsList.py:41: response = > urllib2.urlopen(url) > On 2017/01/19 09:01:29, Sebastian Noack wrote: > > I think we should explicitly close the response. > > Could this be resolved by using a 'with' statement to open it. This way it > closes even in the event of an exception? There is also the fact that when the reference count reaches zero for the instance, python's garbage collection will close the connection since __del__ is 'self.close()' but I agree that it is best to explicitly close the resource.
https://codereview.adblockplus.org/29370984/diff/29370985/sitescripts/subscri... File sitescripts/subscriptions/bin/updateMalwareDomainsList.py (right): https://codereview.adblockplus.org/29370984/diff/29370985/sitescripts/subscri... sitescripts/subscriptions/bin/updateMalwareDomainsList.py:41: response = urllib2.urlopen(url) On 2017/01/19 09:31:24, Jon Sonesen wrote: > On 2017/01/19 09:01:29, Sebastian Noack wrote: > > I think we should explicitly close the response. > > Could this be resolved by using a 'with' statement to open it. This way it > closes even in the event of an exception? Yes, this should be implemented either using try-finally or using a with-statement, so that it gets closed no matter what. However, in Python 2, the response returned by urlopen() doesn't implement __enter__/__exit__. So in order to use the with-statement here, the response would have to be wrapped with contextlib.closing(). Moreover, if an HTTPError is raised (see below), that exception is a file-like object itself, which needs to be closed as well. On 2017/01/19 09:33:53, Jon Sonesen wrote: > There is also the fact that when the reference count reaches zero for the > instance, python's garbage collection will close the connection since __del__ is > 'self.close()' but I agree that it is best to explicitly close the resource. Relying on that behavior is bad practice. First of all this is an implementation detail of CPython, and as such it should not be relied on. If possible we should target the Python language, not a specific implementation. More importantly, __del__ methods are resolved by the garbage collector in CPython, and therefore closing the response would be indefinitely delayed, so that most likely the response will still be open while we already send subsequent requests.
https://codereview.adblockplus.org/29370984/diff/29370985/sitescripts/subscri... File sitescripts/subscriptions/bin/updateMalwareDomainsList.py (right): https://codereview.adblockplus.org/29370984/diff/29370985/sitescripts/subscri... sitescripts/subscriptions/bin/updateMalwareDomainsList.py:41: response = urllib2.urlopen(url) On 2017/01/19 09:48:18, Sebastian Noack wrote: > On 2017/01/19 09:31:24, Jon Sonesen wrote: > > On 2017/01/19 09:01:29, Sebastian Noack wrote: > > > I think we should explicitly close the response. > > > > Could this be resolved by using a 'with' statement to open it. This way it > > closes even in the event of an exception? > > Yes, this should be implemented either using try-finally or using a > with-statement, so that it gets closed no matter what. However, in Python 2, the > response returned by urlopen() doesn't implement __enter__/__exit__. So in order > to use the with-statement here, the response would have to be wrapped with > contextlib.closing(). Moreover, if an HTTPError is raised (see below), that > exception is a file-like object itself, which needs to be closed as well. > > On 2017/01/19 09:33:53, Jon Sonesen wrote: > > There is also the fact that when the reference count reaches zero for the > > instance, python's garbage collection will close the connection since __del__ > is > > 'self.close()' but I agree that it is best to explicitly close the resource. > > Relying on that behavior is bad practice. First of all this is an implementation > detail of CPython, and as such it should not be relied on. If possible we should > target the Python language, not a specific implementation. More importantly, > __del__ methods are resolved by the garbage collector in CPython, and therefore > closing the response would be indefinitely delayed, so that most likely the > response will still be open while we already send subsequent requests. Thanks for elaborating on those points :)
Thanks for all the comments. Now all the streams are carefully closed. https://codereview.adblockplus.org/29370984/diff/29370985/sitescripts/subscri... File sitescripts/subscriptions/bin/updateMalwareDomainsList.py (right): https://codereview.adblockplus.org/29370984/diff/29370985/sitescripts/subscri... sitescripts/subscriptions/bin/updateMalwareDomainsList.py:40: url = mirror + MALWAREDOMAINS_PATH On 2017/01/19 09:01:29, Sebastian Noack wrote: > This line doesn't seem to belong in to the try-block (there is no way that it > can potentially throw a URLError. Done. https://codereview.adblockplus.org/29370984/diff/29370985/sitescripts/subscri... sitescripts/subscriptions/bin/updateMalwareDomainsList.py:41: response = urllib2.urlopen(url) On 2017/01/19 09:01:29, Sebastian Noack wrote: > I think we should explicitly close the response. Done. https://codereview.adblockplus.org/29370984/diff/29370985/sitescripts/subscri... sitescripts/subscriptions/bin/updateMalwareDomainsList.py:43: except urllib2.URLError as exc: On 2017/01/19 09:01:29, Sebastian Noack wrote: > Note that HTTPError (a subclass of URLError) is also a file-like object that > should be closed. Done. https://codereview.adblockplus.org/29370984/diff/29372682/sitescripts/subscri... File sitescripts/subscriptions/bin/updateMalwareDomainsList.py (right): https://codereview.adblockplus.org/29370984/diff/29372682/sitescripts/subscri... sitescripts/subscriptions/bin/updateMalwareDomainsList.py:46: if hasattr(exc, 'close'): I tried to do it with two separate except clauses but that creates too much repetition. How about a duck typish check instead! https://codereview.adblockplus.org/29370984/diff/29372682/sitescripts/subscri... File sitescripts/subscriptions/test/test_updateMalwareDomainsList.py (right): https://codereview.adblockplus.org/29370984/diff/29372682/sitescripts/subscri... sitescripts/subscriptions/test/test_updateMalwareDomainsList.py:64: raise urllib2.HTTPError('Bad', '42', 'No good', [], None) Jon inspired me to check proper reporting of both HTTPError and URLError more thoughroughly.
https://codereview.adblockplus.org/29370984/diff/29372682/sitescripts/subscri... File sitescripts/subscriptions/bin/updateMalwareDomainsList.py (right): https://codereview.adblockplus.org/29370984/diff/29372682/sitescripts/subscri... sitescripts/subscriptions/bin/updateMalwareDomainsList.py:46: if hasattr(exc, 'close'): On 2017/01/19 16:34:08, Vasily Kuznetsov wrote: > I tried to do it with two separate except clauses but that creates too much > repetition. How about a duck typish check instead! I don't see how that creates any duplication. try: ... except urllib2.HTTPError as e: e.close() except urllib2.URLError as e: pass return 'Failed to fetch {}: {}'.format(url, e), None
https://codereview.adblockplus.org/29370984/diff/29372682/sitescripts/subscri... File sitescripts/subscriptions/bin/updateMalwareDomainsList.py (right): https://codereview.adblockplus.org/29370984/diff/29372682/sitescripts/subscri... sitescripts/subscriptions/bin/updateMalwareDomainsList.py:46: if hasattr(exc, 'close'): On 2017/01/19 17:13:33, Sebastian Noack wrote: > On 2017/01/19 16:34:08, Vasily Kuznetsov wrote: > > I tried to do it with two separate except clauses but that creates too much > > repetition. How about a duck typish check instead! > > I don't see how that creates any duplication. > > try: > ... > except urllib2.HTTPError as e: > e.close() > except urllib2.URLError as e: > pass > return 'Failed to fetch {}: {}'.format(url, e), None Yeah, this is better. Thanks.
LGTM
On 2017/01/20 14:09:48, Sebastian Noack wrote: > LGTM LGTM |