|
|
DescriptionIssue 4767 - Improve error reporting in update_update_manifests
Patch Set 1 #Patch Set 2 : Fix mistake while uploading the patchset #
Total comments: 3
Patch Set 3 : For comments 3 to 5 by jon #
Total comments: 2
Patch Set 4 : For comments 7 and 8 by vasily #Patch Set 5 : For comments 9 and 10 #
Total comments: 2
Patch Set 6 : For comment 13 #
Total comments: 1
Patch Set 7 : For comment 16 #
Total comments: 4
MessagesTotal messages: 27
On 2017/01/08 19:26:04, f.lopez wrote: Hi Paco, Thank you for working on this! Looks like the code works as would be desired however running the automated tests fails due to a style issue addressed on my per line comment. In the future it would be helpful that you install "tox" which can be done with "pip install tox" Prior to uploading code reviews it is helpful to run "tox" from anywhere within the repository directory to run the style and unit tests. Additionally while developing we have a flake8 extension which will allow you to comply with style guidelines more easily. It is located in the coding tools repository and I believe it has installation instructions there as well.
https://codereview.adblockplus.org/29370859/diff/29370862/sitescripts/extensi... File sitescripts/extensions/utils.py (right): https://codereview.adblockplus.org/29370859/diff/29370862/sitescripts/extensi... sitescripts/extensions/utils.py:326: print "Error found while parsing xml from %s link" % repo.repositoryName Here you should be using the .format() method for string formatting as per our style guidelines. Guideline text: "Use the + operator when concatenating exactly two strings, use the format() method for more complex string formatting, use the join() method when concatenating pre-existing sequences."
Also could you please add Vasily as a reviewer? or at least CC him? https://codereview.adblockplus.org/29370859/diff/29370862/sitescripts/extensi... File sitescripts/extensions/utils.py (right): https://codereview.adblockplus.org/29370859/diff/29370862/sitescripts/extensi... sitescripts/extensions/utils.py:326: print "Error found while parsing xml from %s link" % repo.repositoryName Oops, I also just noticed that you are using double quotes. We are using single quotes so that string literals match the repr
https://codereview.adblockplus.org/29370859/diff/29370940/sitescripts/extensi... File sitescripts/extensions/utils.py (right): https://codereview.adblockplus.org/29370859/diff/29370940/sitescripts/extensi... sitescripts/extensions/utils.py:326: print 'Error found while parsing xml from {0} link'\ Breaking lines with a backslash (`\`) is generally not recommended. It's preferrable to do something like this: print ('bla bla {0}' .format(arg)) Also previous line prints to stderr and this print will print to stderr. Is this intended or should we print to stderr here as well?
https://codereview.adblockplus.org/29370859/diff/29370940/sitescripts/extensi... File sitescripts/extensions/utils.py (right): https://codereview.adblockplus.org/29370859/diff/29370940/sitescripts/extensi... sitescripts/extensions/utils.py:326: print 'Error found while parsing xml from {0} link'\ > Also previous line prints to stderr and this print will print to stderr. Is this > intended or should we print to stderr here as well? Sorry, I meant that this print statement will print to stdout (while we probably want stderr).
https://codereview.adblockplus.org/29370859/diff/29370862/sitescripts/extensi... File sitescripts/extensions/utils.py (right): https://codereview.adblockplus.org/29370859/diff/29370862/sitescripts/extensi... sitescripts/extensions/utils.py:326: print "Error found while parsing xml from %s link" % repo.repositoryName This adds some context (i.e. which repo) to the error, which is good. However, in the case of the stacktrace shown in the issue, it is a special case for _getMozillaDownloadLink - meaning that some external service is returning something we cannot parse: ExpatError: mismatched tag: line 6, column 2 I'm guessing that the mozilla service returns something different than we expect. I would like to know what this is, so that we know more about what action to take next time it occurs. For example, maybe they return an xml payload saying <response msg="This service is deprecated, please switch to using http://blabla.mozilla.org">. Perhaps it is even possible to test cover this by mocking the external call to deliver something unexpected like that (ref test_updateManifests.py). However, I think with these considerations we are going outside the scope of 'infrastructure' and deeper into sitescripts territory. @Vasily/Jon is this something you want to adopt and roll out in the near future?
On 2017/01/10 08:58:04, f.nicolaisen wrote: > https://codereview.adblockplus.org/29370859/diff/29370862/sitescripts/extensi... > File sitescripts/extensions/utils.py (right): > > https://codereview.adblockplus.org/29370859/diff/29370862/sitescripts/extensi... > sitescripts/extensions/utils.py:326: print "Error found while parsing xml from > %s link" % repo.repositoryName > This adds some context (i.e. which repo) to the error, which is good. > > However, in the case of the stacktrace shown in the issue, it is a special case > for _getMozillaDownloadLink - meaning that some external service is returning > something we cannot parse: > > ExpatError: mismatched tag: line 6, column 2 > > I'm guessing that the mozilla service returns something different than we > expect. I would like to know what this is, so that we know more about what > action to take next time it occurs. For example, maybe they return an xml > payload saying <response msg="This service is deprecated, please switch to using > http://blabla.mozilla.org%22>. > > Perhaps it is even possible to test cover this by mocking the external call to > deliver something unexpected like that (ref test_updateManifests.py). > > However, I think with these considerations we are going outside the scope of > 'infrastructure' and deeper into sitescripts territory. @Vasily/Jon is this > something you want to adopt and roll out in the near future? Generally if the code being touched is in a specific module it would be an issue for that specific module, but that is rather inconsequential here I think Paco can handle this if he still wants to. But as per our IRC discussion we should change the implementation to print the xml causing the parsing error. This can be achieved by catching the exception as an instance and then printing the arguments. I will update the ticket to reflect these changes as well. The implementation sort of looks like: except ExpatError as inst: print 'XML parse error for {}'.format(repo.repositoryName) print 'XML: \n\n{}'.format(inst.args) I say sort of looks like because both arguments may not be relevant or the xml may not even be the argument but an xml object is the object. I would have to read the documentation a bit further on that.
Hi Paco, Right direction for handling the parsing errors but there are a few issues with your implementation (see below). I think it would be better to move that code into a separate function similar to `_urlopen` (also see below :). Cheers, Vasly https://codereview.adblockplus.org/29370859/diff/29371579/sitescripts/extensi... File sitescripts/extensions/utils.py (right): https://codereview.adblockplus.org/29370859/diff/29371579/sitescripts/extensi... sitescripts/extensions/utils.py:266: document = dom.parse(_urlopen(url)) Actually we don't need to put the whole block into try/except to catch ExpatError. Expat errors are only created by `dom.parse`. So you can put only this one line into try/except. If you want to re-download and re-parse once (as we dicussed over IRC), it would probably make more sense to move this line into a separate function similar to `_urlopen`. https://codereview.adblockplus.org/29370859/diff/29371579/sitescripts/extensi... sitescripts/extensions/utils.py:276: page = _urlopen(url) This is not a good idea. You're downloading the page again just to put it into the exception text. First of all, it's wasteful, we've already downloaded it on line 266, no need to do it again. Second, it can lead to very misleading error messages if what you download on the second try turns out to be good XML. So as per comment in line 266, it would be better to move parsing and its error reporting into a separate function and there do something like: for i in range(attempts): with _urlopen(url) as page: content = page.read() try: return dom.parseString(content) except err: # We shouldn't raise this immediately since we'll be retrying. exception = Exception('error {} in parsing xml:\n{}\nfrom {}'.format(err, content, url)) raise exception
LGTM
Hi Paco, The error handling for parsing looks good. What about `_urlopen`? Maybe it would be good to also improve that (see below). Ferris, what do you think? Cheers, Vasily https://codereview.adblockplus.org/29370859/diff/29371811/sitescripts/extensi... File sitescripts/extensions/utils.py (right): https://codereview.adblockplus.org/29370859/diff/29371811/sitescripts/extensi... sitescripts/extensions/utils.py:254: error = e It would probably be good to improve the error reporting here as well. Otherwise `urllib.urlopen` throws exceptions like this: IOError: [Errno socket error] [Errno 8] nodename nor servname provided, or not known and that is also not superhelpful (e.g. it doesn't have the URL).
On 2017/01/13 18:12:11, Vasily Kuznetsov wrote: > Hi Paco, > > The error handling for parsing looks good. What about `_urlopen`? Maybe it would > be good to also improve that (see below). > > Ferris, what do you think? > I'm beyond my python foo at this point, so everything LGTM as long as I think the error msg looks right :D
On 2017/01/13 18:12:11, Vasily Kuznetsov wrote: > Hi Paco, > > The error handling for parsing looks good. What about `_urlopen`? Maybe it would > be good to also improve that (see below). > > Ferris, what do you think? > > Cheers, > Vasily > > https://codereview.adblockplus.org/29370859/diff/29371811/sitescripts/extensi... > File sitescripts/extensions/utils.py (right): > > https://codereview.adblockplus.org/29370859/diff/29371811/sitescripts/extensi... > sitescripts/extensions/utils.py:254: error = e > It would probably be good to improve the error reporting here as well. Otherwise > `urllib.urlopen` throws exceptions like this: > > IOError: [Errno socket error] [Errno 8] nodename nor servname provided, or not > known > > and that is also not superhelpful (e.g. it doesn't have the URL). Ok that makes sense, gonna add that change as well
On 2017/01/15 19:32:29, f.lopez wrote: LGTM
On 2017/01/15 21:09:42, Vasily Kuznetsov wrote: > On 2017/01/15 19:32:29, f.lopez wrote: > > LGTM LGTM
Message was sent while issue was closed.
https://codereview.adblockplus.org/29370859/diff/29371813/sitescripts/extensi... File sitescripts/extensions/utils.py (right): https://codereview.adblockplus.org/29370859/diff/29371813/sitescripts/extensi... sitescripts/extensions/utils.py:264: page.close() The close() should have gone into a finally block, or even better using the with statement: with contextlib.closing(_urlopen(url)) as page: content = page.read()
Message was sent while issue was closed.
On 2017/01/17 09:33:32, Sebastian Noack wrote: > https://codereview.adblockplus.org/29370859/diff/29371813/sitescripts/extensi... > File sitescripts/extensions/utils.py (right): > > https://codereview.adblockplus.org/29370859/diff/29371813/sitescripts/extensi... > sitescripts/extensions/utils.py:264: page.close() > The close() should have gone into a finally block, or even better using the with > statement: > > with contextlib.closing(_urlopen(url)) as page: > content = page.read() AFAIK, Paco tried it but the return value of urlopen doesn't support context manager protocol so it doesn't work.
Message was sent while issue was closed.
On 2017/01/17 10:02:49, Vasily Kuznetsov wrote: > On 2017/01/17 09:33:32, Sebastian Noack wrote: > > > https://codereview.adblockplus.org/29370859/diff/29371813/sitescripts/extensi... > > File sitescripts/extensions/utils.py (right): > > > > > https://codereview.adblockplus.org/29370859/diff/29371813/sitescripts/extensi... > > sitescripts/extensions/utils.py:264: page.close() > > The close() should have gone into a finally block, or even better using the > with > > statement: > > > > with contextlib.closing(_urlopen(url)) as page: > > content = page.read() > > AFAIK, Paco tried it but the return value of urlopen doesn't support context > manager protocol so it doesn't work. Ah, sorry, I didn't realize that you're proposing to use contextlib.closing. That would work, but I thought that it's too much overhead (you need to import contextlib, etc). Perhaps you're right though, as the result is more reliable and at least the code itself doesn't seem too "heavy".
Message was sent while issue was closed.
https://codereview.adblockplus.org/29370859/diff/29371813/sitescripts/extensi... File sitescripts/extensions/utils.py (right): https://codereview.adblockplus.org/29370859/diff/29371813/sitescripts/extensi... sitescripts/extensions/utils.py:264: page.close() On 2017/01/17 10:02:49, Vasily Kuznetsov wrote: > On 2017/01/17 09:33:32, Sebastian Noack wrote: > > > https://codereview.adblockplus.org/29370859/diff/29371813/sitescripts/extensi... > > File sitescripts/extensions/utils.py (right): > > > > > https://codereview.adblockplus.org/29370859/diff/29371813/sitescripts/extensi... > > sitescripts/extensions/utils.py:264: page.close() > > The close() should have gone into a finally block, or even better using the > with > > statement: > > > > with contextlib.closing(_urlopen(url)) as page: > > content = page.read() > > AFAIK, Paco tried it but the return value of urlopen doesn't support context > manager protocol so it doesn't work. Yes, in Python 2, the repsonse object returned by urlopen() doesn't implement __enter__/__exit__. So you either have to use the contextlib.closing() decorator before passing it to the with statement or use a traditionally try-finally block. I suggest doing the former, and removing the decorator once we migrate to Python 3. Also, please reply inline to inline comment.
Message was sent while issue was closed.
https://codereview.adblockplus.org/29370859/diff/29371813/sitescripts/extensi... File sitescripts/extensions/utils.py (right): https://codereview.adblockplus.org/29370859/diff/29371813/sitescripts/extensi... sitescripts/extensions/utils.py:264: page.close() On 2017/01/17 10:07:00, Vasily Kuznetsov wrote: > On 2017/01/17 10:02:49, Vasily Kuznetsov wrote: > > On 2017/01/17 09:33:32, Sebastian Noack wrote: > > > > > > https://codereview.adblockplus.org/29370859/diff/29371813/sitescripts/extensi... > > > File sitescripts/extensions/utils.py (right): > > > > > > > > > https://codereview.adblockplus.org/29370859/diff/29371813/sitescripts/extensi... > > > sitescripts/extensions/utils.py:264: page.close() > > > The close() should have gone into a finally block, or even better using the > > with > > > statement: > > > > > > with contextlib.closing(_urlopen(url)) as page: > > > content = page.read() > > > > AFAIK, Paco tried it but the return value of urlopen doesn't support context > > manager protocol so it doesn't work. > > Ah, sorry, I didn't realize that you're proposing to use contextlib.closing. > That would work, but I thought that it's too much overhead (you need to import > contextlib, etc). Perhaps you're right though, as the result is more reliable > and at least the code itself doesn't seem too "heavy". Well, alternatively you could use try-finally. But IMO, the with-statement is nicer. For reference, here is how we did it in jsHydra where we support both Python 2+3: https://hg.adblockplus.org/jshydra/file/tip/abp_rewrite.py#l11
Message was sent while issue was closed.
https://codereview.adblockplus.org/29370859/diff/29371813/sitescripts/extensi... File sitescripts/extensions/utils.py (right): https://codereview.adblockplus.org/29370859/diff/29371813/sitescripts/extensi... sitescripts/extensions/utils.py:264: page.close() On 2017/01/17 10:20:15, Sebastian Noack wrote: > On 2017/01/17 10:07:00, Vasily Kuznetsov wrote: > > On 2017/01/17 10:02:49, Vasily Kuznetsov wrote: > > > On 2017/01/17 09:33:32, Sebastian Noack wrote: > > > > > > > > > > https://codereview.adblockplus.org/29370859/diff/29371813/sitescripts/extensi... > > > > File sitescripts/extensions/utils.py (right): > > > > > > > > > > > > > > https://codereview.adblockplus.org/29370859/diff/29371813/sitescripts/extensi... > > > > sitescripts/extensions/utils.py:264: page.close() > > > > The close() should have gone into a finally block, or even better using > the > > > with > > > > statement: > > > > > > > > with contextlib.closing(_urlopen(url)) as page: > > > > content = page.read() > > > > > > AFAIK, Paco tried it but the return value of urlopen doesn't support context > > > manager protocol so it doesn't work. > > > > Ah, sorry, I didn't realize that you're proposing to use contextlib.closing. > > That would work, but I thought that it's too much overhead (you need to import > > contextlib, etc). Perhaps you're right though, as the result is more reliable > > and at least the code itself doesn't seem too "heavy". > > Well, alternatively you could use try-finally. But IMO, the with-statement is > nicer. For reference, here is how we did it in jsHydra where we support both > Python 2+3: https://hg.adblockplus.org/jshydra/file/tip/abp_rewrite.py#l11 Yeah, with is nicer, especially having Python 3 in mind. |