|
|
Created:
June 22, 2015, 9:52 p.m. by Sebastian Noack Modified:
July 9, 2015, 1:08 p.m. Reviewers:
Wladimir Palant Visibility:
Public. |
DescriptionIssue 2711 - Refactored ChainedConfigParser, allowing manipulation of list items
Patch Set 1 : #
Total comments: 24
Patch Set 2 : Addressed comments #
Total comments: 2
Patch Set 3 : Don't rely on internal APIs to work around add_section("default") bug #Patch Set 4 : Unimplement add_section as well #Patch Set 5 : Don't convert options to lowercase and decode config files as UTF-8 #
Total comments: 4
Patch Set 6 : Addressed comments #MessagesTotal messages: 14
https://codereview.adblockplus.org/29319007/diff/29319014/chainedconfigparser.py File chainedconfigparser.py (right): https://codereview.adblockplus.org/29319007/diff/29319014/chainedconfigparser... chainedconfigparser.py:37: have a source attribute serving the same purpose. Extend documentation to mention the new += and -= syntax? https://codereview.adblockplus.org/29319007/diff/29319014/chainedconfigparser... chainedconfigparser.py:48: parser._read(file, filename) Please use public API here: parser.readfp(file, filename) https://codereview.adblockplus.org/29319007/diff/29319014/chainedconfigparser... chainedconfigparser.py:68: old_value = self.get(section, option) This will throw when trying to change an option that doesn't exist yet. While we probably don't want to sweep this condition under the carpet, a meaningful warning would do better IMHO. https://codereview.adblockplus.org/29319007/diff/29319014/chainedconfigparser... chainedconfigparser.py:71: value = '%s %s' % (old_value, value) I don't think that we want duplicate values - we are essentially treating the option as a set. So adding a value that is already there shouldn't have an effect. Maybe: existing = old_value.split() value = ' '.join(existing + [v for v in value.split() if v not in existing]) https://codereview.adblockplus.org/29319007/diff/29319014/chainedconfigparser... chainedconfigparser.py:72: elif removal: Nit: no need for elif, can be simply else. But I don't mind leaving it as is for readability. https://codereview.adblockplus.org/29319007/diff/29319014/chainedconfigparser... chainedconfigparser.py:73: value = re.sub(r'\b(?:%s)\b\s*' % '|'.join(map(re.escape, value.split())), '', old_value).rstrip() I don't think that regular expressions are the right tool here. How about: blacklist = value.split() value = ' '.join([v for v in old_value.split() if v not in blacklist]) https://codereview.adblockplus.org/29319007/diff/29319014/chainedconfigparser... chainedconfigparser.py:77: def _read(self, file, filename): Is it really a good idea to override private methods? There is relatively little danger of this one changing of course but still. IMHO we should be overriding public API - meaning read() and readfp(). This won't really add much to the complexity. https://codereview.adblockplus.org/29319007/diff/29319014/chainedconfigparser... chainedconfigparser.py:93: self._sections[section] = self._dict() I cannot say that I like this hack. How about simply skipping section 'default'? There is lots of magic attached to it, I don't think we want it for metadata files. Would also make sense to have the "inherit" option no longer visible in the result. https://codereview.adblockplus.org/29319007/diff/29319014/chainedconfigparser... chainedconfigparser.py:95: self._origin[(section, option)] = filename Ok, we have a problem right here... Consider the following scenario: foo/bar/metadata.chrome defines document_start = foo.js. foo/metadata.chrome defines document_start += bar.js. The resulting value is document_start = foo.js bar.js. Where are foo.js and bar.js supposed to be located? With this change they will both be looked up in foo/ which is wrong. foo.js should definitely be looked up in foo/bar/. Big question is bar.js - would be logical to assume that its origin is foo/. However, that would be inconsistent with -= directives - these use the paths from the original option, not paths relative to the current metadata file. So I think that bar.js should also be located in foo/bar/, additions and removals simply shouldn't change the origin of the option.
https://codereview.adblockplus.org/29319007/diff/29319014/chainedconfigparser.py File chainedconfigparser.py (right): https://codereview.adblockplus.org/29319007/diff/29319014/chainedconfigparser... chainedconfigparser.py:37: have a source attribute serving the same purpose. On 2015/06/23 09:43:36, Wladimir Palant wrote: > Extend documentation to mention the new += and -= syntax? Done. https://codereview.adblockplus.org/29319007/diff/29319014/chainedconfigparser... chainedconfigparser.py:48: parser._read(file, filename) On 2015/06/23 09:43:35, Wladimir Palant wrote: > Please use public API here: > > parser.readfp(file, filename) This code has been gone while addressing the comment below. https://codereview.adblockplus.org/29319007/diff/29319014/chainedconfigparser... chainedconfigparser.py:68: old_value = self.get(section, option) On 2015/06/23 09:43:36, Wladimir Palant wrote: > This will throw when trying to change an option that doesn't exist yet. While we > probably don't want to sweep this condition under the carpet, a meaningful > warning would do better IMHO. Done. https://codereview.adblockplus.org/29319007/diff/29319014/chainedconfigparser... chainedconfigparser.py:71: value = '%s %s' % (old_value, value) On 2015/06/23 09:43:35, Wladimir Palant wrote: > I don't think that we want duplicate values - we are essentially treating the > option as a set. So adding a value that is already there shouldn't have an > effect. Maybe: > > existing = old_value.split() > value = ' '.join(existing + [v for v in value.split() if v not in existing]) Done. https://codereview.adblockplus.org/29319007/diff/29319014/chainedconfigparser... chainedconfigparser.py:72: elif removal: On 2015/06/23 09:43:35, Wladimir Palant wrote: > Nit: no need for elif, can be simply else. But I don't mind leaving it as is for > readability. Done. https://codereview.adblockplus.org/29319007/diff/29319014/chainedconfigparser... chainedconfigparser.py:73: value = re.sub(r'\b(?:%s)\b\s*' % '|'.join(map(re.escape, value.split())), '', old_value).rstrip() On 2015/06/23 09:43:35, Wladimir Palant wrote: > I don't think that regular expressions are the right tool here. How about: > > blacklist = value.split() > value = ' '.join([v for v in old_value.split() if v not in blacklist]) Done. https://codereview.adblockplus.org/29319007/diff/29319014/chainedconfigparser... chainedconfigparser.py:77: def _read(self, file, filename): On 2015/06/23 09:43:35, Wladimir Palant wrote: > Is it really a good idea to override private methods? There is relatively little > danger of this one changing of course but still. IMHO we should be overriding > public API - meaning read() and readfp(). This won't really add much to the > complexity. Overriding read() and readfp() would require quite some additional complexity and duplication. I went for only supporting read() now, which doesn't seem to be too bad. https://codereview.adblockplus.org/29319007/diff/29319014/chainedconfigparser... chainedconfigparser.py:93: self._sections[section] = self._dict() On 2015/06/23 09:43:36, Wladimir Palant wrote: > I cannot say that I like this hack. How about simply skipping section 'default'? > There is lots of magic attached to it, I don't think we want it for metadata > files. Would also make sense to have the "inherit" option no longer visible in > the result. "default" != "DEFAULT". The latter is handled specially and is already ignored by our code. However, due to a bug, add_section() checks for "DEFAULT" case-insensitive, and prevents adding a section called "default", which is just handled as a regular section by the rest of the API. https://codereview.adblockplus.org/29319007/diff/29319014/chainedconfigparser... chainedconfigparser.py:95: self._origin[(section, option)] = filename On 2015/06/23 09:43:35, Wladimir Palant wrote: > Ok, we have a problem right here... Consider the following scenario: > > foo/bar/metadata.chrome defines document_start = foo.js. foo/metadata.chrome > defines document_start += bar.js. The resulting value is document_start = foo.js > bar.js. Where are foo.js and bar.js supposed to be located? > > With this change they will both be looked up in foo/ which is wrong. foo.js > should definitely be looked up in foo/bar/. Big question is bar.js - would be > logical to assume that its origin is foo/. However, that would be inconsistent > with -= directives - these use the paths from the original option, not paths > relative to the current metadata file. So I think that bar.js should also be > located in foo/bar/, additions and removals simply shouldn't change the origin > of the option. I first thought that we can simply rely on the source of the addition/removal as it is merely a short hand syntax for repeating the pre-existing items. But you are right. Ideally, we would resolve the filenames relative to the location of the metadata file they occurred in, before applying the diff. However, then we have to assume that the option lists only filenames. So without adding vague assumptions or too much complexity, it seems that the best we can do is indeed preserving the source of the original option when applying diffs. :(
https://codereview.adblockplus.org/29319007/diff/29319014/chainedconfigparser.py File chainedconfigparser.py (right): https://codereview.adblockplus.org/29319007/diff/29319014/chainedconfigparser... chainedconfigparser.py:93: self._sections[section] = self._dict() On 2015/06/24 08:51:09, Sebastian Noack wrote: > "default" != "DEFAULT". The latter is handled specially and is already ignored > by our code. However, due to a bug, add_section() checks for "DEFAULT" > case-insensitive, and prevents adding a section called "default", which is just > handled as a regular section by the rest of the API. I see. Still, should we access private variables in order to work around Python bugs? How about throwing a more meaningful error here: raise ValueError("SafeConfigParser.add_section() failed to add section %s, please use a different section name" % section) https://codereview.adblockplus.org/29319007/diff/29321057/chainedconfigparser.py File chainedconfigparser.py (right): https://codereview.adblockplus.org/29319007/diff/29321057/chainedconfigparser... chainedconfigparser.py:81: except ConfigParser.NoOptionError: What about ConfigParser.NoSectionError?
https://codereview.adblockplus.org/29319007/diff/29319014/chainedconfigparser.py File chainedconfigparser.py (right): https://codereview.adblockplus.org/29319007/diff/29319014/chainedconfigparser... chainedconfigparser.py:93: self._sections[section] = self._dict() On 2015/06/25 14:11:40, Wladimir Palant wrote: > On 2015/06/24 08:51:09, Sebastian Noack wrote: > > "default" != "DEFAULT". The latter is handled specially and is already ignored > > by our code. However, due to a bug, add_section() checks for "DEFAULT" > > case-insensitive, and prevents adding a section called "default", which is > just > > handled as a regular section by the rest of the API. > > I see. Still, should we access private variables in order to work around Python > bugs? How about throwing a more meaningful error here: raise > ValueError("SafeConfigParser.add_section() failed to add section %s, please use > a different section name" % section) The except block will only be reached in Python versions that have this bug. So I don't see any problem with using internal APIs to workaround the bug here. https://codereview.adblockplus.org/29319007/diff/29321057/chainedconfigparser.py File chainedconfigparser.py (right): https://codereview.adblockplus.org/29319007/diff/29321057/chainedconfigparser... chainedconfigparser.py:81: except ConfigParser.NoOptionError: On 2015/06/25 14:11:40, Wladimir Palant wrote: > What about ConfigParser.NoSectionError? We already make sure to create the section if it doesn't exist before this function is called.
https://codereview.adblockplus.org/29319007/diff/29319014/chainedconfigparser.py File chainedconfigparser.py (right): https://codereview.adblockplus.org/29319007/diff/29319014/chainedconfigparser... chainedconfigparser.py:93: self._sections[section] = self._dict() On 2015/06/25 16:06:28, Sebastian Noack wrote: > The except block will only be reached in Python versions that have this bug. That is: every Python version we support. And I'm pretty sure that there is a reason why this is documented rather than fixed - it's a compatibility constraint. There will never be a Python 2.x version without that bug. > So > I don't see any problem with using internal APIs to workaround the bug here. The problem is that we might make it worse, e.g. by producing an unexpected data structure that will fail further down without an obvious connection to this piece of code. As I said, why not just produce a meaningful error message instead of messing with internal data structures?
https://codereview.adblockplus.org/29319007/diff/29319014/chainedconfigparser.py File chainedconfigparser.py (right): https://codereview.adblockplus.org/29319007/diff/29319014/chainedconfigparser... chainedconfigparser.py:93: self._sections[section] = self._dict() On 2015/06/25 16:12:28, Wladimir Palant wrote: > On 2015/06/25 16:06:28, Sebastian Noack wrote: > > The except block will only be reached in Python versions that have this bug. > > That is: every Python version we support. And I'm pretty sure that there is a > reason why this is documented rather than fixed - it's a compatibility > constraint. There will never be a Python 2.x version without that bug. > > > So > > I don't see any problem with using internal APIs to workaround the bug here. > > The problem is that we might make it worse, e.g. by producing an unexpected data > structure that will fail further down without an obvious connection to this > piece of code. As I said, why not just produce a meaningful error message > instead of messing with internal data structures? In case I didn't stress this enough, this is a bug! The reason they didn't fix it in Python 2.7 could be that they are in maintenance mode for years only, fixing security issues, moving all improvements to Python 3 where this is fixed. Another reason could be that they don't want to break the behavior of existing APIs. And I agree there is no reason to have ChainedConfigParser behave differently. But my fix doesn't. This behavior is only implemented for the set() method, which isn't even implemented for external use by ChainedConfigParser. The code here merely assist to implement the read() method, which correctly parsed sections called "default" also when using SafeConfigParser. The line here is equivalent to: SafeConfigParser.readfp(self, StringIO("[%s]" % section))
https://codereview.adblockplus.org/29319007/diff/29319014/chainedconfigparser.py File chainedconfigparser.py (right): https://codereview.adblockplus.org/29319007/diff/29319014/chainedconfigparser... chainedconfigparser.py:93: self._sections[section] = self._dict() On 2015/06/25 23:05:31, Sebastian Noack wrote: > In case I didn't stress this enough, this is a bug! It doesn't matter what you call it - it's documented meaning that it is a compatibility constraint. This behavior won't be fixed, not that it is really important for us. > And I agree there is no reason to have ChainedConfigParser behave > differently. But my fix doesn't. Sure, at least not in the current Python version. You are making huge assumptions here, for no good reason. > The line here > is equivalent to: SafeConfigParser.readfp(self, StringIO("[%s]" % section)) If that works then why not just use that line and get rid of private APIs? I don't think we are terribly concerned about performance for this extremely unlikely scenario.
https://codereview.adblockplus.org/29319007/diff/29319014/chainedconfigparser.py File chainedconfigparser.py (right): https://codereview.adblockplus.org/29319007/diff/29319014/chainedconfigparser... chainedconfigparser.py:93: self._sections[section] = self._dict() So your concerns here are only about using internal APIs. Then I got you wrong in my previous reply. Fine with me using SafeConfigParser.readf() then instead. I just realized last night that we could actually do that.
LGTM
I found two regressions, fixed with the new patch set: 1. Config files weren't decoded as UTF-8 anymore, crashing build.py when processing the contributors for the Firefox extension 2. Option names were converted to lower case again (default behavior of ConfigParser), messing up some filenames (e.g. firstRun.html)
https://codereview.adblockplus.org/29319007/diff/29321431/chainedconfigparser.py File chainedconfigparser.py (right): https://codereview.adblockplus.org/29319007/diff/29321431/chainedconfigparser... chainedconfigparser.py:47: As opposed to ChainedConfigParser, files are decoded as UTF-8 while reading. Did you mean SafeConfigParser? https://codereview.adblockplus.org/29319007/diff/29321431/chainedconfigparser... chainedconfigparser.py:48: Also, ChainedConfigParser data is read-only and the options are case-sensitive. Nit: "case-sensitive by default" (it's configurable now). Also, please break at 80 characters.
https://codereview.adblockplus.org/29319007/diff/29321431/chainedconfigparser.py File chainedconfigparser.py (right): https://codereview.adblockplus.org/29319007/diff/29321431/chainedconfigparser... chainedconfigparser.py:47: As opposed to ChainedConfigParser, files are decoded as UTF-8 while reading. On 2015/07/07 14:53:17, Wladimir Palant wrote: > Did you mean SafeConfigParser? Yes, corrected. https://codereview.adblockplus.org/29319007/diff/29321431/chainedconfigparser... chainedconfigparser.py:48: Also, ChainedConfigParser data is read-only and the options are case-sensitive. On 2015/07/07 14:53:17, Wladimir Palant wrote: > Nit: "case-sensitive by default" (it's configurable now). Not really. There were some code assumptions, relying on optionxform beeing noop. But while thinking about it, it makes sense to remove these assumptions and configure optionxform externally rather than documenting the non-standard behavior here. Did that. > Also, please break at 80 characters. Done.
LGTM |