|
|
Created:
May 29, 2018, 4:54 p.m. by rhowell Modified:
July 6, 2018, 5:08 p.m. Base URL:
https://hg.adblockplus.org/python-abp/ Visibility:
Public. |
DescriptionIssue 6701 - Implement CSP support in python-abp
Patch Set 1 #
Total comments: 7
Patch Set 2 : Remove validity regex and extraneous comment #Patch Set 3 : Check negation first, and added some tests #
Total comments: 6
Patch Set 4 : Condense the tests, properly handle ~csp #Patch Set 5 : Removing handling and testing for ~csp #
MessagesTotal messages: 17
On 2018/05/29 16:54:36, rhowell wrote: Hey Vasily, Wanted to get your feedback so far. Am I missing some functionality that should be using the INVALID_CSP_REGEXP? Also, does the test look ok, especially the format of the URL (||foo.com^) and the options ((OPT.CSP, "script-src 'self' * 'unsafe-inline'"))? Thanks! Rosie
Hi Rosie, The start looks good. I think you would need to add a test for the "~csp" syntax (whatever the right format is, see below) and make that work but otherwise this seems to be pretty much it. Regarding INVALID_CSP_REGEXP, at the moment we're not checking the validity of the values of other options at the moment, so my first intuition is that for consistency it makes sense to not do this checking in the parser. Cheers, Vasily https://codereview.adblockplus.org/29793573/diff/29793574/abp/filters/parser.py File abp/filters/parser.py (right): https://codereview.adblockplus.org/29793573/diff/29793574/abp/filters/parser.... abp/filters/parser.py:182: if option.startswith('~'): If we want to be able to handle the form "~csp=xxxx" this line should be moved to before line 180. I'm not sure if we do, or what such form means ("~csp" seems sufficient to negate CSP resource type) but I might be missing something so better check with Dave, who wrote the code in abpcore (see https://hg.adblockplus.org/adblockpluscore/rev/f9a48feb592c#l2.84). https://codereview.adblockplus.org/29793573/diff/29793574/tests/test_parser.py File tests/test_parser.py (right): https://codereview.adblockplus.org/29793573/diff/29793574/tests/test_parser.p... tests/test_parser.py:113: # from pdb import set_trace; set_trace() ;)
I talked with Dave about the ~csp=xxxx option, see below for comments. Thanks! :) https://codereview.adblockplus.org/29793573/diff/29793574/abp/filters/parser.py File abp/filters/parser.py (right): https://codereview.adblockplus.org/29793573/diff/29793574/abp/filters/parser.... abp/filters/parser.py:152: # Regular expression that matches an invalid Content Security Policy Removing this, since we aren't currently checking validity. https://codereview.adblockplus.org/29793573/diff/29793574/abp/filters/parser.... abp/filters/parser.py:182: if option.startswith('~'): On 2018/05/30 19:49:26, Vasily Kuznetsov wrote: > If we want to be able to handle the form "~csp=xxxx" this line should be moved > to before line 180. I'm not sure if we do, or what such form means ("~csp" seems > sufficient to negate CSP resource type) but I might be missing something so > better check with Dave, who wrote the code in abpcore (see > https://hg.adblockplus.org/adblockpluscore/rev/f9a48feb592c#l2.84). I talked to Dave, and he said that ~csp=xxxx and ~csp are both effectively ignored, and not expected to be used, but the filters are considered valid. Do you think we should still implement this functionality? https://codereview.adblockplus.org/29793573/diff/29793574/tests/test_parser.py File tests/test_parser.py (right): https://codereview.adblockplus.org/29793573/diff/29793574/tests/test_parser.p... tests/test_parser.py:113: # from pdb import set_trace; set_trace() On 2018/05/30 19:49:26, Vasily Kuznetsov wrote: > ;) oops! Done. :)
Thanks for clearing this up with Dave, Rosie, see my comment below. https://codereview.adblockplus.org/29793573/diff/29793574/abp/filters/parser.py File abp/filters/parser.py (right): https://codereview.adblockplus.org/29793573/diff/29793574/abp/filters/parser.... abp/filters/parser.py:182: if option.startswith('~'): On 2018/06/07 18:16:55, rhowell wrote: > On 2018/05/30 19:49:26, Vasily Kuznetsov wrote: > > If we want to be able to handle the form "~csp=xxxx" this line should be moved > > to before line 180. I'm not sure if we do, or what such form means ("~csp" > seems > > sufficient to negate CSP resource type) but I might be missing something so > > better check with Dave, who wrote the code in abpcore (see > > https://hg.adblockplus.org/adblockpluscore/rev/f9a48feb592c#l2.84). > > I talked to Dave, and he said that ~csp=xxxx and ~csp are both effectively > ignored, and not expected to be used, but the filters are considered valid. Do > you think we should still implement this functionality? Let's switch the ifs -- this way both ~csp=xxx and ~csp will produce ('csp', False) in the return value, which seems like the best way to interpret it. What to do with this information will be the worry of the user (user of the library in this case), but at least here we produce the output that makes sense (I think returning ('~csp', 'xxx') would be error-prone).
I added tests similar to what Dave was testing in adblockpluscore. Let me know if it's overkill; if so, we can remove some of them! :) https://codereview.adblockplus.org/29793573/diff/29793574/abp/filters/parser.py File abp/filters/parser.py (right): https://codereview.adblockplus.org/29793573/diff/29793574/abp/filters/parser.... abp/filters/parser.py:182: if option.startswith('~'): On 2018/06/07 18:28:31, Vasily Kuznetsov wrote: > On 2018/06/07 18:16:55, rhowell wrote: > > On 2018/05/30 19:49:26, Vasily Kuznetsov wrote: > > > If we want to be able to handle the form "~csp=xxxx" this line should be > moved > > > to before line 180. I'm not sure if we do, or what such form means ("~csp" > > seems > > > sufficient to negate CSP resource type) but I might be missing something so > > > better check with Dave, who wrote the code in abpcore (see > > > https://hg.adblockplus.org/adblockpluscore/rev/f9a48feb592c#l2.84). > > > > I talked to Dave, and he said that ~csp=xxxx and ~csp are both effectively > > ignored, and not expected to be used, but the filters are considered valid. Do > > you think we should still implement this functionality? > > Let's switch the ifs -- this way both ~csp=xxx and ~csp will produce ('csp', > False) in the return value, which seems like the best way to interpret it. What > to do with this information will be the worry of the user (user of the library > in this case), but at least here we produce the output that makes sense (I think > returning ('~csp', 'xxx') would be error-prone). Done.
Hi Rosie, I think it's bit of an overkill but not too much. Maybe we could have 3 CSP tests in total (instead of 4) and still cover everything (see my comments inline). Also I think that the way ~csp is now handled is not quite right. "~cps=xxx" should produce ('csp': False), not ('csp=xxx': False) -- this should be easy to fix though. Cheers, Vasily https://codereview.adblockplus.org/29793573/diff/29801678/tests/test_parser.py File tests/test_parser.py (right): https://codereview.adblockplus.org/29793573/diff/29801678/tests/test_parser.p... tests/test_parser.py:77: 'bla$match-case,csp=first csp,script,other,domain=foo.com,sitekey=foo': { It seems like this test does more or less the sane as the previous one. That one has a more tricky CSP option, this one has other options -- maybe we could combine them into one test. https://codereview.adblockplus.org/29793573/diff/29801678/tests/test_parser.p... tests/test_parser.py:94: ('csp=csp', False), Hm, I think this should be ('csp', False). I guess the simple swapping trick in `parser.py` is not sufficient and we'll need to be a bit more clever about this. https://codereview.adblockplus.org/29793573/diff/29801678/tests/test_parser.p... tests/test_parser.py:101: '@@bla$script,other,domain=foo.com|~bar.foo.com,csp=c s p': { This one adds testing, of a CSP option in an exception rule. I guess it's useful too, but perhaps it could be combined with the test of line 77 or the test of line 101. All in all, 2 tests for csp=xxx and one for ~csp should probably be enough.
Hey Vasily, Thanks for the feedback. In parser.py, it feels a little hacky to hardcode "return 'csp', False", but it also seemed like the most simple, accurate, and straightforward way. I'm open to suggestions, if there's a more elegant way to parse that properly. Thanks! :) Rosie https://codereview.adblockplus.org/29793573/diff/29801678/tests/test_parser.py File tests/test_parser.py (right): https://codereview.adblockplus.org/29793573/diff/29801678/tests/test_parser.p... tests/test_parser.py:77: 'bla$match-case,csp=first csp,script,other,domain=foo.com,sitekey=foo': { On 2018/06/08 16:41:48, Vasily Kuznetsov wrote: > It seems like this test does more or less the sane as the previous one. That one > has a more tricky CSP option, this one has other options -- maybe we could > combine them into one test. Done. https://codereview.adblockplus.org/29793573/diff/29801678/tests/test_parser.p... tests/test_parser.py:94: ('csp=csp', False), On 2018/06/08 16:41:48, Vasily Kuznetsov wrote: > Hm, I think this should be ('csp', False). I guess the simple swapping trick in > `parser.py` is not sufficient and we'll need to be a bit more clever about this. Done. https://codereview.adblockplus.org/29793573/diff/29801678/tests/test_parser.p... tests/test_parser.py:101: '@@bla$script,other,domain=foo.com|~bar.foo.com,csp=c s p': { On 2018/06/08 16:41:48, Vasily Kuznetsov wrote: > This one adds testing, of a CSP option in an exception rule. I guess it's useful > too, but perhaps it could be combined with the test of line 77 or the test of > line 101. All in all, 2 tests for csp=xxx and one for ~csp should probably be > enough. I already combined the first two csp tests, so now we have one test for csp block, one for csp allow, and one for ~csp. Good enough? Or, I could combine the two csp tests, but then it would only test the exception.
On 2018/06/12 21:13:18, rhowell wrote: > Hey Vasily, > > Thanks for the feedback. In parser.py, it feels a little hacky to hardcode > "return 'csp', False", but it also seemed like the most simple, accurate, and > straightforward way. I'm open to suggestions, if there's a more elegant way to > parse that properly. > > Thanks! :) > Rosie > I think we can make this apply to all options. Basically ~xxx=yyy would always become ('xxx', False). It doesn't seem like this would break any of the existing use cases and it would be less hacky. Another option, which, TBH, I prefer, is to just outlaw this and only allow ~csp. Since there's no use case for ~csp=yyy, it might be preferrable, given that we won't break any real use cases... but we should then probably sync up with WebExt team on this.
On 2018/06/13 14:01:57, Vasily Kuznetsov wrote: > I think we can make this apply to all options. Basically ~xxx=yyy would always > become ('xxx', False). It doesn't seem like this would break any of the existing > use cases and it would be less hacky. Another option, which, TBH, I prefer, is > to just outlaw this and only allow ~csp. Since there's no use case for ~csp=yyy, > it might be preferrable, given that we won't break any real use cases... but we > should then probably sync up with WebExt team on this. I was asked to comment on this. With the original filter parser, the filter "||example.com^$~csp=foo" is equivalent to "||example.com^" or "||example.com^$some-unknown-option". In other words, if ~csp (with tilde) is given it is treated as an unknown filter option and is ignored.
On 2018/06/13 18:07:18, Sebastian Noack wrote: > On 2018/06/13 14:01:57, Vasily Kuznetsov wrote: > > I think we can make this apply to all options. Basically ~xxx=yyy would always > > become ('xxx', False). It doesn't seem like this would break any of the > existing > > use cases and it would be less hacky. Another option, which, TBH, I prefer, is > > to just outlaw this and only allow ~csp. Since there's no use case for > ~csp=yyy, > > it might be preferrable, given that we won't break any real use cases... but > we > > should then probably sync up with WebExt team on this. > > I was asked to comment on this. With the original filter parser, the filter > "||example.com^$~csp=foo" is equivalent to "||example.com^" or > "||example.com^$some-unknown-option". In other words, if ~csp (with tilde) is > given it is treated as an unknown filter option and is ignored. Thanks for chiming in, Sebastian, having an extension developer here is helpful. I agree that treating "~xxx=yyy" as "some-unknown-option" is wrong. What I'm proposing (perhaps not being clear enough) is to either treat all "~xxx=yyy" as equivalent to "~xxx" (for all values of xxx) or to use a more strict approach that would consider "~xxx=yyy" an error, and would require either "~xxx" or "xxx=yyy" (basically if the option starts with a tilde, it's not allowed to have a value after "="). What we currently have in PS4 is implementation of the former just for "csp", but I think we might as well do it for everything. Cheers, Vasily
On 2018/06/13 22:21:13, Vasily Kuznetsov wrote: > I agree that treating "~xxx=yyy" as "some-unknown-option" is wrong. What I'm > proposing (perhaps not being clear enough) is to either treat all "~xxx=yyy" as > equivalent to "~xxx" (for all values of xxx) or to use a more strict approach > that would consider "~xxx=yyy" an error, and would require either "~xxx" or > "xxx=yyy" (basically if the option starts with a tilde, it's not allowed to have > a value after "="). > > What we currently have in PS4 is implementation of the former just for "csp", > but I think we might as well do it for everything. Why is treating "~csp" as an unknown option wrong? How is it different from "xcsp"? Moreover, the source of truth surely are the semantics of the original implementation (in Adblock Plus).
On 2018/06/14 02:36:49, Sebastian Noack wrote: > On 2018/06/13 22:21:13, Vasily Kuznetsov wrote: > > I agree that treating "~xxx=yyy" as "some-unknown-option" is wrong. What I'm > > proposing (perhaps not being clear enough) is to either treat all "~xxx=yyy" > as > > equivalent to "~xxx" (for all values of xxx) or to use a more strict approach > > that would consider "~xxx=yyy" an error, and would require either "~xxx" or > > "xxx=yyy" (basically if the option starts with a tilde, it's not allowed to > have > > a value after "="). > > > > What we currently have in PS4 is implementation of the former just for "csp", > > but I think we might as well do it for everything. > > Why is treating "~csp" as an unknown option wrong? How is it different from > "xcsp"? Moreover, the source of truth surely are the semantics of the original > implementation (in Adblock Plus). Sure, if there's no actual use case for "~csp", we could also treat it as wrong. But my understanding is that ABP now accepts "csp=xxx", "~csp" and "~csp=xxx" and I don't really understand the meaning of the last two (the documentation says nothing about them). In any case, python-abp only does syntactic parsing, it does not interpret the meaning of the options. "~csp" looks the same way as a negation of any other option would, so it seems natural to convert it to ('csp', False) in that code in question. "~csp=xxx" seems ill-formed according to the current rules but it seems like ABP supports it (https://hg.adblockplus.org/adblockpluscore/file/tip/test/filterClasses.js#l295), so perhaps python-abp should not break when it sees this syntax and make some sense of it. So perhaps to clear up the confusion, could you explain the meaning of "~csp" and "~csp=xxx" in ABP. Then we can think how it would make sense to handle them in python-abp (proposals are welcome). Cheers, Vasily
On 2018/06/14 09:26:44, Vasily Kuznetsov wrote: > Sure, if there's no actual use case for "~csp", we could also treat it as wrong. > But my understanding is that ABP now accepts "csp=xxx", "~csp" and "~csp=xxx" > and I don't really understand the meaning of the last two (the documentation > says nothing about them). > > In any case, python-abp only does syntactic parsing, it does not interpret the > meaning of the options. "~csp" looks the same way as a negation of any other > option would, so it seems natural to convert it to ('csp', False) in that code > in question. "~csp=xxx" seems ill-formed according to the current rules but it > seems like ABP supports it > (https://hg.adblockplus.org/adblockpluscore/file/tip/test/filterClasses.js#l295), > so perhaps python-abp should not break when it sees this syntax and make some > sense of it. > > So perhaps to clear up the confusion, could you explain the meaning of "~csp" > and "~csp=xxx" in ABP. Then we can think how it would make sense to handle them > in python-abp (proposals are welcome). The tilde (~) isn't part of the filter syntax, but just a random character in the name of the filter option (instead of "~image", one could as well have decided to go with "not-image" instead, back when that feature was added). So if you add a tilde to a filter option that doesn't have a negated equivalent (i.e. domain, sitekey, rewrite and csp), it just results into an unknown filter option, and will be ignored. Therefore "~csp" is equivalent to "xcsp", "any-other-unkown-filter-option" or to not providing that filter option at all.
On 2018/06/14 18:41:59, Sebastian Noack wrote: > On 2018/06/14 09:26:44, Vasily Kuznetsov wrote: > > Sure, if there's no actual use case for "~csp", we could also treat it as > wrong. > > But my understanding is that ABP now accepts "csp=xxx", "~csp" and "~csp=xxx" > > and I don't really understand the meaning of the last two (the documentation > > says nothing about them). > > > > In any case, python-abp only does syntactic parsing, it does not interpret the > > meaning of the options. "~csp" looks the same way as a negation of any other > > option would, so it seems natural to convert it to ('csp', False) in that code > > in question. "~csp=xxx" seems ill-formed according to the current rules but it > > seems like ABP supports it > > > (https://hg.adblockplus.org/adblockpluscore/file/tip/test/filterClasses.js#l295), > > so perhaps python-abp should not break when it sees this syntax and make some > > sense of it. > > > > So perhaps to clear up the confusion, could you explain the meaning of "~csp" > > and "~csp=xxx" in ABP. Then we can think how it would make sense to handle > them > > in python-abp (proposals are welcome). > > The tilde (~) isn't part of the filter syntax, but just a random character in > the name of the filter option (instead of "~image", one could as well have > decided to go with "not-image" instead, back when that feature was added). So if > you add a tilde to a filter option that doesn't have a negated equivalent (i.e. > domain, sitekey, rewrite and csp), it just results into an unknown filter > option, and will be ignored. Therefore "~csp" is equivalent to "xcsp", > "any-other-unkown-filter-option" or to not providing that filter option at all. I see. So basically "csp" cannot be negated and we should just treat it the same way as any other option that can take a value, such as "sitekey". In this case the parsing logic doesn't need to be changed and a test for "~csp=xxx" is not necessary. Sorry for the confusion.
On 2018/06/14 19:10:41, Vasily Kuznetsov wrote: > On 2018/06/14 18:41:59, Sebastian Noack wrote: > > On 2018/06/14 09:26:44, Vasily Kuznetsov wrote: > > > Sure, if there's no actual use case for "~csp", we could also treat it as > > wrong. > > > But my understanding is that ABP now accepts "csp=xxx", "~csp" and > "~csp=xxx" > > > and I don't really understand the meaning of the last two (the documentation > > > says nothing about them). > > > > > > In any case, python-abp only does syntactic parsing, it does not interpret > the > > > meaning of the options. "~csp" looks the same way as a negation of any other > > > option would, so it seems natural to convert it to ('csp', False) in that > code > > > in question. "~csp=xxx" seems ill-formed according to the current rules but > it > > > seems like ABP supports it > > > > > > (https://hg.adblockplus.org/adblockpluscore/file/tip/test/filterClasses.js#l295), > > > so perhaps python-abp should not break when it sees this syntax and make > some > > > sense of it. > > > > > > So perhaps to clear up the confusion, could you explain the meaning of > "~csp" > > > and "~csp=xxx" in ABP. Then we can think how it would make sense to handle > > them > > > in python-abp (proposals are welcome). > > > > The tilde (~) isn't part of the filter syntax, but just a random character in > > the name of the filter option (instead of "~image", one could as well have > > decided to go with "not-image" instead, back when that feature was added). So > if > > you add a tilde to a filter option that doesn't have a negated equivalent > (i.e. > > domain, sitekey, rewrite and csp), it just results into an unknown filter > > option, and will be ignored. Therefore "~csp" is equivalent to "xcsp", > > "any-other-unkown-filter-option" or to not providing that filter option at > all. > > I see. So basically "csp" cannot be negated and we should just treat it the same > way as any other option that can take a value, such as "sitekey". In this case > the parsing logic doesn't need to be changed and a test for "~csp=xxx" is not > necessary. Sorry for the confusion. Per this discussion, I've removed the handling and testing for ~csp.
LGTM |