|
|
Created:
June 6, 2018, 2:56 p.m. by Manish Jethani Modified:
June 19, 2018, 9:13 a.m. Base URL:
https://hg.adblockplus.org/adblockpluscore/ Visibility:
Public. |
Patch Set 1 #
Total comments: 5
Patch Set 2 : Rebase #Patch Set 3 : Update tests to check for casing #
Total comments: 7
Patch Set 4 : Rebase #Patch Set 5 : Update logic to minimize code #
Total comments: 14
Patch Set 6 : Use Map version of RegExpFilter.typeMap #Patch Set 7 : Disallow missing values in domain, sitekey, and rewrite #
Total comments: 5
MessagesTotal messages: 21
Patch Set 1 Domains are lowercase by default, both in the filter list and as reported by the browser extension. It doesn't make sense to keep them as uppercase in the filter class, because it creates redundant string. We can keep them lowercase. This saves ~2 MB with the default subscriptions. I can file a Trac issue if you think this is a good idea. https://codereview.adblockplus.org/29800595/diff/29800596/lib/filterClasses.js File lib/filterClasses.js (right): https://codereview.adblockplus.org/29800595/diff/29800596/lib/filterClasses.j... lib/filterClasses.js:776: let type = (option[0] == "~" ? option.substr(1) : option) This part may seem unrelated but the goal is to avoid uppercasing and then lowercasing again.
https://codereview.adblockplus.org/29800595/diff/29800596/test/filterClasses.js File test/filterClasses.js (right): https://codereview.adblockplus.org/29800595/diff/29800596/test/filterClasses.... test/filterClasses.js:471: "b$la$sitekey=foo,domain=domain.com|foo.com,csp=c s p" it seems the input should be rather something like b$la$sitekey=foo,domain=domain.com|FOO.cOm,csp=c s p
Patch Set 2: Rebase https://codereview.adblockplus.org/29800595/diff/29800596/test/filterClasses.js File test/filterClasses.js (right): https://codereview.adblockplus.org/29800595/diff/29800596/test/filterClasses.... test/filterClasses.js:471: "b$la$sitekey=foo,domain=domain.com|foo.com,csp=c s p" On 2018/06/07 09:25:17, sergei wrote: > it seems the input should be rather something like > b$la$sitekey=foo,domain=domain.com|FOO.cOm,csp=c s p The normalization does not affect the case of the filter text unfortunately. Perhaps it should? Anyway, if we decide to do that then it should be a separate Trac issue.
https://codereview.adblockplus.org/29800595/diff/29800596/test/filterClasses.js File test/filterClasses.js (right): https://codereview.adblockplus.org/29800595/diff/29800596/test/filterClasses.... test/filterClasses.js:471: "b$la$sitekey=foo,domain=domain.com|foo.com,csp=c s p" On 2018/06/07 10:22:14, Manish Jethani wrote: > On 2018/06/07 09:25:17, sergei wrote: > > it seems the input should be rather something like > > b$la$sitekey=foo,domain=domain.com|FOO.cOm,csp=c s p > > The normalization does not affect the case of the filter text unfortunately. > Perhaps it should? Anyway, if we decide to do that then it should be a separate > Trac issue. Right, sorry, I missed the beginning of the test, I thought the test checks the values of the filter fields. BTW, the domains with capital letters should be in the original filter texts in tests above.
Patch Set 3: Update tests to check for casing https://codereview.adblockplus.org/29800595/diff/29800596/test/filterClasses.js File test/filterClasses.js (right): https://codereview.adblockplus.org/29800595/diff/29800596/test/filterClasses.... test/filterClasses.js:471: "b$la$sitekey=foo,domain=domain.com|foo.com,csp=c s p" On 2018/06/07 10:44:48, sergei wrote: > On 2018/06/07 10:22:14, Manish Jethani wrote: > > On 2018/06/07 09:25:17, sergei wrote: > > > it seems the input should be rather something like > > > b$la$sitekey=foo,domain=domain.com|FOO.cOm,csp=c s p > > > > The normalization does not affect the case of the filter text unfortunately. > > Perhaps it should? Anyway, if we decide to do that then it should be a > separate > > Trac issue. > > Right, sorry, I missed the beginning of the test, I thought the test checks the > values of the filter fields. BTW, the domains with capital letters should be in > the original filter texts in tests above. Ah, makes sense, sorry I missed your point earlier. Done.
LGTM, @Dave could you please take a look?
Overall seems like a cool idea, I like the implementation too the new logic is easier to understand already. I have a few ideas how it can be improved though. https://codereview.adblockplus.org/29800595/diff/29801605/lib/filterClasses.js File lib/filterClasses.js (left): https://codereview.adblockplus.org/29800595/diff/29801605/lib/filterClasses.j... lib/filterClasses.js:772: contentType &= ~RegExpFilter.typeMap[option.substr(1)]; You know reading your new logic I was trying to figure out how the old logic actually worked. I think your version is an improvement for sure, it sucked how we looked up the content type in all these different places. https://codereview.adblockplus.org/29800595/diff/29801605/lib/filterClasses.js File lib/filterClasses.js (right): https://codereview.adblockplus.org/29800595/diff/29801605/lib/filterClasses.j... lib/filterClasses.js:758: let type = (option[0] == "~" ? option.substr(1) : option) Not sure it matters especially but it seems kinda wasteful that we check `option[0] == "~"` twice. If we assigned it to a variable it would also be an opportunity to give it a useful name like `inverseOption` or similar which would help document the code a little bit. https://codereview.adblockplus.org/29800595/diff/29801605/lib/filterClasses.j... lib/filterClasses.js:775: if (type == "CSP" && value) I found the variable names `type` vs `contentType` kinda confusing initially. I understand the logic now that I'm part way through a second coffee but I wonder if we could just get rid of the `type` variable? If I got it right we only use it for looking up the content type and to check if it's "CSP" here. How about we avoid saving the type string, instead saving the content type right away and then here checking if the content type is RegExpFilter.typeMap.CSP? It would avoid a string comparison too. https://codereview.adblockplus.org/29800595/diff/29801605/lib/filterClasses.j... lib/filterClasses.js:781: else if (option == "~match-case") If we added the `inverseOption` variable we could simplify this logic (same below): else if (type == "MATCH_CASE") matchCase = !inverseOption; (The down side is that contradicts what I said above about getting rid of the `type` variable.)
On 2018/06/07 13:16:03, kzar wrote: > Overall seems like a cool idea, I like the implementation too the new logic is > easier to understand already. I have a few ideas how it can be improved though. > > https://codereview.adblockplus.org/29800595/diff/29801605/lib/filterClasses.js > File lib/filterClasses.js (left): > > https://codereview.adblockplus.org/29800595/diff/29801605/lib/filterClasses.j... > lib/filterClasses.js:772: contentType &= > ~RegExpFilter.typeMap[option.substr(1)]; > You know reading your new logic I was trying to figure out how the old logic > actually worked. I think your version is an improvement for sure, it sucked how > we looked up the content type in all these different places. > > https://codereview.adblockplus.org/29800595/diff/29801605/lib/filterClasses.js > File lib/filterClasses.js (right): > > https://codereview.adblockplus.org/29800595/diff/29801605/lib/filterClasses.j... > lib/filterClasses.js:758: let type = (option[0] == "~" ? option.substr(1) : > option) > Not sure it matters especially but it seems kinda wasteful that we check > `option[0] == "~"` twice. If we assigned it to a variable it would also be an > opportunity to give it a useful name like `inverseOption` or similar which would > help document the code a little bit. > > https://codereview.adblockplus.org/29800595/diff/29801605/lib/filterClasses.j... > lib/filterClasses.js:775: if (type == "CSP" && value) > I found the variable names `type` vs `contentType` kinda confusing initially. I > understand the logic now that I'm part way through a second coffee but I wonder > if we could just get rid of the `type` variable? If I got it right we only use > it for looking up the content type and to check if it's "CSP" here. > > How about we avoid saving the type string, instead saving the content type right > away and then here checking if the content type is RegExpFilter.typeMap.CSP? It > would avoid a string comparison too. > > https://codereview.adblockplus.org/29800595/diff/29801605/lib/filterClasses.j... > lib/filterClasses.js:781: else if (option == "~match-case") > If we added the `inverseOption` variable we could simplify this logic (same > below): > > else if (type == "MATCH_CASE") > matchCase = !inverseOption; > > (The down side is that contradicts what I said above about getting rid of the > `type` variable.) I also noticed it but left no comments. I find the proposed small changes already as a good step and pretty self contained, therefore can we address the rest in other commits?
On 2018/06/07 13:20:49, sergei wrote: > On 2018/06/07 13:16:03, kzar wrote: > > [...] > > > > else if (type == "MATCH_CASE") > > matchCase = !inverseOption; > > > > (The down side is that contradicts what I said above about getting rid of the > > `type` variable.) > > I also noticed it but left no comments. I find the proposed small changes > already as a good step and pretty self contained, therefore can we address the > rest in other commits? Well, just to add, I noticed it too, and was intending to make these kinds of improvements soon after while keeping this change minimal and specific only to the lowercasing of domain names. I tend to agree with Sergei, but I'm also OK with making more improvements as part of this change. Whatever works for you, Dave.
On 2018/06/07 14:06:32, Manish Jethani wrote: > On 2018/06/07 13:20:49, sergei wrote: > > On 2018/06/07 13:16:03, kzar wrote: > > > [...] > > > > > > else if (type == "MATCH_CASE") > > > matchCase = !inverseOption; > > > > > > (The down side is that contradicts what I said above about getting rid of > the > > > `type` variable.) > > > > I also noticed it but left no comments. I find the proposed small changes > > already as a good step and pretty self contained, therefore can we address the > > rest in other commits? > > Well, just to add, I noticed it too, and was intending to make these kinds of > improvements soon after while keeping this change minimal and specific only to > the lowercasing of domain names. I tend to agree with Sergei, but I'm also OK > with making more improvements as part of this change. Whatever works for you, > Dave. I'd prefer to do it here FWIW.
Patch Set 4: Rebase Patch Set 5: Update logic to minimize code The changes are getting outside of the scope of this patch, I'm in two minds whether to split out this last part into a separate patch. Anyway, take a look, I've incorporated the suggestions and made some improvements of my own. https://codereview.adblockplus.org/29800595/diff/29801605/lib/filterClasses.js File lib/filterClasses.js (right): https://codereview.adblockplus.org/29800595/diff/29801605/lib/filterClasses.j... lib/filterClasses.js:758: let type = (option[0] == "~" ? option.substr(1) : option) On 2018/06/07 13:16:02, kzar wrote: > Not sure it matters especially but it seems kinda wasteful that we check > `option[0] == "~"` twice. If we assigned it to a variable it would also be an > opportunity to give it a useful name like `inverseOption` or similar which would > help document the code a little bit. Done. https://codereview.adblockplus.org/29800595/diff/29801605/lib/filterClasses.j... lib/filterClasses.js:775: if (type == "CSP" && value) On 2018/06/07 13:16:02, kzar wrote: > I found the variable names `type` vs `contentType` kinda confusing initially. I > understand the logic now that I'm part way through a second coffee but I wonder > if we could just get rid of the `type` variable? If I got it right we only use > it for looking up the content type and to check if it's "CSP" here. > > How about we avoid saving the type string, instead saving the content type right > away and then here checking if the content type is RegExpFilter.typeMap.CSP? It > would avoid a string comparison too. Done. https://codereview.adblockplus.org/29800595/diff/29801605/lib/filterClasses.j... lib/filterClasses.js:781: else if (option == "~match-case") On 2018/06/07 13:16:02, kzar wrote: > If we added the `inverseOption` variable we could simplify this logic (same > below): > > else if (type == "MATCH_CASE") > matchCase = !inverseOption; Done. > (The down side is that contradicts what I said above about getting rid of the > `type` variable.) I've managed to do both, good suggestions! https://codereview.adblockplus.org/29800595/diff/29801686/lib/filterClasses.js File lib/filterClasses.js (right): https://codereview.adblockplus.org/29800595/diff/29801686/lib/filterClasses.j... lib/filterClasses.js:759: let inverse = option[0] == "~"; I'm just calling it "inverse" because "option" is implied (this is the options loop). https://codereview.adblockplus.org/29800595/diff/29801686/lib/filterClasses.j... lib/filterClasses.js:763: let type = RegExpFilter.typeMap[option.replace(/-/, "_").toUpperCase()]; Lookup happens only once in the RegExpFilter.typeMap object. https://codereview.adblockplus.org/29800595/diff/29801686/lib/filterClasses.j... lib/filterClasses.js:776: if (type == RegExpFilter.typeMap.CSP && value) Integer comparison. https://codereview.adblockplus.org/29800595/diff/29801686/lib/filterClasses.j... lib/filterClasses.js:782: switch (option.toLowerCase()) Only lowercase if we get this far. switch is cleaner than if-else in my opinion, let me know if you disagree. https://codereview.adblockplus.org/29800595/diff/29801686/lib/filterClasses.j... lib/filterClasses.js:785: matchCase = !inverse; !inverse https://codereview.adblockplus.org/29800595/diff/29801686/lib/filterClasses.j... lib/filterClasses.js:788: if (value) Note: This has a good side effect. Previously foo$domain would throw an error, I don't see why it should. It should just apply to any domain in that case. I mean the option should be ignored (it's not an "unknown option", it's a known option with no value). https://codereview.adblockplus.org/29800595/diff/29801686/lib/filterClasses.j... lib/filterClasses.js:802: if (value) Note that this also simplifies how we can accept the blank value for the rewrite option in the other patch.
Patch Set 6: Use Map version of RegExpFilter.typeMap Here's another idea: We can avoid some more string transformations (and get faster lookups) by using a Map object. We populate it once, it's cheap, then we can look up directly each time. The only transformation needed is String.toLowerCase, which in most cases will yield the same string anyway.
So I think Patch Set 5 and 6 should go into their own patch, perhaps with a Trac issue.
On 2018/06/08 05:31:15, Manish Jethani wrote: > Patch Set 6: Use Map version of RegExpFilter.typeMap > > Here's another idea: We can avoid some more string transformations (and get > faster lookups) by using a Map object. We populate it once, it's cheap, then we > can look up directly each time. The only transformation needed is > String.toLowerCase, which in most cases will yield the same string anyway. To test my theory I wrote this Node.js program: (function() { let obj = { OTHER: 1, SCRIPT: 2, IMAGE: 4, STYLESHEET: 8, OBJECT: 16, SUBDOCUMENT: 32, DOCUMENT: 64, WEBSOCKET: 128, WEBRTC: 256, CSP: 512, XBL: 1, PING: 1024, XMLHTTPREQUEST: 2048, OBJECT_SUBREQUEST: 4096, DTD: 1, MEDIA: 16384, FONT: 32768, BACKGROUND: 4, // Backwards compat, same as IMAGE POPUP: 0x10000000, GENERICBLOCK: 0x20000000, ELEMHIDE: 0x40000000, GENERICHIDE: 0x80000000 }; let map = new Map( Object.keys(obj).map( key => [key.replace("_", "-").toLowerCase(), obj[key]] ) ); let s = Date.now(); let n = 0; for (let i = 0; i < 100000000; i++) if (obj[process.argv[2].replace("-", "_").toUpperCase()]) //if (map.get(process.argv[2].toLowerCase())) n++; console.log(Date.now() - s, n); })(); Run like this: node test.js object-subrequest node test.js domain Uncomment the line with the map lookup, comment out the previous line, and run again. You'll see that lookups in the map, where we don't need to replace "-" with "_", are significantly faster (5x or so here). Note that where the key is static (e.g. RegExpFilter.typeMap.CSP) lookups are faster in an object.
Patch Set 5 LGTM, though I'd like to get Sergei's opinion on the slight change of behaviour. In fact I think it's a huge improvement, nice work. Definitely agree that your Map idea belongs in a separate issue. Would you mind filing one so we can all discuss? (I think Sebastian would be interested in that idea too.) P.S. I'm on vacation today, but took a quick look at this to avoid blocking this. If you guys want to go ahead with a patch set in my absence just go for it. https://codereview.adblockplus.org/29800595/diff/29801686/lib/filterClasses.js File lib/filterClasses.js (right): https://codereview.adblockplus.org/29800595/diff/29801686/lib/filterClasses.j... lib/filterClasses.js:759: let inverse = option[0] == "~"; On 2018/06/08 05:00:41, Manish Jethani wrote: > I'm just calling it "inverse" because "option" is implied (this is the options > loop). Acknowledged. https://codereview.adblockplus.org/29800595/diff/29801686/lib/filterClasses.j... lib/filterClasses.js:763: let type = RegExpFilter.typeMap[option.replace(/-/, "_").toUpperCase()]; On 2018/06/08 05:00:41, Manish Jethani wrote: > Lookup happens only once in the RegExpFilter.typeMap object. Nice. https://codereview.adblockplus.org/29800595/diff/29801686/lib/filterClasses.j... lib/filterClasses.js:785: matchCase = !inverse; On 2018/06/08 05:00:41, Manish Jethani wrote: > !inverse (Heh, it does make you wonder if storing !inverse as a variable would be better, but I can't think of a good name.) https://codereview.adblockplus.org/29800595/diff/29801686/lib/filterClasses.j... lib/filterClasses.js:788: if (value) On 2018/06/08 05:00:41, Manish Jethani wrote: > Note: This has a good side effect. Previously foo$domain would throw an error, I > don't see why it should. It should just apply to any domain in that case. I mean > the option should be ignored (it's not an "unknown option", it's a known option > with no value). Seems reasonable to me, but I wonder if this was intentional. Do you have an opinion Segei? https://codereview.adblockplus.org/29800595/diff/29801686/lib/filterClasses.j... lib/filterClasses.js:802: if (value) On 2018/06/08 05:00:41, Manish Jethani wrote: > Note that this also simplifies how we can accept the blank value for the rewrite > option in the other patch. Acknowledged.
https://codereview.adblockplus.org/29800595/diff/29801686/lib/filterClasses.js File lib/filterClasses.js (right): https://codereview.adblockplus.org/29800595/diff/29801686/lib/filterClasses.j... lib/filterClasses.js:788: if (value) On 2018/06/08 09:30:32, kzar wrote: > On 2018/06/08 05:00:41, Manish Jethani wrote: > > Note: This has a good side effect. Previously foo$domain would throw an error, > I > > don't see why it should. It should just apply to any domain in that case. I > mean > > the option should be ignored (it's not an "unknown option", it's a known > option > > with no value). > > Seems reasonable to me, but I wonder if this was intentional. Do you have an > opinion Segei? Formally, if the option is present and there are no values then the filter is domain specific with no domains and simply put it should be never applied. BTW, previously the code returned an instance of InvalidFilter, not thrown an exception. If it's a special case then it deserves a test IMO. I would like to prefer to fail earlier in terms of rather mark the filter as invalid, however if it conflicts with the idea to allow empty options, then it's fine for me but could you please add a test for it?
https://codereview.adblockplus.org/29800595/diff/29801686/lib/filterClasses.js File lib/filterClasses.js (right): https://codereview.adblockplus.org/29800595/diff/29801686/lib/filterClasses.j... lib/filterClasses.js:788: if (value) On 2018/06/08 13:28:11, sergei wrote: > On 2018/06/08 09:30:32, kzar wrote: > > On 2018/06/08 05:00:41, Manish Jethani wrote: > > > Note: This has a good side effect. Previously foo$domain would throw an > error, > > I > > > don't see why it should. It should just apply to any domain in that case. I > > mean > > > the option should be ignored (it's not an "unknown option", it's a known > > option > > > with no value). > > > > Seems reasonable to me, but I wonder if this was intentional. Do you have an > > opinion Segei? > > Formally, if the option is present and there are no values then the filter is > domain specific with no domains and simply put it should be never applied. BTW, > previously the code returned an instance of InvalidFilter, not thrown an > exception. For known options that must have a value but none is given, what do you think about returning an invalid filter with a different reason like "filter_missing_value" or "filter_invalid_value"? Previously the reason was "filter_unknown_option" was not exactly correct. Anyway, let's try to stay compatible here and not change the behavior, so I'll try to keep it the same for now. Let me update the patch.
Patch Set 7: Disallow missing values in domain, sitekey, and rewrite https://codereview.adblockplus.org/29800595/diff/29802583/lib/filterClasses.js File lib/filterClasses.js (right): https://codereview.adblockplus.org/29800595/diff/29802583/lib/filterClasses.j... lib/filterClasses.js:799: if (!value) There's a bit of code repetition here but I'm OK with it. https://codereview.adblockplus.org/29800595/diff/29802583/lib/filterClasses.j... lib/filterClasses.js:804: if (!value) Note that this will get updated for the rewrite option in the other patch.
https://codereview.adblockplus.org/29800595/diff/29802583/lib/filterClasses.js File lib/filterClasses.js (right): https://codereview.adblockplus.org/29800595/diff/29802583/lib/filterClasses.j... lib/filterClasses.js:776: if (type == RegExpFilter.typeMap.CSP && value) By the way in the case of csp we quietly ignore the option if it has no value, so we are not being consistent. Anyway we should address these things separately, not as part of this patch.
LGTM https://codereview.adblockplus.org/29800595/diff/29802583/lib/filterClasses.js File lib/filterClasses.js (right): https://codereview.adblockplus.org/29800595/diff/29802583/lib/filterClasses.j... lib/filterClasses.js:776: if (type == RegExpFilter.typeMap.CSP && value) On 2018/06/08 14:21:14, Manish Jethani wrote: > By the way in the case of csp we quietly ignore the option if it has no value, > so we are not being consistent. Anyway we should address these things > separately, not as part of this patch. Yea, agreed. By the way Rosie is reimplementing this stuff in Python ATM, so if you change the behaviour ping her in IRC. https://codereview.adblockplus.org/29800595/diff/29802583/lib/filterClasses.j... lib/filterClasses.js:799: if (!value) On 2018/06/08 14:19:42, Manish Jethani wrote: > There's a bit of code repetition here but I'm OK with it. Yea, I think it's fine. |