|
|
Created:
June 24, 2017, 2:48 p.m. by Manish Jethani Modified:
July 12, 2017, 9:25 a.m. Reviewers:
kzar CC:
Sebastian Noack Base URL:
https://hg.adblockplus.org/abp2blocklist Visibility:
Public. |
DescriptionFor the following filter list:
##.sponsored-container-bottom
@@||walmart.com^$elemhide
Generate the following rule set:
[
{
"trigger": {
"url-filter": "^https?://",
"url-filter-is-case-sensitive": true,
"unless-domain": [
"*walmart.com"
]
},
"action": {
"type": "css-display-none",
"selector": ".sponsored-container-bottom"
}
}
]
Patch Set 1 #
Total comments: 20
Patch Set 2 : Address comments to Patch Set 1 #
Total comments: 3
Patch Set 3 : Generate unless-domain value outside while loop #
Total comments: 3
Patch Set 4 : Rebase #
Total comments: 1
MessagesTotal messages: 13
This changes the implementation of $elemhide and $generichide so that we whitelist the domain where possible by including it in the unless-domain list for each filter. This fixes AA on walmart.com and more than a thousand domains. https://codereview.adblockplus.org/29473555/diff/29473556/abp2blocklist.js File abp2blocklist.js (right): https://codereview.adblockplus.org/29473555/diff/29473556/abp2blocklist.js#ne... abp2blocklist.js:37: // If the rule set is too huge, Node.js throws This had to be done because of this change. https://codereview.adblockplus.org/29473555/diff/29473556/lib/abp2blocklist.js File lib/abp2blocklist.js (left): https://codereview.adblockplus.org/29473555/diff/29473556/lib/abp2blocklist.j... lib/abp2blocklist.js:517: // Right after the generic element hiding filters, add the exceptions that We could continue generating individual rules for these as well, but since it's not working anyway we might as well drop these for now. When the WebKit bug is fixed, we should go back to this. https://codereview.adblockplus.org/29473555/diff/29473556/lib/abp2blocklist.js File lib/abp2blocklist.js (right): https://codereview.adblockplus.org/29473555/diff/29473556/lib/abp2blocklist.j... lib/abp2blocklist.js:148: text.substring(hostnameStart, endingChar ? i : i + 1).toLowerCase() punycode.toASCII doesn't lower-case the string, we have to do it ourselves. https://codereview.adblockplus.org/29473555/diff/29473556/lib/abp2blocklist.j... lib/abp2blocklist.js:417: exceptionDomains = Array.from(new Set(exceptionDomains)); Ensure no duplicates.
https://codereview.adblockplus.org/29473555/diff/29473556/abp2blocklist.js File abp2blocklist.js (right): https://codereview.adblockplus.org/29473555/diff/29473556/abp2blocklist.js#ne... abp2blocklist.js:37: // If the rule set is too huge, Node.js throws On 2017/06/24 14:54:15, Manish Jethani wrote: > This had to be done because of this change. So if the exception is thrown then only half the JSON will be printed right? I kinda wonder how that's useful if so. https://codereview.adblockplus.org/29473555/diff/29473556/lib/abp2blocklist.js File lib/abp2blocklist.js (left): https://codereview.adblockplus.org/29473555/diff/29473556/lib/abp2blocklist.j... lib/abp2blocklist.js:517: // Right after the generic element hiding filters, add the exceptions that On 2017/06/24 14:54:15, Manish Jethani wrote: > We could continue generating individual rules for these as well, but since it's > not working anyway we might as well drop these for now. When the WebKit bug is > fixed, we should go back to this. Maybe add a comment explaining what needs to change in the future once the bug's fixed? https://codereview.adblockplus.org/29473555/diff/29473556/lib/abp2blocklist.js File lib/abp2blocklist.js (right): https://codereview.adblockplus.org/29473555/diff/29473556/lib/abp2blocklist.j... lib/abp2blocklist.js:90: return domains; Why not make domains a Set instead of an Array here? Then we wouldn't have to keep converting it later on. https://codereview.adblockplus.org/29473555/diff/29473556/lib/abp2blocklist.j... lib/abp2blocklist.js:435: rule.trigger["unless-domain"] = exceptionDomains.map(name => "*" + name); Maybe we should do this work outside of the while loop? https://codereview.adblockplus.org/29473555/diff/29473556/lib/abp2blocklist.j... lib/abp2blocklist.js:536: // (WebKit bug #167423), but as a bonus it also reduces the number of Mind giving the full URL for the WebKit bug? https://codereview.adblockplus.org/29473555/diff/29473556/lib/abp2blocklist.j... lib/abp2blocklist.js:542: // terms of the amount of memory used. This can cause Node.js to throw Have you tested that rule generation still works OK with Adblock Plus for Safari when this change is included? It kinda sucks that we'll be using a bunch more memory there.
(Moving Sebastian to Cc on this one.)
Patch Set 2: Address comments to Patch Set 1 Some comments inline. https://codereview.adblockplus.org/29473555/diff/29473556/abp2blocklist.js File abp2blocklist.js (right): https://codereview.adblockplus.org/29473555/diff/29473556/abp2blocklist.js#ne... abp2blocklist.js:37: // If the rule set is too huge, Node.js throws On 2017/07/07 11:40:13, kzar wrote: > On 2017/06/24 14:54:15, Manish Jethani wrote: > > This had to be done because of this change. > > So if the exception is thrown then only half the JSON will be printed right? I > kinda wonder how that's useful if so. If the exception is thrown the program crashes. This happens if the rule set is too huge. Since we're printing one rule at a time, this is not an issue. The exception is thrown in JSON.stringify, I've made it clear in the comment now. https://codereview.adblockplus.org/29473555/diff/29473556/lib/abp2blocklist.js File lib/abp2blocklist.js (right): https://codereview.adblockplus.org/29473555/diff/29473556/lib/abp2blocklist.j... lib/abp2blocklist.js:90: return domains; On 2017/07/07 11:40:13, kzar wrote: > Why not make domains a Set instead of an Array here? Then we wouldn't have to > keep converting it later on. That's a good point, it does seem to make a huge difference. Done. https://codereview.adblockplus.org/29473555/diff/29473556/lib/abp2blocklist.j... lib/abp2blocklist.js:435: rule.trigger["unless-domain"] = exceptionDomains.map(name => "*" + name); On 2017/07/07 11:40:13, kzar wrote: > Maybe we should do this work outside of the while loop? We have to make a copy of the array as a rule, because later when we implement rule merging we want to be able to modify one rule object without it affecting other rule objects. If this turns out not to be a concern in practice for this particular case, then we can optimize here. I actually tried moving it outside the while loop but then calling [].slice on it for each rule, it did not make any difference in performance. https://codereview.adblockplus.org/29473555/diff/29473556/lib/abp2blocklist.j... lib/abp2blocklist.js:536: // (WebKit bug #167423), but as a bonus it also reduces the number of On 2017/07/07 11:40:13, kzar wrote: > Mind giving the full URL for the WebKit bug? Done. By the way, there are new comments on that bug report. Maybe we have misunderstood how css-display-none and ignore-previous-rules are supposed to work. I'm going to post an update there that definitely shows that this is broken in WebKit. https://codereview.adblockplus.org/29473555/diff/29473556/lib/abp2blocklist.j... lib/abp2blocklist.js:542: // terms of the amount of memory used. This can cause Node.js to throw On 2017/07/07 11:40:13, kzar wrote: > Have you tested that rule generation still works OK with Adblock Plus for Safari > when this change is included? It kinda sucks that we'll be using a bunch more > memory there. I tested it there and it works without problems.
https://codereview.adblockplus.org/29473555/diff/29473556/abp2blocklist.js File abp2blocklist.js (right): https://codereview.adblockplus.org/29473555/diff/29473556/abp2blocklist.js#ne... abp2blocklist.js:37: // If the rule set is too huge, Node.js throws On 2017/07/08 05:33:59, Manish Jethani wrote: > On 2017/07/07 11:40:13, kzar wrote: > > On 2017/06/24 14:54:15, Manish Jethani wrote: > > > This had to be done because of this change. > > > > So if the exception is thrown then only half the JSON will be printed right? I > > kinda wonder how that's useful if so. > > If the exception is thrown the program crashes. This happens if the rule set is > too huge. Since we're printing one rule at a time, this is not an issue. > > The exception is thrown in JSON.stringify, I've made it clear in the comment > now. Acknowledged. https://codereview.adblockplus.org/29473555/diff/29473556/lib/abp2blocklist.js File lib/abp2blocklist.js (right): https://codereview.adblockplus.org/29473555/diff/29473556/lib/abp2blocklist.j... lib/abp2blocklist.js:435: rule.trigger["unless-domain"] = exceptionDomains.map(name => "*" + name); On 2017/07/08 05:33:59, Manish Jethani wrote: > On 2017/07/07 11:40:13, kzar wrote: > > Maybe we should do this work outside of the while loop? > > We have to make a copy of the array as a rule, because later when we implement > rule merging we want to be able to modify one rule object without it affecting > other rule objects. If this turns out not to be a concern in practice for this > particular case, then we can optimize here. > > I actually tried moving it outside the while loop but then calling [].slice on > it for each rule, it did not make any difference in performance. I'd rather you did the work outside the loop here for now, then moved it if required for the merging patchset. https://codereview.adblockplus.org/29473555/diff/29473556/lib/abp2blocklist.j... lib/abp2blocklist.js:536: // (WebKit bug #167423), but as a bonus it also reduces the number of On 2017/07/08 05:33:59, Manish Jethani wrote: > On 2017/07/07 11:40:13, kzar wrote: > > Mind giving the full URL for the WebKit bug? > > Done. > > By the way, there are new comments on that bug report. Maybe we have > misunderstood how css-display-none and ignore-previous-rules are supposed to > work. I'm going to post an update there that definitely shows that this is > broken in WebKit. Acknowledged. https://codereview.adblockplus.org/29473555/diff/29473556/lib/abp2blocklist.j... lib/abp2blocklist.js:542: // terms of the amount of memory used. This can cause Node.js to throw On 2017/07/08 05:33:59, Manish Jethani wrote: > On 2017/07/07 11:40:13, kzar wrote: > > Have you tested that rule generation still works OK with Adblock Plus for > Safari > > when this change is included? It kinda sucks that we'll be using a bunch more > > memory there. > > I tested it there and it works without problems. Acknowledged. https://codereview.adblockplus.org/29473555/diff/29483555/lib/abp2blocklist.js File lib/abp2blocklist.js (right): https://codereview.adblockplus.org/29473555/diff/29483555/lib/abp2blocklist.j... lib/abp2blocklist.js:556: genericSelectorExceptionDomains.add(name); I wonder if it would be better to pass two Sets of domains to addCSSRules rather than combining them here. I figure we're iterating through those domains twice otherwise.
Patch Set 3: Generate unless-domain value outside while loop https://codereview.adblockplus.org/29473555/diff/29473556/lib/abp2blocklist.js File lib/abp2blocklist.js (right): https://codereview.adblockplus.org/29473555/diff/29473556/lib/abp2blocklist.j... lib/abp2blocklist.js:435: rule.trigger["unless-domain"] = exceptionDomains.map(name => "*" + name); On 2017/07/10 12:33:07, kzar wrote: > On 2017/07/08 05:33:59, Manish Jethani wrote: > > On 2017/07/07 11:40:13, kzar wrote: > > > Maybe we should do this work outside of the while loop? > > > > We have to make a copy of the array as a rule, because later when we implement > > rule merging we want to be able to modify one rule object without it affecting > > other rule objects. If this turns out not to be a concern in practice for this > > particular case, then we can optimize here. > > > > I actually tried moving it outside the while loop but then calling [].slice on > > it for each rule, it did not make any difference in performance. > > I'd rather you did the work outside the loop here for now, then moved it if > required for the merging patchset. Done. https://codereview.adblockplus.org/29473555/diff/29483555/lib/abp2blocklist.js File lib/abp2blocklist.js (right): https://codereview.adblockplus.org/29473555/diff/29483555/lib/abp2blocklist.j... lib/abp2blocklist.js:556: genericSelectorExceptionDomains.add(name); On 2017/07/10 12:33:08, kzar wrote: > I wonder if it would be better to pass two Sets of domains to addCSSRules rather > than combining them here. I figure we're iterating through those domains twice > otherwise. We'd still have to combine them into one set if we want to filter out duplicates. Also, this code runs only once per rule set generation, it really doesn't matter. Alternatively we could use the has method to check if the value is already in the other set while iterating.
https://codereview.adblockplus.org/29473555/diff/29483555/lib/abp2blocklist.js File lib/abp2blocklist.js (right): https://codereview.adblockplus.org/29473555/diff/29483555/lib/abp2blocklist.j... lib/abp2blocklist.js:556: genericSelectorExceptionDomains.add(name); On 2017/07/11 11:19:18, Manish Jethani wrote: > On 2017/07/10 12:33:08, kzar wrote: > > I wonder if it would be better to pass two Sets of domains to addCSSRules > rather > > than combining them here. I figure we're iterating through those domains twice > > otherwise. > > We'd still have to combine them into one set if we want to filter out > duplicates. Also, this code runs only once per rule set generation, it really > doesn't matter. > > Alternatively we could use the has method to check if the value is already in > the other set while iterating. Fair enough. https://codereview.adblockplus.org/29473555/diff/29486562/lib/abp2blocklist.js File lib/abp2blocklist.js (right): https://codereview.adblockplus.org/29473555/diff/29486562/lib/abp2blocklist.j... lib/abp2blocklist.js:417: let unlessDomain = exceptionDomains.size > 0 ? [] : null; Nit: This seems like overkill since `[]` is falsey anyway.
https://codereview.adblockplus.org/29473555/diff/29486562/lib/abp2blocklist.js File lib/abp2blocklist.js (right): https://codereview.adblockplus.org/29473555/diff/29486562/lib/abp2blocklist.j... lib/abp2blocklist.js:417: let unlessDomain = exceptionDomains.size > 0 ? [] : null; On 2017/07/11 12:20:03, kzar wrote: > Nit: This seems like overkill since `[]` is falsey anyway. [] evaluates to true. If you mean checking the length property, we could do that, but then I'm just wondering why since it's inside the loop.
LGTM, go ahead and push when you're ready to. https://codereview.adblockplus.org/29473555/diff/29486562/lib/abp2blocklist.js File lib/abp2blocklist.js (right): https://codereview.adblockplus.org/29473555/diff/29486562/lib/abp2blocklist.j... lib/abp2blocklist.js:417: let unlessDomain = exceptionDomains.size > 0 ? [] : null; On 2017/07/11 16:28:39, Manish Jethani wrote: > On 2017/07/11 12:20:03, kzar wrote: > > Nit: This seems like overkill since `[]` is falsey anyway. > > [] evaluates to true. If you mean checking the length property, we could do > that, but then I'm just wondering why since it's inside the loop. Whoops, you're right.
Patch Set 4: Rebase Rebased on the current master.
https://codereview.adblockplus.org/29473555/diff/29486569/lib/abp2blocklist.js File lib/abp2blocklist.js (right): https://codereview.adblockplus.org/29473555/diff/29486569/lib/abp2blocklist.j... lib/abp2blocklist.js:157: characters.slice(hostnameStart, endingChar ? i : i + 1).join("") This was the only conflict during rebase, since text (String) changed to characters (Array) in another commit.
LGTM |