|
|
Created:
May 1, 2017, 7:37 p.m. by Manish Jethani Modified:
July 31, 2017, 8:45 a.m. CC:
Felix Dahlke Base URL:
https://hg.adblockplus.org/abp2blocklist Visibility:
Public. |
DescriptionIssue 3673 - Merge closely matching rules
SCOPE
Our goal is to merge two or more rules into a single rule and to do this for as many rules as is feasible given limited developer time and limitations of runtime performance.
We do not aim to merge all possible rules that can be merged. We may want to do this in the future [1], but it is outside the scope of this issue.
HIGH-LEVEL ALGORITHM
What follows is a high-level description of the algorithm.
Two objects are said to be mergeable by property X if they are otherwise identical except for the value of property X. For example, {X: 1, Y: 1} and {X: 2, Y: 1} can be merged into {X: [1, 2], Y: 1}, therefore they are "mergeable by X". {X: 1, Y: 1} and {X: 1, Y: 2} on the other hand are not mergeable by X, but they are mergeable by Y. {X: 1, Y: 1} and {X: 2, Y: 2} are neither mergeable by X nor mergeable by Y. They may or may not be "mergeable by X and Y", depending on the semantics of X and Y [2], but we do not handle this case.
The rule set is divided into logical groups: CSS rules, CSS exception rules, blocking rules, and blocking exception rules. While the order of the logical groups is significant, the order of the rules within each logical group is not. We can only safely merge rules that belong to the same logical group.
We start by dividing each logical group further into groups of rules mergeable by URL filter (rule.trigger["url-filter"]). We then try to merge as many rules within each group as we "feasibly" can using a simple algorithm, which is described below.
Next we take the output of the above and divide the rules again into groups of rules mergeable by resource type (rule.trigger["resource-type"]). We then merge all the rules within each group into a single rule. Since the resource type is a list, merging a group of two or more rules is as simple as making a union of all the resource types in the group.
Next we take the output of the above and do the same thing for the "if-domain" property (rule.trigger["if-domain"]), the value of which is also a mergeable list.
The output of the above is the merged logical group.
The final rule set is a concatenation of all the merged logical groups.
MERGING BY URL FILTER
The algorithm for merging by URL filter is as follows.
For each pair combination of rules in the group [3], we check if the URL filters are a "close match". Two URL filters are a close match either if they differ only in one character (i.e. a Levenshtein distance of 1) or if it's possible to convert one into the other by deleting or inserting one and only one contiguous range of characters from one into the other. For example, "ad" and "ads", "adv" and "ads", and "adv" and "advertisement" are all close matches.
Next we sort the group by the number of close matches, such that a rule with N close matches appears before a rule with N - 1 close matches in the group. We then iterate over the group and try to merge each rule with as many of its close matches as we can. For example, the rules with URL filters "ad", "ads", and "adv" are all merged into one with the URL filter "ad[sv]?".
Note that the URL filter "adv" is a close match with "advertisement" but also with "ad" and "ads". In this case we merge it into "ad[vs]?" and not "adv(ertisement)?", since the former combines more rules into one.
LIMITATIONS & AREAS FOR IMPROVEMENT
We only merge with one set of parentheses. For example, "ad" and "advertisement" will be merged into "ad(vertisement)?", but "ad", "advert", and "advertisement" will not be merged into "ad(vert(isement)?)?" [4]. Further, we never mix parentheses with other kinds of merging. For example, "ads", "adv", and "adsserver", and "advserver" will not be merged into "ad[sv](server)?" [4].
Also, since URL filters are regular expressions to begin with, if the delta between two URL filters contains a regular expression special character like "\", "+", "?", and so on, the two are not considered to be a close match [4].
RESULTS
Below are the results of running the merge function on the various subscription options available in Adblock Plus for Safari on macOS.
* EasyList+AA
Rule merging took 918ms
Before: 49,261 rules
After: 43,617 rules
* EasyList Germany+EasyList+AA
Rule merging took 960ms
Before: 55,646 rules
After: 49,674 rules
* EasyList China+EasyList+AA
Rule merging took 968ms
Before: 57,725 rules
After: 51,912 rules [5]
* ABPindo+EasyList+AA
Rule merging took 861ms
Before: 50,250 rules
After: 44,561 rules
* Bulgarian list+EasyList+AA
Rule merging took 877ms
Before: 50,446 rules
After: 44,777 rules
* EasyList Czech and Slovak+EasyList+AA
Rule merging took 864ms
Before: 50,188 rules
After: 44,520 rules
* EasyList Dutch+EasyList+AA
Rule merging took 882ms
Before: 50,399 rules
After: 44,710 rules
* EasyList Hebrew+EasyList+AA
Rule merging took 860ms
Before: 50,069 rules
After: 44,342 rules
* EasyList Italy+EasyList+AA
Rule merging took 934ms
Before: 52,148 rules
After: 46,404 rules
* EasyList Lithuania+EasyList+AA
Rule merging took 880ms
Before: 50,262 rules
After: 44,599 rules
* Latvian List+EasyList+AA
Rule merging took 845ms
Before: 49,596 rules
After: 43,937 rules
* Liste AR+Liste FR+EasyList+AA
Rule merging took 977ms
Before: 60,398 rules
After: 53,769 rules [5]
* Liste FR+EasyList+AA
Rule merging took 955ms
Before: 58,875 rules
After: 52,352 rules [5]
* ROList+EasyList+AA
Rule merging took 858ms
Before: 49,556 rules
After: 43,897 rules
* RuAdList+EasyList+AA
Rule merging took 1,070ms
Before: 62,919 rules
After: 56,663 rules [5]
Footnotes:
1: Owing to the relative simplicity of the algorithm, there may still be mergeable rules that remain unmerged. A simple way to address this would be to run the algorithm recursively until it is no longer able to merge any more rules (essentially brute force). We could also come up with something cleverer. Based on tests with real data, it appears not worth trying.
2: For content blocker rules, there are no two properties that can be merged together. If "if-domain" and "unless-domain" were allowed to be specified together, two rules could be merged by both properties, but this is not the case.
3: This is necessarily O(N^2). We can reduce it to O(N) by specifying a "heuristic range" within which to look for close matches. In fact, we have tried with a heuristic range of 10, and it appears to yield good results.
4: This is an area for improvement.
5: This is still above the 50,000 rules limit.
Patch Set 1 #Patch Set 2 : Ignore special characters #
Total comments: 1
Patch Set 3 : Look for matches only within a certain range #Patch Set 4 : Fix bugs and add unit tests #Patch Set 5 : Add advanced merge support #
Total comments: 27
Patch Set 6 : Improved matching algorithm #Patch Set 7 : Make it work in Safari #Patch Set 8 : Simplify and optimize #Patch Set 9 : Simplify closeMatch function #Patch Set 10 : Merge rules by resource-type and if-domain #Patch Set 11 : Clean up #Patch Set 12 : Filter out redundant rules before merging #
Total comments: 27
Patch Set 13 : Move merge options to constructor and update comments #
Total comments: 6
Patch Set 14 : Use for..of instead of forEach where possible #
Total comments: 2
Patch Set 15 : Change options object into a single boolean option #Patch Set 16 : Merge rules in groups #
Total comments: 7
Patch Set 17 : Make generateRules asynchronous #Patch Set 18 : Make more code asynchronous #Patch Set 19 : Implement merge options "all", "auto", and "none" #Patch Set 20 : Remove redundant async #
Total comments: 6
Patch Set 21 : Split variable declaration statement #
Total comments: 3
Patch Set 22 : Rebase with minor changes #
Total comments: 4
Patch Set 23 : Map async callees asynchronously #Patch Set 24 : Rebase #
MessagesTotal messages: 67
Patch Set 1
Wow, I'm impressed you came up with something so fast. Could you add some unit tests for the new behaviour though?
Also could you run the script on EasyList before and after your changes? I'd be interested to know if there's much of a slow down and also how many rules it manages to save. Another thing to check, have you made sure the blocking list generated still works with Safari after your changes? I can't remember if you use an Apple laptop for development, but if you do you can create an empty extension in Safari and then add the blocking list to it from the extension builder interface. We were hoping to create a tool for testing blocking lists that worked on Linux (and in the future from CI) but we never got around to it. Actually if you want something to do that might be quite interesting to work on as well. There's not an issue for that yet but I did keep a few notes, so if you're interested I can forward them.
On 2017/05/02 09:24:46, kzar wrote: > Also could you run the script on EasyList before and after your changes? I'd be > interested to know if there's much of a slow down and also how many rules it > manages to save. Before: * Execution time: ~15s * Output size: 38,955 rules After: * Execution time: ~27s * Output size: 37,586 rules This change increases the execution time by ~80% and reduces the size of the generate rule set by ~3.4% (1,369 rules). We could reduce the size further by sacrificing more execution time (e.g. we ignore css-display-none rules entirely for now). Also, the added execution is easily offset by using a more efficient implementation of tldjs.getDomain. I was able to make the pre-change script run ~85% faster by swapping in my own implementation. We could also just use the implementation in adblockpluschrome. I'm not yet fully aware of the context and where this script is supposed to be running, but I'll look into it separately. I'll test this on Safari on macOS as you suggested and also write some unit tests.
Patch Set 2: Ignore special characters It's a minor change, but it seemed like we could have been making some errors by treating special characters as normal characters. I'll try the output on Safari to be sure. https://codereview.adblockplus.org/29426594/diff/29427594/lib/abp2blocklist.js File lib/abp2blocklist.js (right): https://codereview.adblockplus.org/29426594/diff/29427594/lib/abp2blocklist.j... lib/abp2blocklist.js:411: // We don't deal with special characters for now. It should be possible to merge "\[" and "\(" into "[\[\(]" but it's probably not worth the added complexity.
Patch Set 3: Look for matches only within a certain range Now we're talking. With this small tweak we're able to reduce the size of the rule set by ~3% by increasing the execution time only negligibly. Before: * Execution time: ~15s * Output size: 38,955 rules After: * Execution time: ~15.5s * Output size: 37,759 rules
On 2017/05/02 09:24:46, kzar wrote: > Another thing to check, have you made sure the blocking list generated still > works with Safari after your changes? I have now tested this on iOS (actually the simulator in Xcode), and both the merged and non-merged rules are working fine. For example the filter "^https?://.*/ads[123456789]/", which is a combination of 9 filters, is correctly blocking a URL like "https://manishjethani.com/ads1/" but not "https://manishjethani.com/ads0/" or "https://manishjethani.com/ads1" I have tested both [] and ? including cases where they are both combined (e.g. "^https?://.*/public/ad[sv]?/")
Patch Set 4: Fix bugs and add unit tests Fixed some bugs so that there are no duplicate rules, and improved the algorithm so it generates the smallest set of rules. e.g. "ads", "bds", "adv", "bdv", "adx", and "bdx", will give you "ad[svx]" and "bd[svx]" (2 rules), not "[ab]ds", "[ab]dv", and "[ab]dx" (3 rules). Before: * Execution time: ~15s * Output size: 38,955 rules After: * Execution time: ~16s * Output size: 37,448 rules
Patch Set 5: Add advanced merge support This is beyond the scope of the issue, but I figured we might as well go all the way. With this change we are able to merge ["/ad", "/adxsi", "/xsi", "/axsi", "/bxsi"] into ["^https?://.*/ad(xsi)?", "^https?://.*/[ab]?xsi"]. This increases the execution time by ~30% but reduces the size of the rule set by another 400 rules. I've left it turned off by default.
> Also, the added execution is easily offset by using a more efficient > implementation of tldjs.getDomain. I was able to make the pre-change > script run ~85% faster by swapping in my own implementation. We > could also just use the implementation in adblockpluschrome. I'm not > yet fully aware of the context and where this script is supposed to > be running, but I'll look into it separately. So abp2blocklist is used in two ways as far as I'm aware. 1. For the generation of our blocking lists used by our iOS and other mobile apps. This happens using Node.js and therefore the slow tldjs module. IIRC this is actually done manually since false-positives are still common enough to be a problem. Speed is not important here. 2. For the content blocking feature of Adblock Plus for Safari. This happens when the user enabled the experimental option from the extension's option page. The lists are generated at runtime, so speed is important, but the slow tldjs module isn't used. This use case is less of a focus now since we've moved Safari support onto a separate branch of adblockpluschrome, but I guess it's still worth considering. Things that matter: - Safari currently has a limit of 50,000 blocking rules. This means that EasyList + AA can be enabled, but isn't enough for the addition of region specific filter lists such as EasyList Dutch. If we can reduce the number of blocking rules generated enough to allow for extra lists to be enabled that would be a big win. - Accuracy of blocking. Since Adblock Plus filters don't translate perfectly to content blocking rules we have had issues with false negatives and (much worse) false positives. If we can trust abp2blocklist enough to generate the blocking lists for our mobile apps automatically it would be great. I've also just realised I'm pretty much repeating what Felix said in a recent email to Hubert, I'll forward that on to you now.
On 2017/05/03 10:22:18, kzar wrote: > - Accuracy of blocking. Since Adblock Plus filters don't translate perfectly > to content blocking rules we have had issues with false negatives and > (much worse) false positives. If you're interested here's[1] the repository for those manually tweaked blocking lists. Have a look through the commits there to see the false positives we've had to fix. [1] - https://hg.adblockplus.org/contentblockerlists
Just checking in IRC and we do still care about supporting Adblock Plus for Safari in abp2blocklist, so that's another thing to test! Check out the `safari` branch of adblockpluschrome and do `./build.py -t safari devenv` make sure the content blocking feature still works in Safari 9.0 with your abp2blocklist changes.
I need more time to get my head around what you're doing here but I've made a superficial start with the review. Hopefully that's enough that I'm not blocking you for now. (Adding Felix to Cc since he's interested in the content blocking stuff.) https://codereview.adblockplus.org/29426594/diff/29427618/lib/abp2blocklist.js File lib/abp2blocklist.js (right): https://codereview.adblockplus.org/29426594/diff/29427618/lib/abp2blocklist.j... lib/abp2blocklist.js:369: function closeMatch(s, t, {multi = false} = {}) I've not seen this syntax before `{multi = false} = {}` could you explain? https://codereview.adblockplus.org/29426594/diff/29427618/lib/abp2blocklist.j... lib/abp2blocklist.js:377: // If the string lenghts differ by more than one character, we cannot arrive Nit: Typo "lenghts". https://codereview.adblockplus.org/29426594/diff/29427618/lib/abp2blocklist.j... lib/abp2blocklist.js:417: // We don't deal with special characters for now. So we skip special characters in the url-filter regexp, but I wonder if that's enough to avoid messing up regexps. For example s = "[ab]" b = "[bc]" Wouldn't we end up with "[abc]"? I'm not sure if things like that are a problem in practice though, what do you think? https://codereview.adblockplus.org/29426594/diff/29427618/lib/abp2blocklist.j... lib/abp2blocklist.js:422: if (diff == 0) Nit: Couldn't this be an `else if` too? https://codereview.adblockplus.org/29426594/diff/29427618/lib/abp2blocklist.j... lib/abp2blocklist.js:432: else if (diff > 0) Nit: Please use braces since the clause spans multiple lines. https://codereview.adblockplus.org/29426594/diff/29427618/lib/abp2blocklist.j... lib/abp2blocklist.js:476: rulesInfo[index] = {rule}; I'm not sure syntax like this will work for Safari 9 :( https://codereview.adblockplus.org/29426594/diff/29427618/lib/abp2blocklist.j... lib/abp2blocklist.js:564: ruleInfo.multiEditMatch); Nit: Long line.
On 2017/05/03 10:27:49, kzar wrote: > If you're interested here's[1] the repository for those manually tweaked > blocking lists. Have a look through the commits there to see the false > positives we've had to fix. > > [1] - https://hg.adblockplus.org/contentblockerlists Thanks, this is helpful.
Comments inline. https://codereview.adblockplus.org/29426594/diff/29427618/lib/abp2blocklist.js File lib/abp2blocklist.js (right): https://codereview.adblockplus.org/29426594/diff/29427618/lib/abp2blocklist.j... lib/abp2blocklist.js:369: function closeMatch(s, t, {multi = false} = {}) On 2017/05/03 11:17:24, kzar wrote: > I've not seen this syntax before `{multi = false} = {}` could you explain? This: function func(param1, param2, {option1 = false} = {}) { ... } Is the equivalent of this: function func(param1, param2, options) { let option1 = false; if (options && options.option1 !== undefined) // Note: strict equality option1 = options.option1; ... } It's part of the ES6 standard. http://2ality.com/2015/01/es6-destructuring.html#simulating-named-parameters-... The idea is that a function will take a list of parameters that are strictly required for it to do its job, and then a bunch of options. The list of options can keep growing over time. Here they can be added in a backwards compatible way without much fuss. This is a common pattern in JavaScript. https://codereview.adblockplus.org/29426594/diff/29427618/lib/abp2blocklist.j... lib/abp2blocklist.js:417: // We don't deal with special characters for now. On 2017/05/03 11:17:24, kzar wrote: > So we skip special characters in the url-filter regexp, but I wonder if that's > enough to avoid messing up regexps. For example > > s = "[ab]" > b = "[bc]" > > Wouldn't we end up with "[abc]"? The above is not a good example because this is not even supported. The "edit distance" (Levenshtein distance) between the two is greater than 1. It's going to treat the above two inputs as "not close matches". However let's take this one: "[ab]" and "[ac]". It's going to mess up and give you "[a[bc]]". I thought of supporting this, but then I decided against it after looking at the actual values for the URL filters. There is not a single instance of this case. The code right now makes the assumption that this never occurs in the input. The ignoring of special characters is more for when the character appears as a literal in the original filter. For example, "foo.com/?ad" becomes "^https?://foo\.com/\?ad", and we just don't want to deal with the "?" because it is escaped (which means we really have to consider the preceding backslash with it, and that would complicate things). https://codereview.adblockplus.org/29426594/diff/29427618/lib/abp2blocklist.j... lib/abp2blocklist.js:476: rulesInfo[index] = {rule}; On 2017/05/03 11:17:24, kzar wrote: > I'm not sure syntax like this will work for Safari 9 :( I'll check, but if it doesn't work then I'll have to change this in a few places.
https://codereview.adblockplus.org/29426594/diff/29427618/lib/abp2blocklist.js File lib/abp2blocklist.js (right): https://codereview.adblockplus.org/29426594/diff/29427618/lib/abp2blocklist.j... lib/abp2blocklist.js:369: function closeMatch(s, t, {multi = false} = {}) On 2017/05/03 14:41:54, Manish Jethani wrote: > On 2017/05/03 11:17:24, kzar wrote: > > I've not seen this syntax before `{multi = false} = {}` could you explain? > > This: > > function func(param1, param2, {option1 = false} = {}) > { > ... > } > > Is the equivalent of this: > > function func(param1, param2, options) > { > let option1 = false; > > if (options && options.option1 !== undefined) // Note: strict equality > option1 = options.option1; > > ... > } > > It's part of the ES6 standard. > > http://2ality.com/2015/01/es6-destructuring.html#simulating-named-parameters-... > > The idea is that a function will take a list of parameters that are strictly > required for it to do its job, and then a bunch of options. The list of options > can keep growing over time. Here they can be added in a backwards compatible way > without much fuss. This is a common pattern in JavaScript. Acknowledged. https://codereview.adblockplus.org/29426594/diff/29427618/lib/abp2blocklist.j... lib/abp2blocklist.js:417: // We don't deal with special characters for now. On 2017/05/03 14:41:54, Manish Jethani wrote: > However let's take this one: "[ab]" and "[ac]". It's going to mess up and give > you "[a[bc]]". > > I thought of supporting this, but then I decided against it after looking at the > actual values for the URL filters. There is not a single instance of this case. > The code right now makes the assumption that this never occurs in the input. Maybe add a comment explaining that assumption? https://codereview.adblockplus.org/29426594/diff/29427618/lib/abp2blocklist.j... lib/abp2blocklist.js:454: let copy = { How about `return Object.create(rule, {"url-filter": {value: undefined}});` https://codereview.adblockplus.org/29426594/diff/29427618/lib/abp2blocklist.j... lib/abp2blocklist.js:470: const heuristicRange = 100; Since the code either runs in a place where speed really matters, or where it doesn't matter at all, perhaps we should change this and the multi option? I was thinking "multi" could be renamed "slow" or something like that. If slow's true then match multiple character changes and have no heuristicRange limit. We could perhaps use the option to make other decisions in the future too, if we think of other ways to process the rules. https://codereview.adblockplus.org/29426594/diff/29427618/lib/abp2blocklist.j... lib/abp2blocklist.js:502: if (rulesInfo[i].stringifiedWithoutURLFilter == I wonder if we could create a lookup table stringifiedWithoutURLFilter => [ruleInfo] instead?
Patch Set 6: Improved matching algorithm Now instead of trying to do both kinds of matches within the same loop, I've separated the logic out into two different algorithms. If the strings differ by zero or one character in length, we try one; otherwise we try the other one. It works beautifully, easier to understand as well. I ran an "advanced" and "exhaustive" (test every rule against every other) version of this on the same easylist.txt, here are the results: Before: * Execution time: ~15s * Output size: 38,955 rules After: * Execution time: 1m ~25s * Output size: 36,649 rules It's smaller by more than 2,000 rules. I'm still getting ideas. I also manually examined about 15% of the generated rules, looks OK to me except one thing: the hyphen ("-") should appear first within square brackets, otherwise it might be a parse error. i.e. "[-/_]" instead of "[/-_]". I'll fix this. https://codereview.adblockplus.org/29426594/diff/29427618/lib/abp2blocklist.js File lib/abp2blocklist.js (right): https://codereview.adblockplus.org/29426594/diff/29427618/lib/abp2blocklist.j... lib/abp2blocklist.js:377: // If the string lenghts differ by more than one character, we cannot arrive On 2017/05/03 11:17:24, kzar wrote: > Nit: Typo "lenghts". Done. https://codereview.adblockplus.org/29426594/diff/29427618/lib/abp2blocklist.j... lib/abp2blocklist.js:417: // We don't deal with special characters for now. On 2017/05/03 15:19:04, kzar wrote: > On 2017/05/03 14:41:54, Manish Jethani wrote: > > However let's take this one: "[ab]" and "[ac]". It's going to mess up and give > > you "[a[bc]]". > > > > I thought of supporting this, but then I decided against it after looking at > the > > actual values for the URL filters. There is not a single instance of this > case. > > The code right now makes the assumption that this never occurs in the input. > > Maybe add a comment explaining that assumption? > Added a comment to explain this. https://codereview.adblockplus.org/29426594/diff/29427618/lib/abp2blocklist.j... lib/abp2blocklist.js:422: if (diff == 0) On 2017/05/03 11:17:24, kzar wrote: > Nit: Couldn't this be an `else if` too? Done. https://codereview.adblockplus.org/29426594/diff/29427618/lib/abp2blocklist.j... lib/abp2blocklist.js:432: else if (diff > 0) On 2017/05/03 11:17:24, kzar wrote: > Nit: Please use braces since the clause spans multiple lines. Done. https://codereview.adblockplus.org/29426594/diff/29427618/lib/abp2blocklist.j... lib/abp2blocklist.js:454: let copy = { On 2017/05/03 15:19:04, kzar wrote: > How about `return Object.create(rule, {"url-filter": {value: undefined}});` That would not work for multiple reasons, but most notably because "url-filter" is a property of rule.trigger, not rule itself. The other reason is that an object created that way has no own, enumerable properties, therefore the stringified version is "{}". https://codereview.adblockplus.org/29426594/diff/29427618/lib/abp2blocklist.j... lib/abp2blocklist.js:470: const heuristicRange = 100; On 2017/05/03 15:19:04, kzar wrote: > Since the code either runs in a place where speed really matters, or where it > doesn't matter at all, perhaps we should change this and the multi option? > > I was thinking "multi" could be renamed "slow" or something like that. If slow's > true then match multiple character changes and have no heuristicRange limit. We > could perhaps use the option to make other decisions in the future too, if we > think of other ways to process the rules. In the latest update the generateRules function takes both general options like fastMerge and more specific options like advancedMerge (experimental or slow stuff) and exhaustiveMerge (test every rule against every other rule). If fast is true, it'll skip both these; if set to false, it'll be as experimental and as slow as it can (but you can still turn off individual settings). https://codereview.adblockplus.org/29426594/diff/29427618/lib/abp2blocklist.j... lib/abp2blocklist.js:502: if (rulesInfo[i].stringifiedWithoutURLFilter == On 2017/05/03 15:19:04, kzar wrote: > I wonder if we could create a lookup table stringifiedWithoutURLFilter => > [ruleInfo] instead? I'm not sure what the benefit of that would be. We have two ruleInfo objects, and we want to find out if they're identical except for the URL filter. Now it's as simple as a string comparison. I actually tried the lookup table approach (we don't even need to store an array there, just an empty object), it is significantly slower. But you give me a good idea: compare hashes instead. I just take the first 32 bits of the SHA-1 hash of the string, and it makes it more than 10% faster. https://codereview.adblockplus.org/29426594/diff/29427618/lib/abp2blocklist.j... lib/abp2blocklist.js:564: ruleInfo.multiEditMatch); On 2017/05/03 11:17:24, kzar wrote: > Nit: Long line. Done.
Patch Set 7: Make it work in Safari OK, so I got this working in Safari. Safari does not support named parameters, I had to replace it with regular function arguments. Also, instead of doing SHA-1 (dependency on Node.js crypto module), we can just do Pearson hashing. I found that this wasn't worth it for the browser case, but it really helps with the exhaustive merging on Node.js. I got these results in Safari on my machine: Merging only single character edits: [Log] Rule merging took 503ms [Log] Before: 49259 rules [Log] After: 47529 rules Merging both single character and multiple character edits (advanced): [Log] Rule merging took 1877ms [Log] Before: 49259 rules [Log] After: 47080 rules Merging both single character and multiple character edits, comparing each rule with every other rule (advanced and exhaustive): [Log] Rule merging took 100365ms [Log] Before: 49259 rules [Log] After: 46623 rules
Patch Set 7: Simplify and optimize The main change is that we're now processing rules in groups. Where previously we were doing (N * (N - 1)) / 2 operations for a rule set size of N, now we're grouping them in G groups and doing G * ((M * (M - 1)) / 2) operations, where M is the average group size. In practical terms this means that given 50,000 rules containing 5,000 mergeable groups of exactly size 10 each, we would be doing 225,000 operations instead of previously 1,249,975,000 operations. We do infact have nearly 50,000 rules and about 8,000 distinct mergeable groups. (Note: Only rules within the same group can be merged. Two rules are in the same group if all their properties except the URL filter are identical.) This also means there's no need for millions of string comparisons, hence no need for the hash function. With this latest change, I'm able to reduce the rule set size by ~1,700 rules in ~300ms, and by ~2,300 rules in ~1.6s. This is it for now, I don't think I can improve it any further. I did notice that there are a few rules where we could merge more than two using nested parentheses. For example, "ad", "advert", "advertisement", and "advertisements" are currently merged into two different rules. We could merge all four into a single rule like "ad(vert(isements?)?)?". This is not as difficult as it may seem, we're only one step away from this. I'll first check if there are enough of these that it's worth the effort. It might also be possible to get rid of redundant rules. For example, "^https?://.*/ad" covers both "^https?://.*/ads" and "^https?://.*/advertisement". We cannot merge all three, but only one of these is required anyway, we can simply discard the other two. I'm going to continue to investigate this and see what else we can do.
On 2017/05/05 22:10:17, Manish Jethani wrote: > Patch Set 7: Simplify and optimize Sorry, that's Patch Set 8.
On 2017/05/04 02:49:32, Manish Jethani wrote: > On 2017/05/03 15:19:04, kzar wrote: > > I wonder if we could create a lookup table stringifiedWithoutURLFilter => > > [ruleInfo] instead? > > I'm not sure what the benefit of that would be. Sorry I did not get this the first time :) I've done exactly this now, thanks.
Patch Set 10: Merge rules by resource-type and if-domain It turns out we had some low hanging fruit. A number of rules differed only by the "resource-type" or "if-domain" properties (after all the other merging). If we merged these, we got a significant reduction. I've implemented this now. Here are the results on EasyList Germany+EasyList+AA: * . Rule merging took 1,299ms * Before: 55,586 rules * . After: 49,368 rules Now we can accommodate EasyList Germany because the combined rule set is smaller than 50,000 rules.
Updated description with something of a specification. Updated title to reflect expanded scope of the issue.
Patch Set 12: Filter out redundant rules before merging If three rules that are otherwise identical have the URL filters "^https?://.*/ad", "^https?://.*/ads", and "^https?://.*/advertisement", where previously we would merge any two of these into one (depending on the order they appeared in), now we simply discard the redundant rules, leaving us with just "^https?://.*/ad". This is expensive so we do this only in exhaustive mode. With EasyList China+EasyList+AA, this reduces the rule set by another ~900 rules and even makes it run faster because of the smaller rule set to merge on. Unfortunately we're still at ~51,000 rules, above the limit.
On 2017/05/07 22:50:37, Manish Jethani wrote: > "^https?://.*/ad", "^https?://.*/ads", and "^https?://.*/advertisement", where > previously we would merge any two of these into one (depending on the order they > appeared in) Sorry, that is not true. We would always merge "ad" and "ads" into "ads?", as mentioned in the specification.
> Below are the results of running the merge function on the > various subscription options available in Adblock Plus for > Safari on macOS. Awesome. Since some combinations have gone from above 50,000 to below it's going to be worth including this in Adblock Plus for Safari. Could you open an issue in Trac that's blocked by #3673 to update the abp2blocklist dependency? Thanks for the detailed explanation of what's going on in the codereview description. Even though that's not what I meant in my email it turned out to be really useful! Your changes are looking really good so far, thanks for adding thorough unit tests as well. https://codereview.adblockplus.org/29426594/diff/29427618/lib/abp2blocklist.js File lib/abp2blocklist.js (right): https://codereview.adblockplus.org/29426594/diff/29427618/lib/abp2blocklist.j... lib/abp2blocklist.js:476: rulesInfo[index] = {rule}; On 2017/05/03 14:41:54, Manish Jethani wrote: > On 2017/05/03 11:17:24, kzar wrote: > > I'm not sure syntax like this will work for Safari 9 :( > > I'll check, but if it doesn't work then I'll have to change this > in a few places. You mentioned testing the code on Safari now, but unless I missed it not the version. Which version did you test with? https://codereview.adblockplus.org/29426594/diff/29432555/lib/abp2blocklist.js File lib/abp2blocklist.js (right): https://codereview.adblockplus.org/29426594/diff/29432555/lib/abp2blocklist.j... lib/abp2blocklist.js:371: // This function returns an edit operation, one of "substitute", "delete", Mind using the JSDoc syntax for the comment explaining this function? https://codereview.adblockplus.org/29426594/diff/29432555/lib/abp2blocklist.j... lib/abp2blocklist.js:427: if (t.length - j == i || s[s.length - j - 1] != t[t.length - j - 1]) I find this part hard to grok, could you add a comment? https://codereview.adblockplus.org/29426594/diff/29432555/lib/abp2blocklist.j... lib/abp2blocklist.js:443: // If there are any special characters in the delta, bail. Nit: IMO this comment doesn't add much. https://codereview.adblockplus.org/29426594/diff/29432555/lib/abp2blocklist.j... lib/abp2blocklist.js:515: const heuristicRange = 10; Could you mention about the behaviour (or lack thereof) of heursiticRange when exhaustive is true in this comment? https://codereview.adblockplus.org/29426594/diff/29432555/lib/abp2blocklist.j... lib/abp2blocklist.js:518: // Throw out obviously redundant rules. Nit: Please surround with braces since it spans multiple lines. https://codereview.adblockplus.org/29426594/diff/29432555/lib/abp2blocklist.j... lib/abp2blocklist.js:779: let rulesInfo = rules.map(rule => ({rule})); What's the purpose of this line? https://codereview.adblockplus.org/29426594/diff/29432555/lib/abp2blocklist.j... lib/abp2blocklist.js:782: .forEach(mergeableRulesInfo => Any reason you used forEach instead of for ... of ... here (and below)? We prefer the latter unless there's a good reason. Oh, there is one problem with that though, the JS Hydra transpiler produces code that evaluates the right hand part of for ... of ... loops for each iteration. So bear that in mind if you do use those, we often assigned the value to a variable first to avoid expensive operations from being performed repeatedly! https://codereview.adblockplus.org/29426594/diff/29432555/lib/abp2blocklist.j... lib/abp2blocklist.js:795: .forEach(mergeableRulesInfo => This logic seems pretty much the same as above, I wonder if we could avoid duplicating it somehow? https://codereview.adblockplus.org/29426594/diff/29432555/lib/abp2blocklist.j... lib/abp2blocklist.js:801: rulesInfo = rulesInfo.filter(ruleInfo => !ruleInfo.merged); Maybe a dumb question, but instead of setting the merged or redundant property on ruleInfo Objects and then filtering those away couldn't we just delete them immediately instead? https://codereview.adblockplus.org/29426594/diff/29432555/lib/abp2blocklist.j... lib/abp2blocklist.js:924: let mergeOptions = { Couldn't you just pass options.exhaustiveMerge straight through instead of putting it inside this Object? Also wouldn't it make more sense to take the merging options in the constructor instead of this method? I think the defaultOptions logic above looks especially out of place here.
Comments inline. I acknowledge the rest of your comments and will make the suggested changes in my next update. I'll also file an issue for including this in Safari for macOS and update Trac issue #3673 to reflect the current work. https://codereview.adblockplus.org/29426594/diff/29427618/lib/abp2blocklist.js File lib/abp2blocklist.js (right): https://codereview.adblockplus.org/29426594/diff/29427618/lib/abp2blocklist.j... lib/abp2blocklist.js:476: rulesInfo[index] = {rule}; On 2017/05/08 08:13:02, kzar wrote: > On 2017/05/03 14:41:54, Manish Jethani wrote: > > On 2017/05/03 11:17:24, kzar wrote: > > > I'm not sure syntax like this will work for Safari 9 :( > > > > I'll check, but if it doesn't work then I'll have to change this > > in a few places. > > You mentioned testing the code on Safari now, but unless I missed it not the > version. Which version did you test with? I've been testing with Safari 10. Anyway, this is not a problem because JS Hydra transpiles {rule} into {rule: rule}. https://codereview.adblockplus.org/29426594/diff/29432555/lib/abp2blocklist.js File lib/abp2blocklist.js (right): https://codereview.adblockplus.org/29426594/diff/29432555/lib/abp2blocklist.j... lib/abp2blocklist.js:779: let rulesInfo = rules.map(rule => ({rule})); On 2017/05/08 08:13:02, kzar wrote: > What's the purpose of this line? The purpose of this line is to create one "ruleInfo" object for each rule, where the ruleInfo object has one property, "rule", pointing to the rule it belongs to. Essentially this: let rulesInfo = new Array(rules.length); for (let i = 0; i < rules.length; i++) rulesInfo[i] = {rule: rule}; https://codereview.adblockplus.org/29426594/diff/29432555/lib/abp2blocklist.j... lib/abp2blocklist.js:782: .forEach(mergeableRulesInfo => On 2017/05/08 08:13:03, kzar wrote: > Any reason you used forEach instead of for ... of ... here (and below)? We > prefer the latter unless there's a good reason. It was for (let value of map.values()) { ... } at first, but as you said JS Hydra can't handle non-array iterables. It looks for the length property on the iterable, trying to convert this into a regular for loop with an index. i.e. for (let i = 0; i < map.values().length; i++) { let value = map.values()[i]; ... } The only other way to do this is: let mapValues = Array.from(map.values()); for (let value of mapValues) { ... } If we simply call the forEach method on the map object, that does the job as well, without creating another intermediate array. I thought this was better because these intermediate arrays would again take up more memory. https://codereview.adblockplus.org/29426594/diff/29432555/lib/abp2blocklist.j... lib/abp2blocklist.js:795: .forEach(mergeableRulesInfo => On 2017/05/08 08:13:02, kzar wrote: > This logic seems pretty much the same as above, I wonder if we could avoid > duplicating it somehow? I suppose we could merge them into one, but then we'd end up with an if...else or a function lookup inside that loop. They're also sufficiently different that I thought it made sense to keep them separate. The first one is really expensive, the second one is relatively very cheap. I could still merge them if you feel like that would be better. We'll also have to check performance again. I've found making small changes can hurt performance in a big way. For example, this: obj[flag ? "prop1" : "prop2"] = value; Is way, way slower than this: if (flag) obj.prop1 = value; else obj.prop2 = value; For whatever reason. https://codereview.adblockplus.org/29426594/diff/29432555/lib/abp2blocklist.j... lib/abp2blocklist.js:801: rulesInfo = rulesInfo.filter(ruleInfo => !ruleInfo.merged); On 2017/05/08 08:13:02, kzar wrote: > Maybe a dumb question, but instead of setting the merged or redundant property > on ruleInfo Objects and then filtering those away couldn't we just delete them > immediately instead? I could try that, but then we'd have to adjust the indexes i and j every time we delete something from the array. Also deleting individual elements from an array should be way more expensive than marking them first and then filtering them out later using the filter method.
Patch Set 13: Move merge options to constructor and update comments Addressed all review comments. My comments inline. https://codereview.adblockplus.org/29426594/diff/29432555/lib/abp2blocklist.js File lib/abp2blocklist.js (right): https://codereview.adblockplus.org/29426594/diff/29432555/lib/abp2blocklist.j... lib/abp2blocklist.js:371: // This function returns an edit operation, one of "substitute", "delete", On 2017/05/08 08:13:03, kzar wrote: > Mind using the JSDoc syntax for the comment explaining this function? Done. https://codereview.adblockplus.org/29426594/diff/29432555/lib/abp2blocklist.j... lib/abp2blocklist.js:427: if (t.length - j == i || s[s.length - j - 1] != t[t.length - j - 1]) On 2017/05/08 08:13:02, kzar wrote: > I find this part hard to grok, could you add a comment? Done. https://codereview.adblockplus.org/29426594/diff/29432555/lib/abp2blocklist.j... lib/abp2blocklist.js:443: // If there are any special characters in the delta, bail. On 2017/05/08 08:13:02, kzar wrote: > Nit: IMO this comment doesn't add much. I'm slightly rephrasing here, but since I've added comments all over the function now, I'll leave this here. https://codereview.adblockplus.org/29426594/diff/29432555/lib/abp2blocklist.j... lib/abp2blocklist.js:515: const heuristicRange = 10; On 2017/05/08 08:13:03, kzar wrote: > Could you mention about the behaviour (or lack thereof) of heursiticRange when > exhaustive is true in this comment? Done. https://codereview.adblockplus.org/29426594/diff/29432555/lib/abp2blocklist.j... lib/abp2blocklist.js:518: // Throw out obviously redundant rules. On 2017/05/08 08:13:02, kzar wrote: > Nit: Please surround with braces since it spans multiple lines. Done. https://codereview.adblockplus.org/29426594/diff/29432555/lib/abp2blocklist.j... lib/abp2blocklist.js:924: let mergeOptions = { On 2017/05/08 08:13:02, kzar wrote: > Couldn't you just pass options.exhaustiveMerge straight through instead of > putting it inside this Object? > > Also wouldn't it make more sense to take the merging options in the constructor > instead of this method? I think the defaultOptions logic above looks especially > out of place here. Done.
On 2017/05/08 08:13:04, kzar wrote: > Thanks for the detailed explanation of what's going on in the codereview > description. Even though that's not what I meant in my email it turned out to be > really useful! I've updated #3673 now, but I've made minimal changes just to cover the overall picture so far. Let me know if I should be more detailed. https://issues.adblockplus.org/ticket/3673?action=diff&version=9
https://codereview.adblockplus.org/29426594/diff/29427618/lib/abp2blocklist.js File lib/abp2blocklist.js (right): https://codereview.adblockplus.org/29426594/diff/29427618/lib/abp2blocklist.j... lib/abp2blocklist.js:476: rulesInfo[index] = {rule}; On 2017/05/08 14:03:58, Manish Jethani wrote: > On 2017/05/08 08:13:02, kzar wrote: > > On 2017/05/03 14:41:54, Manish Jethani wrote: > > > On 2017/05/03 11:17:24, kzar wrote: > > > > I'm not sure syntax like this will work for Safari 9 :( > > > > > > I'll check, but if it doesn't work then I'll have to change this > > > in a few places. > > > > You mentioned testing the code on Safari now, but unless I missed it not the > > version. Which version did you test with? > > I've been testing with Safari 10. > > Anyway, this is not a problem because JS Hydra transpiles {rule} into {rule: > rule}. I think you should test with Safari 9 at least once to make sure it still works. If you don't have a access to a copy you can us our test machines from https://app.webmate.io/ https://codereview.adblockplus.org/29426594/diff/29432555/lib/abp2blocklist.js File lib/abp2blocklist.js (right): https://codereview.adblockplus.org/29426594/diff/29432555/lib/abp2blocklist.j... lib/abp2blocklist.js:443: // If there are any special characters in the delta, bail. On 2017/05/08 23:12:48, Manish Jethani wrote: > On 2017/05/08 08:13:02, kzar wrote: > > Nit: IMO this comment doesn't add much. > > I'm slightly rephrasing here, but since I've added comments all over the > function now, I'll leave this here. Fair enough. Thanks for those extra comments, they helped me grok the algorithm. https://codereview.adblockplus.org/29426594/diff/29432555/lib/abp2blocklist.j... lib/abp2blocklist.js:779: let rulesInfo = rules.map(rule => ({rule})); On 2017/05/08 14:03:59, Manish Jethani wrote: > On 2017/05/08 08:13:02, kzar wrote: > > What's the purpose of this line? > > The purpose of this line is to create one "ruleInfo" object for each rule, where > the ruleInfo object has one property, "rule", pointing to the rule it belongs > to. > > Essentially this: > > let rulesInfo = new Array(rules.length); > for (let i = 0; i < rules.length; i++) > rulesInfo[i] = {rule: rule}; Argh, sorry that was an especially dumb question. In my defence a cat jumped on my head at 4am that morning, I couldn't get back to sleep and was kind of dazed all day! https://codereview.adblockplus.org/29426594/diff/29432555/lib/abp2blocklist.j... lib/abp2blocklist.js:782: .forEach(mergeableRulesInfo => On 2017/05/08 14:04:00, Manish Jethani wrote: > On 2017/05/08 08:13:03, kzar wrote: > > Any reason you used forEach instead of for ... of ... here (and below)? We > > prefer the latter unless there's a good reason. > > It was for (let value of map.values()) { ... } at first, but as you said JS > Hydra can't handle non-array iterables. It looks for the length property on the > iterable, trying to convert this into a regular for loop with an index. > > i.e. for (let i = 0; i < map.values().length; i++) { let value = > map.values()[i]; ... } > > The only other way to do this is: > > let mapValues = Array.from(map.values()); > for (let value of mapValues) { ... } > > If we simply call the forEach method on the map object, that does the job as > well, without creating another intermediate array. I thought this was better > because these intermediate arrays would again take up more memory. Well groupRulesByMergeableProperty creates the array either way, so assuming you didn't keep a reference to that after the second array is created I don't see that it would use more memory. That said I don't care too much and IMO this is a pretty reasonable use of forEach. https://codereview.adblockplus.org/29426594/diff/29432555/lib/abp2blocklist.j... lib/abp2blocklist.js:795: .forEach(mergeableRulesInfo => On 2017/05/08 14:03:59, Manish Jethani wrote: > On 2017/05/08 08:13:02, kzar wrote: > > This logic seems pretty much the same as above, I wonder if we could avoid > > duplicating it somehow? > > I suppose we could merge them into one, but then we'd end up with an if...else > or a function lookup inside that loop. They're also sufficiently different that > I thought it made sense to keep them separate. The first one is really > expensive, the second one is relatively very cheap. > > I could still merge them if you feel like that would be better. We'll also have > to check performance again. I've found making small changes can hurt performance > in a big way. > > For example, this: > > obj[flag ? "prop1" : "prop2"] = value; > > Is way, way slower than this: > > if (flag) > obj.prop1 = value; > else > obj.prop2 = value; > > For whatever reason. Acknowledged. https://codereview.adblockplus.org/29426594/diff/29432555/lib/abp2blocklist.j... lib/abp2blocklist.js:801: rulesInfo = rulesInfo.filter(ruleInfo => !ruleInfo.merged); On 2017/05/08 14:03:58, Manish Jethani wrote: > On 2017/05/08 08:13:02, kzar wrote: > > Maybe a dumb question, but instead of setting the merged or redundant property > > on ruleInfo Objects and then filtering those away couldn't we just delete them > > immediately instead? > > I could try that, but then we'd have to adjust the indexes i and j every time we > delete something from the array. Also deleting individual elements from an array > should be way more expensive than marking them first and then filtering them out > later using the filter method. Acknowledged. https://codereview.adblockplus.org/29426594/diff/29433655/lib/abp2blocklist.js File lib/abp2blocklist.js (right): https://codereview.adblockplus.org/29426594/diff/29433655/lib/abp2blocklist.j... lib/abp2blocklist.js:799: const defaultOptions = {exhaustive: false}; Have defaultOptions be a property on ContentBlockerList.prototype so you can use it both here and in the constructor? https://codereview.adblockplus.org/29426594/diff/29433655/lib/abp2blocklist.j... lib/abp2blocklist.js:947: rules = mergeRules(rules, {exhaustive: this.options.exhaustiveMerge}); Why wrap the exhaustiveMerge option in an Object here?
On 2017/05/08 23:29:56, Manish Jethani wrote: > On 2017/05/08 08:13:04, kzar wrote: > > > Thanks for the detailed explanation of what's going on in the codereview > > description. Even though that's not what I meant in my email it turned out to > be > > really useful! > > I've updated #3673 now, but I've made minimal changes just to cover the overall > picture so far. Let me know if I should be more detailed. > > https://issues.adblockplus.org/ticket/3673?action=diff&version=9 Great thanks, that looks good now.
Patch Set 14: Use for..of instead of forEach where possible https://codereview.adblockplus.org/29426594/diff/29427618/lib/abp2blocklist.js File lib/abp2blocklist.js (right): https://codereview.adblockplus.org/29426594/diff/29427618/lib/abp2blocklist.j... lib/abp2blocklist.js:476: rulesInfo[index] = {rule}; On 2017/05/09 10:05:46, kzar wrote: > On 2017/05/08 14:03:58, Manish Jethani wrote: > > On 2017/05/08 08:13:02, kzar wrote: > > > On 2017/05/03 14:41:54, Manish Jethani wrote: > > > > On 2017/05/03 11:17:24, kzar wrote: > > > > > I'm not sure syntax like this will work for Safari 9 :( > > > > > > > > I'll check, but if it doesn't work then I'll have to change this > > > > in a few places. > > > > > > You mentioned testing the code on Safari now, but unless I missed it not the > > > version. Which version did you test with? > > > > I've been testing with Safari 10. > > > > Anyway, this is not a problem because JS Hydra transpiles {rule} into {rule: > > rule}. > > I think you should test with Safari 9 at least once to make sure it still works. > If you don't have a access to a copy you can us our test machines from > https://app.webmate.io/ "{rule: rule}" ought to work in every single JS engine in existence. It makes sense to test everything out in Safari 9 though, I'll do that and get back with the results. https://codereview.adblockplus.org/29426594/diff/29432555/lib/abp2blocklist.js File lib/abp2blocklist.js (right): https://codereview.adblockplus.org/29426594/diff/29432555/lib/abp2blocklist.j... lib/abp2blocklist.js:779: let rulesInfo = rules.map(rule => ({rule})); On 2017/05/09 10:05:46, kzar wrote: > Argh, sorry that was an especially dumb question. In my defence a cat jumped on > my head at 4am that morning, I couldn't get back to sleep and was kind of dazed > all day! Haha, no worries :) https://codereview.adblockplus.org/29426594/diff/29432555/lib/abp2blocklist.j... lib/abp2blocklist.js:782: .forEach(mergeableRulesInfo => On 2017/05/09 10:05:46, kzar wrote: > On 2017/05/08 14:04:00, Manish Jethani wrote: > > On 2017/05/08 08:13:03, kzar wrote: > > > Any reason you used forEach instead of for ... of ... here (and below)? We > > > prefer the latter unless there's a good reason. > > > > It was for (let value of map.values()) { ... } at first, but as you said JS > > Hydra can't handle non-array iterables. It looks for the length property on > the > > iterable, trying to convert this into a regular for loop with an index. > > > > i.e. for (let i = 0; i < map.values().length; i++) { let value = > > map.values()[i]; ... } > > > > The only other way to do this is: > > > > let mapValues = Array.from(map.values()); > > for (let value of mapValues) { ... } > > > > If we simply call the forEach method on the map object, that does the job as > > well, without creating another intermediate array. I thought this was better > > because these intermediate arrays would again take up more memory. > > Well groupRulesByMergeableProperty creates the array either way, so assuming you > didn't keep a reference to that after the second array is created I don't see > that it would use more memory. That said I don't care too much and IMO this is a > pretty reasonable use of forEach. The intermediate array here would an array of the arrays created inside groupRulesByMergeableProperty. Since you brought this up, I noticed that we're using a forEach in that function itself where we could use for..of. I've changed it now both in groupRulesByMergeableProperty and mergeRulesByArrayProperty. https://codereview.adblockplus.org/29426594/diff/29433655/lib/abp2blocklist.js File lib/abp2blocklist.js (right): https://codereview.adblockplus.org/29426594/diff/29433655/lib/abp2blocklist.j... lib/abp2blocklist.js:799: const defaultOptions = {exhaustive: false}; On 2017/05/09 10:05:47, kzar wrote: > Have defaultOptions be a property on ContentBlockerList.prototype so you can use > it both here and in the constructor? Actually the options for the ContentBlockerList constructor are different than the options for the mergeRules function, which is not part of ContentBlockerList at all. For what it's worth I think it might even be a good idea to move the merging code out into a separate module. I don't know if this would be better or worse for performance. https://codereview.adblockplus.org/29426594/diff/29433655/lib/abp2blocklist.j... lib/abp2blocklist.js:947: rules = mergeRules(rules, {exhaustive: this.options.exhaustiveMerge}); On 2017/05/09 10:05:47, kzar wrote: > Why wrap the exhaustiveMerge option in an Object here? This is because mergeRules takes an option called "exhaustive", whereas the ContentBlockerList constructor takes an option called "exhaustiveMerge". The name of the latter is more specific because "exhaustive" could mean just anything to the constructor (exhaustive merging, exhaustive rule generation, exhaustive filter addition, possibly other things the ContentBlockerList class starts to do over time as we add features). On the other hand, the mergeRules function, which could very well be its own module, does one and only one thing: it merges rules. There is no possible ambiguity about the meaning of the word "exhaustive" there. Now all of this would really matter if both ContentBlockerList and mergeRules were public APIs. The latter is not even exposed outside of this file, it really doesn't matter one way or another.
https://codereview.adblockplus.org/29426594/diff/29433655/lib/abp2blocklist.js File lib/abp2blocklist.js (right): https://codereview.adblockplus.org/29426594/diff/29433655/lib/abp2blocklist.j... lib/abp2blocklist.js:947: rules = mergeRules(rules, {exhaustive: this.options.exhaustiveMerge}); On 2017/05/09 15:52:47, Manish Jethani wrote: > This is because mergeRules takes an option called > "exhaustive", whereas the > ContentBlockerList constructor takes an option called > "exhaustiveMerge". Well if you passed through the value of exhaustiveMerge to mergeRules without wrapping it in an Object you could call the parameter exhaustive there. You could also get rid of the defaultOptions logic from mergeRules then as well. > ...if both ContentBlockerList and mergeRules were public APIs. The > latter is not even exposed outside of this file... Right, so what benefit is there with this indirection? I could see the purpose of taking an options Object if mergeRules was a public API, that way you could add new options without breaking backwards compatibility, but it isn't. https://codereview.adblockplus.org/29426594/diff/29434578/lib/abp2blocklist.js File lib/abp2blocklist.js (right): https://codereview.adblockplus.org/29426594/diff/29434578/lib/abp2blocklist.j... lib/abp2blocklist.js:751: let oneRuleInfo = rulesInfo.shift(); Nit: firstRuleInfo?
I wonder if there should be the merge option at all? We don't want to do exhaustive merging in Adblock Plus for Safari but I can't think of any place where we don't want merging at all. Am I missing something?
Patch Set 15: Change options object into a single boolean option https://codereview.adblockplus.org/29426594/diff/29433655/lib/abp2blocklist.js File lib/abp2blocklist.js (right): https://codereview.adblockplus.org/29426594/diff/29433655/lib/abp2blocklist.j... lib/abp2blocklist.js:947: rules = mergeRules(rules, {exhaustive: this.options.exhaustiveMerge}); On 2017/05/09 16:50:58, kzar wrote: > On 2017/05/09 15:52:47, Manish Jethani wrote: > > This is because mergeRules takes an option called > > "exhaustive", whereas the > > ContentBlockerList constructor takes an option called > > "exhaustiveMerge". > > Well if you passed through the value of exhaustiveMerge to mergeRules without > wrapping it in an Object you could call the parameter exhaustive there. You > could also get rid of the defaultOptions logic from mergeRules then as well. > > > ...if both ContentBlockerList and mergeRules were public APIs. The > > latter is not even exposed outside of this file... > > Right, so what benefit is there with this indirection? I could see the purpose > of taking an options Object if mergeRules was a public API, that way you could > add new options without breaking backwards compatibility, but it isn't. Done. https://codereview.adblockplus.org/29426594/diff/29434578/lib/abp2blocklist.js File lib/abp2blocklist.js (right): https://codereview.adblockplus.org/29426594/diff/29434578/lib/abp2blocklist.j... lib/abp2blocklist.js:751: let oneRuleInfo = rulesInfo.shift(); On 2017/05/09 16:50:59, kzar wrote: > Nit: firstRuleInfo? That's what I called it at first, but then I wanted to emphasize that it can be any one rule in the set since they're otherwise identical, it doesn't _have_ to be the first. Also "One Rule to rule them all" :)
On 2017/05/09 16:54:46, kzar wrote: > I wonder if there should be the merge option at all? We don't want to do > exhaustive merging in Adblock Plus for Safari but I can't think of any place > where we don't want merging at all. Am I missing something? Yeah, I'm not sure, but I thought the Safari extension might want to skip the merging entirely if the rule set is already within the 50,000 rules limit. EasyList+AA is, for example, and it's the default. This really helps with the user experience if we don't do unnecessary merging of rules. You don't want the user to see the beach ball icon. One good reason to keep it as an option for now is that it would break the old unit tests.
LGTM assuming it still works for Safari 9. By the way are you definitely hitting Publish each time you upload a new patch set / add inline comments? A few times I didn't receive email notifications but noticed your changes when I was browsing the website.
On 2017/05/10 08:40:40, kzar wrote: > LGTM assuming it still works for Safari 9. Now I've tested this on Safari 9, it works. > By the way are you definitely hitting Publish each time you upload a new patch > set / add inline comments? A few times I didn't receive email notifications but > noticed your changes when I was browsing the website. You're right, I wasn't submitting some of the intermediate patches I uploaded for review. I meant to, but then I would end up making another change on top of it, then I would submit the new patch instead. I'll file the issue for integrating this into the Safari extension shortly. In the meanwhile I'm interested in Sebastian's take on this.
On 2017/05/10 11:49:25, Manish Jethani wrote: > I'll file the issue for integrating this into the Safari extension shortly. Filed #5236. https://issues.adblockplus.org/ticket/5236
I just realized that this might be broken because the order of the "ignore-previous-rules" type of rules matters. If 29439639 goes through, it's going to matter even more. We'll have to merge rules at each stage of rule generation instead of merging them all at the end, because the rules are actually in groups and only rules within the same group (where order doesn't matter) can be merged.
On 2017/05/18 23:33:46, Manish Jethani wrote: > I just realized that this might be broken because the order of the > "ignore-previous-rules" type of rules matters. If 29439639 goes through, it's > going to matter even more. We'll have to merge rules at each stage of rule > generation instead of merging them all at the end, because the rules are > actually in groups and only rules within the same group (where order doesn't > matter) can be merged. Oh bugger, good point :/
Patch Set 16: Merge rules in groups Generate rules in groups and then merge rules only within the same group.
On 2017/05/20 21:09:51, Manish Jethani wrote: > Patch Set 16: Merge rules in groups > > Generate rules in groups and then merge rules only within the same group. I've updated the description here to include the latest change. I'm not updating the numbers in the "RESULTS" section because there isn't a huge change here. EasyList Germany+EasyList+AA might just get pushed over the 50,000-rule limit, but we'll see.
https://codereview.adblockplus.org/29426594/diff/29443583/abp2blocklist.js File abp2blocklist.js (right): https://codereview.adblockplus.org/29426594/diff/29443583/abp2blocklist.js#ne... abp2blocklist.js:25: var blockerList = new ContentBlockerList({merge: true, exhaustiveMerge: true}); Any particular reason why we even make these configurable, and not always merge the rules as good as we can?
https://codereview.adblockplus.org/29426594/diff/29443583/abp2blocklist.js File abp2blocklist.js (right): https://codereview.adblockplus.org/29426594/diff/29443583/abp2blocklist.js#ne... abp2blocklist.js:25: var blockerList = new ContentBlockerList({merge: true, exhaustiveMerge: true}); On 2017/05/21 21:07:29, Sebastian Noack wrote: > Any particular reason why we even make these configurable, and not always merge > the rules as good as we can? Well the exhaustiveMerge option makes sense, since turning that on slows things down quite a bit and we probably don't want to do that for Safari users where the rules are being generated on their end. I agree about the merge option however, I can't see why wouldn't ever want it. I guess maybe it's useful for the tests?
https://codereview.adblockplus.org/29426594/diff/29443583/abp2blocklist.js File abp2blocklist.js (right): https://codereview.adblockplus.org/29426594/diff/29443583/abp2blocklist.js#ne... abp2blocklist.js:25: var blockerList = new ContentBlockerList({merge: true, exhaustiveMerge: true}); On 2017/05/22 08:47:01, kzar wrote: > On 2017/05/21 21:07:29, Sebastian Noack wrote: > > Any particular reason why we even make these configurable, and not always > merge > > the rules as good as we can? > > Well the exhaustiveMerge option makes sense, since turning that on slows things > down quite a bit and we probably don't want to do that for Safari users where > the rules are being generated on their end. I agree about the merge option > however, I can't see why wouldn't ever want it. I guess maybe it's useful for > the tests? Yes, the exhaustive merge gives us 5-10% extra gains, but at a significant cost, it's not feasible to run that in the browser as it freezes the browser (the user sees the beachball icon and can't do anything for a while). The merge being an option, that's useful for the unit tests for now (we can fix this), but also the Safari client might want to pass false here if the rule set is already going to be within the 50,000-rule limit. This is true for the default option (EasyList+AA) already. Skipping the merge just makes it a snappy user experience. I still think we can do more in terms of user experience by doing the merge in stages, using a generator for example. function* mergeRules(rules) { // Merge some rules yield rules; // Merge some more rules yield rules; ... } ... let iterator = mergeRules(rules); setTimeout(function iterate() { let next = mergeRules.next(); rules = next.value; if (!next.done) setTimeout(iterate, interval); else addRulesToSafari(rules); }, interval); This will do it in stages so it won't freeze the browser.
Also, as discussed on IRC, https://codereview.adblockplus.org/29426594/diff/29443583/abp2blocklist.js File abp2blocklist.js (right): https://codereview.adblockplus.org/29426594/diff/29443583/abp2blocklist.js#ne... abp2blocklist.js:25: var blockerList = new ContentBlockerList({merge: true, exhaustiveMerge: true}); On 2017/05/22 08:47:01, kzar wrote: > On 2017/05/21 21:07:29, Sebastian Noack wrote: > > Any particular reason why we even make these configurable, and not always > merge > > the rules as good as we can? > > Well the exhaustiveMerge option makes sense, since turning that on slows things > down quite a bit and we probably don't want to do that for Safari users where > the rules are being generated on their end. I agree about the merge option > however, I can't see why wouldn't ever want it. On the other hand, the Safari extension is currently mostly effected by the 50k rules limit, hence would mostly benefit from these improvements, since we don't even offer other filter lists in the iOS app yet. As for the performance, it would be helpful if somebody could share some benchmarks comparing the performance with an without (exhaustive) merging. > I guess maybe it's useful for the tests? Well, if we'd always merge the rules, I suppose, there is no reason to consider not merging rules in the tests.
https://codereview.adblockplus.org/29426594/diff/29443583/abp2blocklist.js File abp2blocklist.js (right): https://codereview.adblockplus.org/29426594/diff/29443583/abp2blocklist.js#ne... abp2blocklist.js:25: var blockerList = new ContentBlockerList({merge: true, exhaustiveMerge: true}); On 2017/05/22 09:28:28, Manish Jethani wrote: > On 2017/05/22 08:47:01, kzar wrote: > > On 2017/05/21 21:07:29, Sebastian Noack wrote: > > > Any particular reason why we even make these configurable, and not always > > merge > > > the rules as good as we can? > > > > Well the exhaustiveMerge option makes sense, since turning that on slows > things > > down quite a bit and we probably don't want to do that for Safari users where > > the rules are being generated on their end. I agree about the merge option > > however, I can't see why wouldn't ever want it. I guess maybe it's useful for > > the tests? > > Yes, the exhaustive merge gives us 5-10% extra gains, but at a significant cost, > it's not feasible to run that in the browser as it freezes the browser (the user > sees the beachball icon and can't do anything for a while). > > The merge being an option, that's useful for the unit tests for now (we can fix > this), but also the Safari client might want to pass false here if the rule set > is already going to be within the 50,000-rule limit. This is true for the > default option (EasyList+AA) already. Skipping the merge just makes it a snappy > user experience. > > I still think we can do more in terms of user experience by doing the merge in > stages, using a generator for example. > > function* mergeRules(rules) > { > // Merge some rules > yield rules; > // Merge some more rules > yield rules; > ... > } > > ... > > let iterator = mergeRules(rules); > setTimeout(function iterate() { > let next = mergeRules.next(); > rules = next.value; > if (!next.done) > setTimeout(iterate, interval); > else > addRulesToSafari(rules); > }, interval); > > This will do it in stages so it won't freeze the browser. Oh, if Safari entirely freezes, that should be avoided of course. I already thought about doing the conversion into a WebWorker, but the problem with that approach would be to get the bulk of filter data into the worker. Using stages seems like a better idea. On the other hand, we are plan to replace the Safari extension with a different product which downloads the pre-generated lists from the server anyway. So any improvements that are limited to the current Safari extension might not be worth it. But I guess I leave it up to you.
https://codereview.adblockplus.org/29426594/diff/29443583/abp2blocklist.js File abp2blocklist.js (right): https://codereview.adblockplus.org/29426594/diff/29443583/abp2blocklist.js#ne... abp2blocklist.js:25: var blockerList = new ContentBlockerList({merge: true, exhaustiveMerge: true}); On 2017/05/22 09:39:25, Sebastian Noack wrote: > On 2017/05/22 08:47:01, kzar wrote: > > On 2017/05/21 21:07:29, Sebastian Noack wrote: > > > Any particular reason why we even make these configurable, and not always > > merge > > > the rules as good as we can? > > > > Well the exhaustiveMerge option makes sense, since turning that on slows > things > > down quite a bit and we probably don't want to do that for Safari users where > > the rules are being generated on their end. I agree about the merge option > > however, I can't see why wouldn't ever want it. > > On the other hand, the Safari extension is currently mostly effected by the 50k > rules limit, hence would mostly benefit from these improvements, since we don't > even offer other filter lists in the iOS app yet. > > As for the performance, it would be helpful if somebody could share some > benchmarks comparing the performance with an without (exhaustive) merging. EasyList+AA Non-exhaustive: * Rule merging took 1530ms * Before: 49,198 rules * After: 43,668 rules Exhaustive: * Rule merging took 29667ms * Before: 49,198 rules * After: 42,794 rules So the results improve by ~16% here if we do an exhaustive merge, but it's totally not feasible in the browser. I mean ~1.5s vs. ~30s.
https://codereview.adblockplus.org/29426594/diff/29443583/abp2blocklist.js File abp2blocklist.js (right): https://codereview.adblockplus.org/29426594/diff/29443583/abp2blocklist.js#ne... abp2blocklist.js:25: var blockerList = new ContentBlockerList({merge: true, exhaustiveMerge: true}); On 2017/05/22 17:39:52, Manish Jethani wrote: > On 2017/05/22 09:39:25, Sebastian Noack wrote: > > On 2017/05/22 08:47:01, kzar wrote: > > > On 2017/05/21 21:07:29, Sebastian Noack wrote: > > > > Any particular reason why we even make these configurable, and not always > > > merge > > > > the rules as good as we can? > > > > > > Well the exhaustiveMerge option makes sense, since turning that on slows > > things > > > down quite a bit and we probably don't want to do that for Safari users > where > > > the rules are being generated on their end. I agree about the merge option > > > however, I can't see why wouldn't ever want it. > > > > On the other hand, the Safari extension is currently mostly effected by the > 50k > > rules limit, hence would mostly benefit from these improvements, since we > don't > > even offer other filter lists in the iOS app yet. > > > > As for the performance, it would be helpful if somebody could share some > > benchmarks comparing the performance with an without (exhaustive) merging. > > EasyList+AA > > Non-exhaustive: > > * Rule merging took 1530ms > * Before: 49,198 rules > * After: 43,668 rules > > Exhaustive: > > * Rule merging took 29667ms > * Before: 49,198 rules > * After: 42,794 rules > > So the results improve by ~16% here if we do an exhaustive merge, but it's > totally not feasible in the browser. I mean ~1.5s vs. ~30s. Yeah, I think I agree. Even if we can avoid the freezing issue through using timeouts, a delay of 30s (potentially more on slower hardware) on browser start, extension installation, and filter change before the filters take effect would be rather bad. So let's keep this behavior behind an option which we only use on the server, not sure if we need two different options though.
Patch Set 17: Make generateRules asynchronous I've made the generateRules function and most of the merging process asynchronous now. blockList.generateRules().then(rules => { ... }); The benefit of doing this is that we're now able to do an exhaustive merge in the browser. I ran this in my Safari with EasyList+AA, it took 37 seconds to generate the rule set (with exhaustive merging), and I did not notice much as I continued to do my web browsing. Now there might be an argument against this that the user cannot wait 37 seconds after selecting a filter subscription. Well, they already have to wait for it to download, they might as well wait a little longer. This happens in the background anyway. We can show the appropriate messages in the UI (e.g. "The subscription will be activated once it has been downloaded and processed", etc.). Also, if we don't want this to take 37 seconds in the background, we can still not do an exhaustive merge but simply tweak the heuristicRange constant (e.g. set it to 100 instead of 10). This will still give us better results, adding only a few seconds to the process. For the default subscription, we can simply do this: let rules = blockList.generateRules(); if (rules.length >= 50000) blockList.mergeRules(rules); The user doesn't have to wait to see ad blocking in action because the default subscription is already within the 50,000-rule limit. For the rest of the subscriptions, they have to wait anyway, they might as well wait a little longer. The other argument against this approach might be that this makes the code harder to maintain. I'm going to say that it's a matter of opinion, but promise chaining of this kind is pretty standard in JavaScript. JavaScript is mostly asynchronous in the first place because we don't want the browser to freeze up. Promises only make this easier to manage. Let me know what you think. I haven't "asynchronousized" the entire merge process. Right now it's about 140,000 asynchronous steps, some of them with pauses using setTimeout, but it's not difficult to convert the rest of the code.
The main problem with any delay here is that after starting up Safari (while potentially restoring tabs from the previous session or loading the default homepage), nothing will be blocked until the Content Blocker is all setup. While some delay is inevitable, 30+ seconds is long enough to warrant that the user will likely notice ads not being blocked. Then again, it shouldn't matter too much as this feature, in current Adblock Plus for Safari, is labeled experimental, and in the future we won't generate the rules on the client-side anymore, anyway.
Patch Set 18: Make more code asynchronous Now I've managed to narrow the gap between exhaustive mode and non-exhaustive mode from 15% to 0.03%. It takes about 10s in my Safari to merge EasyList+AA with a non-exhaustive merge, while getting almost the same results as an exhaustive merge.
Patch Set 19: Implement merge options "all", "auto", and "none" The merge option can now take three values: * "all": merge everything * "auto" (default): be intelligent and merge only as much as required * "none": turn off merging Now EasyList+AA merges in 2s in the browser, going from a little over 50,000 rules to about 49,500 rules. EasyList Germany+AA on the other hand still takes 10s, because it's about 55,000 rules. The main idea here is that we stop merging when we've reached our goal of being within the 50,000-rule limit. The unit tests are broken now.
Patch Set 20: Remove redundant async
https://codereview.adblockplus.org/29426594/diff/29446577/lib/abp2blocklist.js File lib/abp2blocklist.js (right): https://codereview.adblockplus.org/29426594/diff/29446577/lib/abp2blocklist.j... lib/abp2blocklist.js:465: let i = 0, j = 0; Nit: Please split into separate lets. Also I wonder if we should leave them as undefined here and assign them in the initialisation bit of the for loops? (Especially with j it seems kind of misleading to set it to 0 here.) https://codereview.adblockplus.org/29426594/diff/29446577/lib/abp2blocklist.j... lib/abp2blocklist.js:477: for (; j < t.length; j++) Since j is working backwards through the characters wouldn't it be clearer to start it as t.length -1 and decrement it down to 0? I think then you wouldn't need to do `length - j` so often later on as well? https://codereview.adblockplus.org/29426594/diff/29446577/lib/abp2blocklist.j... lib/abp2blocklist.js:487: if (t.length - j - i != 1) Probably a dumb question but wouldn't this also consider identical strings to not be a close match?
Patch Set 21: Split variable declaration statement https://codereview.adblockplus.org/29426594/diff/29446577/lib/abp2blocklist.js File lib/abp2blocklist.js (right): https://codereview.adblockplus.org/29426594/diff/29446577/lib/abp2blocklist.j... lib/abp2blocklist.js:465: let i = 0, j = 0; On 2017/05/29 09:49:54, kzar wrote: > Nit: Please split into separate lets. Done. > Also I wonder if we should leave them as undefined here and assign them in the > initialisation bit of the for loops? (Especially with j it seems kind of > misleading to set it to 0 here.) I feel like since the variables are in the outer scope it's actually clearer to initialize them here. Also j does need to start at 0 (see next comment). https://codereview.adblockplus.org/29426594/diff/29446577/lib/abp2blocklist.j... lib/abp2blocklist.js:477: for (; j < t.length; j++) On 2017/05/29 09:49:53, kzar wrote: > Since j is working backwards through the characters wouldn't it be clearer to > start it as t.length -1 and decrement it down to 0? I think then you wouldn't > need to do `length - j` so often later on as well? Then we would need two variables, one for s.length - j and one for t.length - j. j is basically the offset from the end of both the strings. https://codereview.adblockplus.org/29426594/diff/29446577/lib/abp2blocklist.j... lib/abp2blocklist.js:487: if (t.length - j - i != 1) On 2017/05/29 09:49:53, kzar wrote: > Probably a dumb question but wouldn't this also consider identical strings to > not be a close match? Yes, identical strings are not a close match by definition, since there's no edit operation. It's debatable whether this should be the case, but anyway it doesn't matter since identical URL filters will be filtered out in the previous step when we eliminate redundant rules.
https://codereview.adblockplus.org/29426594/diff/29452555/lib/abp2blocklist.js File lib/abp2blocklist.js (right): https://codereview.adblockplus.org/29426594/diff/29452555/lib/abp2blocklist.j... lib/abp2blocklist.js:1033: console.error("Rule merging took " + (Date.now() - t) + "ms"); I guess these changes were included by mistake?
https://codereview.adblockplus.org/29426594/diff/29452555/lib/abp2blocklist.js File lib/abp2blocklist.js (right): https://codereview.adblockplus.org/29426594/diff/29452555/lib/abp2blocklist.j... lib/abp2blocklist.js:1033: console.error("Rule merging took " + (Date.now() - t) + "ms"); On 2017/07/07 13:23:28, kzar wrote: > I guess these changes were included by mistake? Uh, jeez, this was a mistake. By the way, let's put this one on hold until all the other changes have landed, because I think those changes are going to affect how we do rule merging. I'll come back to this later and upload a new patch set.
https://codereview.adblockplus.org/29426594/diff/29452555/lib/abp2blocklist.js File lib/abp2blocklist.js (right): https://codereview.adblockplus.org/29426594/diff/29452555/lib/abp2blocklist.j... lib/abp2blocklist.js:1033: console.error("Rule merging took " + (Date.now() - t) + "ms"); On 2017/07/08 12:13:19, Manish Jethani wrote: > On 2017/07/07 13:23:28, kzar wrote: > > I guess these changes were included by mistake? > > Uh, jeez, this was a mistake. > > By the way, let's put this one on hold until all the other changes have landed, > because I think those changes are going to affect how we do rule merging. I'll > come back to this later and upload a new patch set. Sure thing, just let me know when you're ready to proceed with this one.
Patch Set 22: Rebase with minor changes Rebased on master, minor improvements, updated unit tests to work with current API and behavior.
https://codereview.adblockplus.org/29426594/diff/29493609/lib/abp2blocklist.js File lib/abp2blocklist.js (right): https://codereview.adblockplus.org/29426594/diff/29493609/lib/abp2blocklist.j... lib/abp2blocklist.js:1101: return async(Array.from(map.values()).map(mergeableRulesInfo => () => If async always took a sequence as the first argument, then we wouldn't need to convert map.values() into an Array from an iterator here which I think might reduce memory usage. We could also remove the related logic from the start of async. Perhaps we should do that and then simply wrap the first argument in an Array literal when necessary? https://codereview.adblockplus.org/29426594/diff/29493609/lib/abp2blocklist.j... lib/abp2blocklist.js:1277: convertFilterAddRules(blockingExceptionRules, filter, Nit: Please use braces for this for loop since it spans multiple lines now.
Patch Set 23: Map async callees asynchronously Comments inline. I'm also wondering if we should split out the code into async.js (for async and callLater) and merging.js (for all the merging code), since these are fairly independent of the rest of the program. https://codereview.adblockplus.org/29426594/diff/29493609/lib/abp2blocklist.js File lib/abp2blocklist.js (right): https://codereview.adblockplus.org/29426594/diff/29493609/lib/abp2blocklist.j... lib/abp2blocklist.js:1101: return async(Array.from(map.values()).map(mergeableRulesInfo => () => On 2017/07/25 12:18:53, kzar wrote: > If async always took a sequence as the first argument, then we wouldn't need to > convert map.values() into an Array from an iterator here which I think might > reduce memory usage. We could also remove the related logic from the start of > async. Perhaps we should do that and then simply wrap the first argument in an > Array literal when necessary? That's a good suggestion. If the async function takes any iterable, it would also have to map the values lazily. I've modified it now so that it takes a second argument, a map function, which it calls on the callee value to get the actual callee. The mapping happens just before the asynchronous function call. I've left the logic to convert a non-iterable first argument into an iterable (array) because it makes the calling code neater. It's not expensive because the async function itself is not called too many times, it's just called with large arrays/sets of functions. https://codereview.adblockplus.org/29426594/diff/29493609/lib/abp2blocklist.j... lib/abp2blocklist.js:1277: convertFilterAddRules(blockingExceptionRules, filter, On 2017/07/25 12:18:53, kzar wrote: > Nit: Please use braces for this for loop since it spans multiple lines now. Done.
Nice, LGTM. Go ahead and push when you're ready.
Patch Set 24: Rebase This is rebased on master. There were two conflicts but nothing major.
LGTM |