|
|
Created:
Oct. 12, 2018, 4:56 a.m. by Jon Sonesen Modified:
Oct. 25, 2018, 9:48 p.m. Reviewers:
Manish Jethani Visibility:
Public. |
DescriptionIssue 6994 - Use shortcut matching for location only filters
Patch Set 1 #
Total comments: 4
Patch Set 2 : Address PS1 Comments #
Total comments: 2
Patch Set 3 : #Patch Set 4 : Reduce ternary use #
Total comments: 3
Patch Set 5 : Address PS4, and rebase #
Total comments: 10
MessagesTotal messages: 18
https://codereview.adblockplus.org/29907586/diff/29907587/lib/matcher.js File lib/matcher.js (right): https://codereview.adblockplus.org/29907586/diff/29907587/lib/matcher.js#newc... lib/matcher.js:175: for (let filter of set) The idea was to have two loops, one for the location-only filters and one for the regular filters. For this, we would have to maintain the location-only filters in a separate set, per keyword. I would start by adding the following method to the `RegExpFilter` class: isLocationOnly() { return this.contentType == RegExpFilter.prototype.contentType && this.thirdParty == null && !this.domains && !this.sitekeys; } https://codereview.adblockplus.org/29907586/diff/29907587/lib/matcher.js#newc... lib/matcher.js:185: { This seems wrong.
https://codereview.adblockplus.org/29907586/diff/29907587/lib/matcher.js File lib/matcher.js (right): https://codereview.adblockplus.org/29907586/diff/29907587/lib/matcher.js#newc... lib/matcher.js:175: for (let filter of set) On 2018/10/13 13:28:45, Manish Jethani wrote: > The idea was to have two loops, one for the location-only filters and one for > the regular filters. For this, we would have to maintain the location-only > filters in a separate set, per keyword. > > I would start by adding the following method to the `RegExpFilter` class: > > isLocationOnly() > { > return this.contentType == RegExpFilter.prototype.contentType && > this.thirdParty == null && !this.domains && !this.sitekeys; > } Ack https://codereview.adblockplus.org/29907586/diff/29907587/lib/matcher.js#newc... lib/matcher.js:185: { On 2018/10/13 13:28:45, Manish Jethani wrote: > This seems wrong. Yeah, super wrong. My bad
https://codereview.adblockplus.org/29907586/diff/29917555/lib/filterClasses.js File lib/filterClasses.js (right): https://codereview.adblockplus.org/29907586/diff/29917555/lib/filterClasses.j... lib/filterClasses.js:803: return RegExp(location).test(this.regexp); My bad, thought I added doc strings here. https://codereview.adblockplus.org/29907586/diff/29917555/lib/matcher.js File lib/matcher.js (right): https://codereview.adblockplus.org/29907586/diff/29917555/lib/matcher.js#newc... lib/matcher.js:197: continue; I think this should be in both iterations?
Actually disregard PS2 please pretty much off on that one
This is not quite there, the findKeyword test is failing
This is not quite there, the findKeyword test is failing
https://codereview.adblockplus.org/29907586/diff/29917628/lib/filterClasses.js File lib/filterClasses.js (right): https://codereview.adblockplus.org/29907586/diff/29917628/lib/filterClasses.j... lib/filterClasses.js:778: get isLocationOnly() Let's make this a function rather than a property. i.e. no get
https://codereview.adblockplus.org/29907586/diff/29917628/lib/matcher.js File lib/matcher.js (right): https://codereview.adblockplus.org/29907586/diff/29917628/lib/matcher.js#newc... lib/matcher.js:154: this.filterByKeyword; In the tests this fails on the last test of extractKeyword and compareKeyword
Let's come back to this once we've landed the other two patches in this area.
Ack
Ack https://codereview.adblockplus.org/29907586/diff/29917628/lib/filterClasses.js File lib/filterClasses.js (right): https://codereview.adblockplus.org/29907586/diff/29917628/lib/filterClasses.j... lib/filterClasses.js:778: get isLocationOnly() On 2018/10/21 16:06:52, Manish Jethani wrote: > Let's make this a function rather than a property. i.e. no get Acknowledged.
This may need a rebase now.
Yeah it does, I am doing that now and cleaning the patch up some more :)
https://codereview.adblockplus.org/29907586/diff/29923588/lib/filterClasses.js File lib/filterClasses.js (right): https://codereview.adblockplus.org/29907586/diff/29923588/lib/filterClasses.j... lib/filterClasses.js:816: return this.contentType == RegExpFilter.prototype.contentType && I haven't run this code, but I'm pretty sure that this would cause the initial memory footprint to go up, because we would end up creating the domain maps for all the blocking/whitelist filters. We could change the check to this: return this.contentType == RegExpFilter.prototype.contentType && this.thirdParty == null && !this.domainSource && !this.sitekeySource && !this.domains && !this.sitekeys; For filters that do have domains, it would short-circuit at `!this.domainSource` and thus avoid the creation of the domain map. If the domain map has already been created, it would fail at `!this.domains`. For filters that do not have domains, both tests would pass. https://codereview.adblockplus.org/29907586/diff/29923588/lib/matcher.js File lib/matcher.js (right): https://codereview.adblockplus.org/29907586/diff/29923588/lib/matcher.js#newc... lib/matcher.js:76: * Lookup table for location only filters by their associated keyword Nit: I don't know if this is right, but I would have spelt it "location-only" (with a hyphen). https://codereview.adblockplus.org/29907586/diff/29923588/lib/matcher.js#newc... lib/matcher.js:78: * @private Let's put @private last. https://codereview.adblockplus.org/29907586/diff/29923588/lib/matcher.js#newc... lib/matcher.js:81: this._fastFilterByKeyword = new Map(); I'm thinking about the nomenclature here, about how to distinguish the two filter types. What do you think about "simple" filters and "complex" filters? Then we can rename `_filterByKeyword` to `_complexFiltersByKeyword` and this one to `_simpleFiltersByKeyword` (note: I also made filters plural here, which is how it should have been from the beginning).
https://codereview.adblockplus.org/29907586/diff/29923588/lib/filterClasses.js File lib/filterClasses.js (right): https://codereview.adblockplus.org/29907586/diff/29923588/lib/filterClasses.j... lib/filterClasses.js:816: return this.contentType == RegExpFilter.prototype.contentType && On 2018/10/24 21:35:28, Manish Jethani wrote: > I haven't run this code, but I'm pretty sure that this would cause the initial > memory footprint to go up, because we would end up creating the domain maps for > all the blocking/whitelist filters. > > We could change the check to this: > > return this.contentType == RegExpFilter.prototype.contentType && > this.thirdParty == null && !this.domainSource && !this.sitekeySource && > !this.domains && !this.sitekeys; > > For filters that do have domains, it would short-circuit at `!this.domainSource` > and thus avoid the creation of the domain map. If the domain map has already > been created, it would fail at `!this.domains`. For filters that do not have > domains, both tests would pass. Ah, I see what you mean there. I didn't look deep enough in the docs/source to properly see or understand domainSource and sitekeySource https://codereview.adblockplus.org/29907586/diff/29923588/lib/matcher.js File lib/matcher.js (right): https://codereview.adblockplus.org/29907586/diff/29923588/lib/matcher.js#newc... lib/matcher.js:76: * Lookup table for location only filters by their associated keyword On 2018/10/24 21:35:28, Manish Jethani wrote: > Nit: I don't know if this is right, but I would have spelt it "location-only" > (with a hyphen). Acknowledged. https://codereview.adblockplus.org/29907586/diff/29923588/lib/matcher.js#newc... lib/matcher.js:78: * @private On 2018/10/24 21:35:28, Manish Jethani wrote: > Let's put @private last. Acknowledged. https://codereview.adblockplus.org/29907586/diff/29923588/lib/matcher.js#newc... lib/matcher.js:81: this._fastFilterByKeyword = new Map(); On 2018/10/24 21:35:28, Manish Jethani wrote: > I'm thinking about the nomenclature here, about how to distinguish the two > filter types. What do you think about "simple" filters and "complex" filters? I like this, it makes more sense from a reader's perspective IMHO
https://codereview.adblockplus.org/29907586/diff/29923588/lib/matcher.js File lib/matcher.js (right): https://codereview.adblockplus.org/29907586/diff/29923588/lib/matcher.js#newc... lib/matcher.js:174: let count = typeof filters != "undefined" ? filters.size : 0; We need `count` to be the sum of `this._simpleFiltersByKeyword(candidate).size` and `this._complexFiltersByKeyword(candidate).size` to get the most optimal results here (basically keep it semantically equivalent to the current version of this function). https://codereview.adblockplus.org/29907586/diff/29923588/lib/matcher.js#newc... lib/matcher.js:202: if (typeof set == "undefined") This is clearly wrong. We want to check all filters that match the keyword. Here the code only checks the simple filters if there are no complex filters. We have to do the simple filters first, so let's do this: let set = this._simpleFiltersByKeyword(keyword); if (set) { for (let filter of set) { if (specificOnly && !(filter instanceof WhitelistFilter)) continue; if (filter.matchesLocation(location)) return filter; } } set = this._complexFiltersByKeyword(keyword); ... Note that we don't have to call `filter.isGeneric()` because by definition the simple filters have neither any domains nor any sitekeys so they are generic.
Closing this. |