|
|
Created:
April 28, 2018, 4:15 p.m. by Manish Jethani Modified:
May 11, 2018, 10:42 a.m. Reviewers:
kzar CC:
sergei Base URL:
https://hg.adblockplus.org/adblockpluscore/ Visibility:
Public. |
DescriptionThis builds on https://codereview.adblockplus.org/29757591/
Patch Set 1 #
Total comments: 5
Patch Set 2 : Rebase and minor changes #MessagesTotal messages: 7
Patch Set 1 https://codereview.adblockplus.org/29764555/diff/29764556/lib/elemHide.js File lib/elemHide.js (right): https://codereview.adblockplus.org/29764555/diff/29764556/lib/elemHide.js#new... lib/elemHide.js:93: let setIndex = +isIncluded; Conveniently boolean converted to number gives us the index of the set we're looking for. https://codereview.adblockplus.org/29764555/diff/29764556/lib/elemHide.js#new... lib/elemHide.js:94: if (!filters[setIndex]) The number of entries is the same as before. If there are only inclusions or only exclusions then there is only one Set object. https://codereview.adblockplus.org/29764555/diff/29764556/lib/elemHide.js#new... lib/elemHide.js:258: let seenFilters = []; This is now an array of sets. The maximum number of entries in this array will be the number of domain parts. e.g. for cool.example.com there will be at most 3 entries. An entry is added only if there are exclusions for the corresponding domain part.
https://codereview.adblockplus.org/29764555/diff/29764556/lib/elemHide.js File lib/elemHide.js (right): https://codereview.adblockplus.org/29764555/diff/29764556/lib/elemHide.js#new... lib/elemHide.js:271: for (let set of seenFilters) By the way, in case it isn't clear, the reason this performs better is that we're not constructing a new set like before, instead we're adding previously constructed sets to an array. Lookup in this array of sets might be slower in theory, but typically for a domain like "foo.example.com" there's only at most one set in the array anyway, so there's only one lookup. For a domain that has no exception subdomains (which is most domains actually), there will be no lookups either. https://codereview.adblockplus.org/29764555/diff/29764556/lib/elemHide.js#new... lib/elemHide.js:272: { Note that as a side effect of this change a filter like "foo.example.com,example.com##bar" will appear twice (hence the selector "bar" will be added twice to the result). This is unfortunate but I don't think it is worth doing anything about, because in practice I'm not sure why a filter author would write a filter like that. If there are two filters "foo.example.com##bar" and "example.com##bar", which is more likely to happen in practice anyway (the author did not check that there's already a filter that applies), we don't filter out duplicates in such a case; then it doesn't makes that we should care about duplicates at all. It does mean unfortunately that this.getException would get called twice for the same filter, but I don't think this matters in practice.
This is certainly an interesting idea, but I'd like to see some numbers. I wonder: 1. How adding + removing filters is affected, e.g. the speed of EasyList being added then removed 5 times in a row before and after the change. 2. How fast getSelectorsForDomain is before and after the change, e.g. with that simplistic snippet we were using before that runs the function a few thousand times for the same domain. 3. How often getSelectorsForDomain returns a different result from before for EasyList. In other words how many times a selector for a filter ends up applying to a domain multiple times after the change when it didn't previously. I think with those numbers the decision about if this should be included will be pretty clear cut.
On 2018/05/02 12:41:49, kzar wrote: > This is certainly an interesting idea, but I'd like to see some numbers. I > wonder: > > 1. How adding + removing filters is affected, e.g. the speed of EasyList being > added then removed 5 times in a row before and after the change. > 2. How fast getSelectorsForDomain is before and after the change, e.g. with that > simplistic snippet we were using before that runs the function a few thousand > times for the same domain. > 3. How often getSelectorsForDomain returns a different result from before for > EasyList. In other words how many times a selector for a filter ends up applying > to a domain multiple times after the change when it didn't previously. > > I think with those numbers the decision about if this should be included will be > pretty clear cut. OK, let's see what happens with patch #29757591. It's already looking very good so this one may not make as much of a difference as it seemed to initially. I'll measure against the final patch that lands there.
On 2018/05/02 12:41:49, kzar wrote: > 3. How often getSelectorsForDomain returns a different result from before for > EasyList. In other words how many times a selector for a filter ends up applying > to a domain multiple times after the change when it didn't previously. Scratch this one, it doesn't make any difference now. > 1. How adding + removing filters is affected, e.g. the speed of EasyList being > added then removed 5 times in a row before and after the change. > 2. How fast getSelectorsForDomain is before and after the change, e.g. with that > simplistic snippet we were using before that runs the function a few thousand > times for the same domain. But these two are still interesting. > OK, let's see what happens with patch #29757591. It's already looking very good > so this one may not make as much of a difference as it seemed to initially. I'll > measure against the final patch that lands there. Sounds good, that's landed now.
I'm closing this because it only improves things by ~5-10% whereas as we have much bigger improvements to make [1]. I might revisit this once all the other changes have landed. [1]: https://codereview.adblockplus.org/29773570/ |