|
|
Created:
Jan. 22, 2015, 1 p.m. by Thomas Greiner Modified:
June 16, 2016, 11:58 a.m. Reviewers:
Wladimir Palant Visibility:
Public. |
DescriptionAccording to my preliminary tests this change increases the performance of the `getSelectorsForDomain()` function by 30-60% (depending on the installed filters and the domains they are applied to).
Patch Set 1 #Patch Set 2 : Optimized performance of data structure removal #
Total comments: 24
Patch Set 3 : Fixed domain logic and addressed comments #
Total comments: 10
Patch Set 4 : Rebased to 5434ef6e1545 #Patch Set 5 : Couple of overall improvements #
Total comments: 10
Patch Set 6 : Always consider generic filters #
Total comments: 1
MessagesTotal messages: 11
http://codereview.adblockplus.org/4529242486341632/diff/5741031244955648/lib/... File lib/elemHide.js (right): http://codereview.adblockplus.org/4529242486341632/diff/5741031244955648/lib/... lib/elemHide.js:141: if (!("nsIStyleSheetService" in Ci)) Looking up things in Components.interfaces isn't such a fast operation in Firefox, doing that for 15k filters will take up 15ms on my computer. How about storing that value in a global boolean flag? http://codereview.adblockplus.org/4529242486341632/diff/5741031244955648/lib/... lib/elemHide.js:147: domains[""] = undefined; This should be true, not undefined. http://codereview.adblockplus.org/4529242486341632/diff/5741031244955648/lib/... lib/elemHide.js:158: filtersByDomain[domain][filter.text] = filter.isActiveOnDomain(domain); The very point of this functionality is to avoid calling isActiveOnDomain() too much - it's a slow operation. Why not |filtersByDomain[domain][filter.text] = domains[domain]|? http://codereview.adblockplus.org/4529242486341632/diff/5741031244955648/lib/... lib/elemHide.js:159: filtersByDomain[domain]._length++; Strictly speaking, _length isn't an invalid domain name. How about using something like " length" for this property name? A collision with a domain name should be impossible, not merely unlikely. http://codereview.adblockplus.org/4529242486341632/diff/5741031244955648/lib/... lib/elemHide.js:202: domains[""] = undefined; This should be true, not undefined. http://codereview.adblockplus.org/4529242486341632/diff/5741031244955648/lib/... lib/elemHide.js:425: * used only in Chrome, Opera and Safari). This comment is no longer correct. Either way, it should say now that this function cannot be used in Firefox, not merely that it isn't being used. http://codereview.adblockplus.org/4529242486341632/diff/5741031244955648/lib/... lib/elemHide.js:428: { How about an exception here on Firefox, to explain that this function cannot be used there? http://codereview.adblockplus.org/4529242486341632/diff/5741031244955648/lib/... lib/elemHide.js:446: if (filter.isActiveOnDomain(docDomain) && !this.getException(selector, docDomain)) Again, the point here is to avoid calling isActiveOnDomain(). http://codereview.adblockplus.org/4529242486341632/diff/5741031244955648/lib/... lib/elemHide.js:458: if (filterTexts) This check seems pointless - if a property exists then its value can only be an object, you make sure of that. http://codereview.adblockplus.org/4529242486341632/diff/5741031244955648/lib/... lib/elemHide.js:464: let filter = filterByKey[keyByFilter[filterText]]; let filter = Filter.fromText(filterText); http://codereview.adblockplus.org/4529242486341632/diff/5741031244955648/lib/... lib/elemHide.js:470: selectors[selectors.length] = selector; You meant using selectors.push(), right? http://codereview.adblockplus.org/4529242486341632/diff/5741031244955648/lib/... lib/elemHide.js:480: } This logic is wrong, we can have a filter ~subdomain.example.com,example.com##foo. Then filtersByDomain["subdomain.example.com"][..] will be false, filtersByDomain["example.com"][..] will be true on the other hand. Your code currently looks only at the "true" values and will happily apply that filter on subdomain.example.com as well. If a filter was "blacklisted" at a subdomain level it should be ignored for a higher level domain. This means that you will need looking up the value for a particular filter, so you need another object: let result = Object.create(null); ... if (!(filterText in result)) result[filterText] = filterTexts[filterText]; ... return Object.keys(result) .map(Filter.fromText) .filter(f => !this.getException(filter.selector, docDomain)) .map(f => f.selector); Note that I'm not actually suggesting using that Array.map and Array.filter (these are slow), merely explaining what I mean. Note also handling specificOnly flag should be possible inside the loop - it's merely about whether to process "" after all the other domains.
http://codereview.adblockplus.org/4529242486341632/diff/5741031244955648/lib/... File lib/elemHide.js (right): http://codereview.adblockplus.org/4529242486341632/diff/5741031244955648/lib/... lib/elemHide.js:141: if (!("nsIStyleSheetService" in Ci)) On 2015/02/23 19:20:16, Wladimir Palant wrote: > Looking up things in Components.interfaces isn't such a fast operation in > Firefox, doing that for 15k filters will take up 15ms on my computer. How about > storing that value in a global boolean flag? Done. http://codereview.adblockplus.org/4529242486341632/diff/5741031244955648/lib/... lib/elemHide.js:147: domains[""] = undefined; On 2015/02/23 19:20:16, Wladimir Palant wrote: > This should be true, not undefined. Done. http://codereview.adblockplus.org/4529242486341632/diff/5741031244955648/lib/... lib/elemHide.js:158: filtersByDomain[domain][filter.text] = filter.isActiveOnDomain(domain); On 2015/02/23 19:20:16, Wladimir Palant wrote: > The very point of this functionality is to avoid calling isActiveOnDomain() too > much - it's a slow operation. Why not |filtersByDomain[domain][filter.text] = > domains[domain]|? Done, you're right that it appears to be redundant here. http://codereview.adblockplus.org/4529242486341632/diff/5741031244955648/lib/... lib/elemHide.js:159: filtersByDomain[domain]._length++; On 2015/02/23 19:20:16, Wladimir Palant wrote: > Strictly speaking, _length isn't an invalid domain name. How about using > something like " length" for this property name? A collision with a domain name > should be impossible, not merely unlikely. That's true but that might make it difficult to spot the difference between the names " length" and "length" and therefore could cause issues in the future. Due to that I chose "$length" instead. http://codereview.adblockplus.org/4529242486341632/diff/5741031244955648/lib/... lib/elemHide.js:202: domains[""] = undefined; On 2015/02/23 19:20:16, Wladimir Palant wrote: > This should be true, not undefined. Done. http://codereview.adblockplus.org/4529242486341632/diff/5741031244955648/lib/... lib/elemHide.js:425: * used only in Chrome, Opera and Safari). On 2015/02/23 19:20:16, Wladimir Palant wrote: > This comment is no longer correct. Either way, it should say now that this > function cannot be used in Firefox, not merely that it isn't being used. Done. http://codereview.adblockplus.org/4529242486341632/diff/5741031244955648/lib/... lib/elemHide.js:428: { On 2015/02/23 19:20:16, Wladimir Palant wrote: > How about an exception here on Firefox, to explain that this function cannot be > used there? Done. http://codereview.adblockplus.org/4529242486341632/diff/5741031244955648/lib/... lib/elemHide.js:446: if (filter.isActiveOnDomain(docDomain) && !this.getException(selector, docDomain)) On 2015/02/23 19:20:16, Wladimir Palant wrote: > Again, the point here is to avoid calling isActiveOnDomain(). Done. http://codereview.adblockplus.org/4529242486341632/diff/5741031244955648/lib/... lib/elemHide.js:458: if (filterTexts) On 2015/02/23 19:20:16, Wladimir Palant wrote: > This check seems pointless - if a property exists then its value can only be an > object, you make sure of that. Done. http://codereview.adblockplus.org/4529242486341632/diff/5741031244955648/lib/... lib/elemHide.js:464: let filter = filterByKey[keyByFilter[filterText]]; On 2015/02/23 19:20:16, Wladimir Palant wrote: > let filter = Filter.fromText(filterText); Done. http://codereview.adblockplus.org/4529242486341632/diff/5741031244955648/lib/... lib/elemHide.js:470: selectors[selectors.length] = selector; On 2015/02/23 19:20:16, Wladimir Palant wrote: > You meant using selectors.push(), right? This turned out to be about ~3ms faster for each call of getSelectorsForDomain() on my machine. That amounts to an additional ~50ms of total delay of page load time that I measured on bild.de. Therefore I chose to go with this instead of `Array.push()`. http://codereview.adblockplus.org/4529242486341632/diff/5741031244955648/lib/... lib/elemHide.js:480: } On 2015/02/23 19:20:16, Wladimir Palant wrote: > This logic is wrong, we can have a filter > ~subdomain.example.com,example.com##foo. Then > filtersByDomain["subdomain.example.com"][..] will be false, > filtersByDomain["example.com"][..] will be true on the other hand. Your code > currently looks only at the "true" values and will happily apply that filter on > http://subdomain.example.com as well. > > If a filter was "blacklisted" at a subdomain level it should be ignored for a > higher level domain. This means that you will need looking up the value for a > particular filter, so you need another object: > > let result = Object.create(null); > ... > if (!(filterText in result)) > result[filterText] = filterTexts[filterText]; > ... > return Object.keys(result) > .map(Filter.fromText) > .filter(f => !this.getException(filter.selector, docDomain)) > .map(f => f.selector); > > Note that I'm not actually suggesting using that Array.map and Array.filter > (these are slow), merely explaining what I mean. > > Note also handling specificOnly flag should be possible inside the loop - it's > merely about whether to process "" after all the other domains. Indeed, that's true. I added the `ignorableFilters` array to track filters that should be ignored for higher-level domains. FYI: I've also implemented your suggested approach and on average it turned out to be even slower than the existing approach (e.g. ~10ms on bild.de). I created a BitBucket repo to showcase both approaches (including the code for checking the performance): - greiner: https://bitbucket.org/greiner12/elemhide/commits/3b2da2848b3781b954bad806187d... - trev: https://bitbucket.org/greiner12/elemhide/commits/b3a6ca182480c03b2bf4e1b3054c...
http://codereview.adblockplus.org/4529242486341632/diff/5676830073815040/lib/... File lib/elemHide.js (right): http://codereview.adblockplus.org/4529242486341632/diff/5676830073815040/lib/... lib/elemHide.js:456: } This block duplicates the logic below unnecessarily. It also happens to produce wrong results. For example, ~example.com##foo should be active on every domain but example.com. However, with domain being foo.example.com your code will currently add it to the selectors in the block above and ignore the exception completely. "" should really be handled in the loop below along with other "domains": while (true) { if (domain in filtersByDomain && (domain != "" || !specificOnly)) ... if (domain == "") break; let nextDot = domain.indexOf("."); domain = domain.substr(nextDot + 1); } http://codereview.adblockplus.org/4529242486341632/diff/5676830073815040/lib/... lib/elemHide.js:458: let ignorableFilters = []; Use an object here rather than an array, given that you need to do frequent lookups? Also, it's ignoredFilters I think. http://codereview.adblockplus.org/4529242486341632/diff/5676830073815040/lib/... lib/elemHide.js:466: if (ignorableFilters.indexOf(filterText) > -1) How about checking for $length immediately? if (filterText == "$length" ||Â filterText in ignoredFilters) continue; http://codereview.adblockplus.org/4529242486341632/diff/5676830073815040/lib/... lib/elemHide.js:475: if (!this.getException(filter, domain)) Calling Filter.fromText() here looks like unnecessary overhead - you don't really need the full filter. How about storing the selectors in the data rather than the complete filter text? ElemHide.getException() can be changed to accept selectors rather than filters, that's what it needs anyway. The only other caller of ElemHide.getException() is contentPolicy.js and easily changed. The only problem: multiple filters could use the same selector for the same domain (consider foo.example.com##foo and bar.example.org,foo.example.com##foo), meaning that your current approach to remove filters from the data will no longer work. We need to store the count for each selector rather than for domains. http://codereview.adblockplus.org/4529242486341632/diff/5676830073815040/lib/... lib/elemHide.js:476: selectors[selectors.length] = filter.selector; selectors.push() please. Either way, I think this might add the same selector multiple times. Consider foo.example.com,example.com##foo on foo.example.com.
https://codereview.adblockplus.org/4529242486341632/diff/5676830073815040/lib... File lib/elemHide.js (right): https://codereview.adblockplus.org/4529242486341632/diff/5676830073815040/lib... lib/elemHide.js:456: } On 2015/06/05 21:58:54, Wladimir Palant wrote: > This block duplicates the logic below unnecessarily. It also happens to produce > wrong results. For example, ~example.com##foo should be active on every domain > but http://example.com. However, with domain being http://foo.example.com your code will > currently add it to the selectors in the block above and ignore the exception > completely. > > "" should really be handled in the loop below along with other "domains": > > while (true) > { > if (domain in filtersByDomain && (domain != "" || !specificOnly)) > ... > > if (domain == "") > break; > > let nextDot = domain.indexOf("."); > domain = domain.substr(nextDot + 1); > } Done. https://codereview.adblockplus.org/4529242486341632/diff/5676830073815040/lib... lib/elemHide.js:458: let ignorableFilters = []; On 2015/06/05 21:58:54, Wladimir Palant wrote: > Use an object here rather than an array, given that you need to do frequent > lookups? > > Also, it's ignoredFilters I think. Done. https://codereview.adblockplus.org/4529242486341632/diff/5676830073815040/lib... lib/elemHide.js:466: if (ignorableFilters.indexOf(filterText) > -1) On 2015/06/05 21:58:54, Wladimir Palant wrote: > How about checking for $length immediately? > > if (filterText == "$length" || filterText in ignoredFilters) > continue; Done. https://codereview.adblockplus.org/4529242486341632/diff/5676830073815040/lib... lib/elemHide.js:475: if (!this.getException(filter, domain)) On 2015/06/05 21:58:54, Wladimir Palant wrote: > Calling Filter.fromText() here looks like unnecessary overhead - you don't > really need the full filter. How about storing the selectors in the data rather > than the complete filter text? ElemHide.getException() can be changed to accept > selectors rather than filters, that's what it needs anyway. The only other > caller of ElemHide.getException() is contentPolicy.js and easily changed. > > The only problem: multiple filters could use the same selector for the same > domain (consider foo.example.com##foo and bar.example.org,foo.example.com##foo), > meaning that your current approach to remove filters from the data will no > longer work. We need to store the count for each selector rather than for > domains. Done. That's an awesome tip, thanks! Note that the `#foo` selector is currently being returned twice when using the two filters you mentioned. This no longer occurs with these changes. https://codereview.adblockplus.org/4529242486341632/diff/5676830073815040/lib... lib/elemHide.js:476: selectors[selectors.length] = filter.selector; On 2015/06/05 21:58:54, Wladimir Palant wrote: > selectors.push() please. Done. > Either way, I think this might add the same selector multiple times. Consider > foo.example.com,example.com##foo on http://foo.example.com. Done. Unfortunately, `Array.indexOf()` is terribly slow to check this so I chose to use the `ignoredSelectors` object for this. https://codereview.adblockplus.org/4529242486341632/diff/29321033/lib/elemHid... File lib/elemHide.js (right): https://codereview.adblockplus.org/4529242486341632/diff/29321033/lib/elemHid... lib/elemHide.js:172: selectors[filter.selector].isActive |= domains[domain]; I made this change here because I noticed that this would be set to `false` for domain "" if the following filters were added in that order: ###foo example.com###foo https://codereview.adblockplus.org/4529242486341632/diff/29321033/lib/elemHid... lib/elemHide.js:352: _generateCSSContent: function*() This was added when rebasing
https://codereview.adblockplus.org/4529242486341632/diff/29321033/lib/elemHid... File lib/elemHide.js (right): https://codereview.adblockplus.org/4529242486341632/diff/29321033/lib/elemHid... lib/elemHide.js:62: * Lookup table, blocking filter selectors by domain which they are applied to Nit: the word "blocking" makes no sense here. https://codereview.adblockplus.org/4529242486341632/diff/29321033/lib/elemHid... lib/elemHide.js:172: selectors[filter.selector].isActive |= domains[domain]; On 2015/06/23 13:28:41, Thomas Greiner wrote: > I made this change here because I noticed that this would be set to `false` for > domain "" if the following filters were added in that order: > > ###foo > example.com###foo Unfortunately, the result still isn't correct. Consider the following: ###foo ~example.com###foo For selector "#foo" we get isActive: true on "" and isActive: false on "example.com". As a result, it won't apply on "example.com" even though it should. I think that I gave you a bad advise here. One would somehow need to combine the domain lists for all filters having the same selector, and I am having doubts that this is possible in a manageable way.
https://codereview.adblockplus.org/4529242486341632/diff/29321033/lib/elemHid... File lib/elemHide.js (right): https://codereview.adblockplus.org/4529242486341632/diff/29321033/lib/elemHid... lib/elemHide.js:172: selectors[filter.selector].isActive |= domains[domain]; On 2015/06/23 13:57:09, Wladimir Palant wrote: > On 2015/06/23 13:28:41, Thomas Greiner wrote: > > I made this change here because I noticed that this would be set to `false` > for > > domain "" if the following filters were added in that order: > > > > ###foo > > example.com###foo > > Unfortunately, the result still isn't correct. Consider the following: > > ###foo > ~example.com###foo > > For selector "#foo" we get isActive: true on "" and isActive: false on > "example.com". As a result, it won't apply on "example.com" even though it > should. > > I think that I gave you a bad advise here. One would somehow need to combine the > domain lists for all filters having the same selector, and I am having doubts > that this is possible in a manageable way. I see your point and please correct me if I'm wrong but wouldn't it make more sense to prioritize the specific filter over the generic one in this case? Because what I don't understand is why a generic filter should hide an element if another filter explicitly declares that that element shouldn't be hidden on that domain. So to illustrate what I mean… Currently: ###foo + ~example.com###foo == ###foo Suggested: ###foo + ~example.com###foo == ###foo + example.com#@##foo Admittingly, this behavior would be inconsistent with request blocking filters but element hiding filters can only activate or deactivate specific selectors anyway so I wouldn't see this as an issue.
https://codereview.adblockplus.org/4529242486341632/diff/29321033/lib/elemHid... File lib/elemHide.js (right): https://codereview.adblockplus.org/4529242486341632/diff/29321033/lib/elemHid... lib/elemHide.js:172: selectors[filter.selector].isActive |= domains[domain]; The problem is: you would create a behavior here that is highly confusing for filter list authors. If you add a filter like ###foo you expect it to work. Now suddenly it depends on other filters with the same selector, probably in other filter lists. And if you implement a minimal change to the selector this behavior will suddenly change. Never mind inconsistency to blocking rules (foo + foo$domain=~example.com). Granted, it's already the same with exception rules - we've implemented fairly complicated interactions there. However, exception rules are explicit overrides, and are typically found in only one filter list: Acceptable Ads.
https://codereview.adblockplus.org/4529242486341632/diff/29321033/lib/elemHid... File lib/elemHide.js (right): https://codereview.adblockplus.org/4529242486341632/diff/29321033/lib/elemHid... lib/elemHide.js:172: selectors[filter.selector].isActive |= domains[domain]; Generally, I might be able to solve this by splitting up "isActive" into "isActiveGeneric" and "isActiveSpecific" to prioritize generic filters over specific ones. Still, I think it's worth discussing so feel free to move this discussion elsewhere (e.g. our public forum). On 2015/06/29 09:09:09, Wladimir Palant wrote: > Granted, it's already the same with exception rules - we've implemented fairly > complicated interactions there. However, exception rules are explicit overrides, That's correct, `~example.com###foo` is not an exception rule but it's still an explicit intent to not apply a selector on example.com. > The problem is: you would create a behavior here that is highly confusing for > filter list authors. If you add a filter like ###foo you expect it to work. Now > suddenly it depends on other filters with the same selector, probably in other > filter lists. And if you implement a minimal change to the selector this > behavior will suddenly change. The underlying issue is that because `~example.com` is not treated as an exception rule, we are being forced to work around that inconsistency now. So I'd suggest that if we want to remove that inconsistency without confusing filter list authors we should clarify that in the blockable items list. Note that I couldn't find any conflicts in EasyList+EasyListGermany that would result from this change in handling `~example.com`. Asking Arthur about that, he said that they usually avoid writing `example.com,~www.example.com###foo`. Instead they write `example.com###foo` and `www.example.com#@##foo` as separate filters. Therefore another option would be to get rid of the ability to have inverted domains for element hiding filters. However, I'm not suggesting this for now to avoid breaking backwards-compatibility. In any case, filters of the form `~example.com###foo` are already and will continue to be confusing.
https://codereview.adblockplus.org/4529242486341632/diff/29321033/lib/elemHid... File lib/elemHide.js (right): https://codereview.adblockplus.org/4529242486341632/diff/29321033/lib/elemHid... lib/elemHide.js:62: * Lookup table, blocking filter selectors by domain which they are applied to On 2015/06/23 13:57:09, Wladimir Palant wrote: > Nit: the word "blocking" makes no sense here. Done. https://codereview.adblockplus.org/4529242486341632/diff/29321033/lib/elemHid... lib/elemHide.js:172: selectors[filter.selector].isActive |= domains[domain]; On 2015/06/23 13:57:09, Wladimir Palant wrote: > On 2015/06/23 13:28:41, Thomas Greiner wrote: > > I made this change here because I noticed that this would be set to `false` > for > > domain "" if the following filters were added in that order: > > > > ###foo > > example.com###foo > > Unfortunately, the result still isn't correct. Consider the following: > > ###foo > ~example.com###foo > > For selector "#foo" we get isActive: true on "" and isActive: false on > "example.com". As a result, it won't apply on "example.com" even though it > should. > > I think that I gave you a bad advise here. One would somehow need to combine the > domain lists for all filters having the same selector, and I am having doubts > that this is possible in a manageable way. Done. See comment below for further details. https://codereview.adblockplus.org/4529242486341632/diff/29321033/lib/elemHid... lib/elemHide.js:474: ignoredSelectors[filterSelector] = null; Basically, all I did was remove the else-case here. All the other changes are simplifications. Due to that change, however, both "selectors" and "ignoredFilters" would've ended up containing the exact same values which is why I decided to change "selectors" from an array to a plain object. Actually, I'd prefer using `Set` for that because it has the best performance but that would've required changing the return type of "getSelectorsForDomain".
https://codereview.adblockplus.org/4529242486341632/diff/29321269/lib/elemHid... File lib/elemHide.js (right): https://codereview.adblockplus.org/4529242486341632/diff/29321269/lib/elemHid... lib/elemHide.js:464: continue; Even with the inconsistent logic changes you are implementing, this is still wrong. !domainSelectors[filterSelector].isActive shouldn't mean "oh well, maybe some other domain is allowed" - it's really "oops, should not apply, stop now." Meaning that ~foo.example.com,example.com##foo really shouldn't apply on foo.example.com even if that filter is allowed on example.com. Same goes for exceptions, if we have foo.example.com#@#foo then example.com##foo should *not* apply on foo.example.com. I recommend adding unit tests for all the scenarios I mention in the review comments. I feel like you are consistently forgetting to test the edge cases. |