|
|
Created:
Feb. 4, 2019, 6:29 p.m. by Manish Jethani Modified:
March 20, 2019, 8:27 a.m. CC:
kzar Base URL:
https://hg.adblockplus.org/adblockpluscore/ Visibility:
Public. |
Patch Set 1 : #
Total comments: 10
MessagesTotal messages: 11
Patch Set 1 As discussed, this patch tries to internalize the third-party request check.
On 2019/02/05 04:12:08, Manish Jethani wrote: > Patch Set 1 > > As discussed, this patch tries to internalize the third-party request check. adblockpluschrome changes: https://codereview.adblockplus.org/29998582/ I will file tickets.
https://codereview.adblockplus.org/29998564/diff/29998578/lib/domain.js File lib/domain.js (right): https://codereview.adblockplus.org/29998564/diff/29998578/lib/domain.js#newco... lib/domain.js:117: * @param {URL|string} url The request URL. Do we even still need to support the case where an URL object is passed in? https://codereview.adblockplus.org/29998564/diff/29998578/lib/matcher.js File lib/matcher.js (right): https://codereview.adblockplus.org/29998564/diff/29998578/lib/matcher.js#newc... lib/matcher.js:395: let thirdParty = docDomain && isThirdParty(location, docDomain); As discussed on IRC, how about only calling isThirdParty() if we found a filter with a non-null `thirdParty` property?
https://codereview.adblockplus.org/29998564/diff/29998578/lib/domain.js File lib/domain.js (right): https://codereview.adblockplus.org/29998564/diff/29998578/lib/domain.js#newco... lib/domain.js:117: * @param {URL|string} url The request URL. On 2019/02/05 04:32:53, Sebastian Noack wrote: > Do we even still need to support the case where an URL object is passed in? Actually the default is a URL object now, which is created in lib/requestBlocker.js in adblockpluschrome. It makes it all the way to here. https://codereview.adblockplus.org/29998564/diff/29998578/lib/matcher.js File lib/matcher.js (right): https://codereview.adblockplus.org/29998564/diff/29998578/lib/matcher.js#newc... lib/matcher.js:395: let thirdParty = docDomain && isThirdParty(location, docDomain); On 2019/02/05 04:32:53, Sebastian Noack wrote: > As discussed on IRC, how about only calling isThirdParty() if we found a filter > with a non-null `thirdParty` property? I tried this but it actually seemed to be more expensive. This makes sense: only about 5% of all requests are blocked. I have not analyzed why it was more expensive (maybe there is a single filter in EasyList with the flag set that applies to most requests?), and maybe we can look into it later. It does seem like a good idea on the surface. Since more than 80% of the requests are going to be in the 10,000-entry cache now, and since isThirdParty is super optimized anyway, it's not worth much trying avoid calling it.
https://codereview.adblockplus.org/29998564/diff/29998578/lib/domain.js File lib/domain.js (right): https://codereview.adblockplus.org/29998564/diff/29998578/lib/domain.js#newco... lib/domain.js:117: * @param {URL|string} url The request URL. On 2019/02/05 05:07:28, Manish Jethani wrote: > On 2019/02/05 04:32:53, Sebastian Noack wrote: > > Do we even still need to support the case where an URL object is passed in? > > Actually the default is a URL object now, which is created in > lib/requestBlocker.js in adblockpluschrome. It makes it all the way to here. So why did you add support for passing in strings then? https://codereview.adblockplus.org/29998564/diff/29998578/lib/matcher.js File lib/matcher.js (right): https://codereview.adblockplus.org/29998564/diff/29998578/lib/matcher.js#newc... lib/matcher.js:395: let thirdParty = docDomain && isThirdParty(location, docDomain); On 2019/02/05 05:07:28, Manish Jethani wrote: > On 2019/02/05 04:32:53, Sebastian Noack wrote: > > As discussed on IRC, how about only calling isThirdParty() if we found a > filter > > with a non-null `thirdParty` property? > > I tried this but it actually seemed to be more expensive. This makes sense: only > about 5% of all requests are blocked. I have not analyzed why it was more > expensive (maybe there is a single filter in EasyList with the flag set that > applies to most requests?), and maybe we can look into it later. It does seem > like a good idea on the surface. Sure, if you just call isThridParty() as we perform third-party checks currently, this might cause isThirdParty() being called several times while matching a single URL: matches(location, typeMask, docDomain, thirdParty, sitekey) { return (this.contentType & typeMask) != 0 && (this.thirdParty == null || this.thirdParty == thirdParty) && this.isActiveOnDomain(docDomain, sitekey) && this.matchesLocation(location); } Did you try moving the third party check to the bottom? > Since more than 80% of the requests are going to be in the 10,000-entry cache > now, and since isThirdParty is super optimized anyway, it's not worth much > trying avoid calling it. I'm not too happy about this change, and while it might help to cheat your benchmark, I wonder how great an improvement this is in practice. IMO, a proper LRU cache would be rather in order, as I commented on the issue, or even better improving the matching as much that caching becomes redundant.
Filed a ticket: https://issues.adblockplus.org/ticket/7260
https://codereview.adblockplus.org/29998564/diff/29998578/lib/domain.js File lib/domain.js (right): https://codereview.adblockplus.org/29998564/diff/29998578/lib/domain.js#newco... lib/domain.js:117: * @param {URL|string} url The request URL. On 2019/02/05 05:21:16, Sebastian Noack wrote: > On 2019/02/05 05:07:28, Manish Jethani wrote: > > On 2019/02/05 04:32:53, Sebastian Noack wrote: > > > Do we even still need to support the case where an URL object is passed in? > > > > Actually the default is a URL object now, which is created in > > lib/requestBlocker.js in adblockpluschrome. It makes it all the way to here. > > So why did you add support for passing in strings then? It's there for backwards compatibility with (1) libadblockplus, (2) unit tests, (3) lib/popUpBlocker.js in adblockpluschrome. Also I'd like to try to avoid trying to create a new URL object if possible (hostname can be extracted using a regular expression), but that would be a separate issue. https://codereview.adblockplus.org/29998564/diff/29998578/lib/matcher.js File lib/matcher.js (right): https://codereview.adblockplus.org/29998564/diff/29998578/lib/matcher.js#newc... lib/matcher.js:395: let thirdParty = docDomain && isThirdParty(location, docDomain); On 2019/02/05 05:21:16, Sebastian Noack wrote: > On 2019/02/05 05:07:28, Manish Jethani wrote: > > On 2019/02/05 04:32:53, Sebastian Noack wrote: > > > As discussed on IRC, how about only calling isThirdParty() if we found a > > filter > > > with a non-null `thirdParty` property? > > > > I tried this but it actually seemed to be more expensive. This makes sense: > only > > about 5% of all requests are blocked. I have not analyzed why it was more > > expensive (maybe there is a single filter in EasyList with the flag set that > > applies to most requests?), and maybe we can look into it later. It does seem > > like a good idea on the surface. > > Sure, if you just call isThridParty() as we perform third-party checks > currently, this might cause isThirdParty() being called several times while > matching a single URL: > > matches(location, typeMask, docDomain, thirdParty, sitekey) > { > return (this.contentType & typeMask) != 0 && > (this.thirdParty == null || this.thirdParty == thirdParty) && > this.isActiveOnDomain(docDomain, sitekey) && > this.matchesLocation(location); > } > > Did you try moving the third party check to the bottom? Yes, I know what you mean. I put the isThirdParty call in checkEntryMatch, which means it would get called for every keyword in the URL. Anyway, this is a lot of work to find out how to minimize the calls to isThirdParty. First we need to look at the filter list data to see if it makes sense. Then we need to think about where to position that call. Some refactoring may be involved. I really don't want to go there right now, at least for this release. > > Since more than 80% of the requests are going to be in the 10,000-entry cache > > now, and since isThirdParty is super optimized anyway, it's not worth much > > trying avoid calling it. > > I'm not too happy about this change, and while it might help to cheat your > benchmark, I wonder how great an improvement this is in practice. IMO, a proper > LRU cache would be rather in order, as I commented on the issue, or even better > improving the matching as much that caching becomes redundant. The purpose of this change should be not to optimize but rather to move the calls into Core now so they can be optimized in whatever way later. Leaving aside performance (let's say there is zero improvement here), would you still rather leave the check for third-party request status in adblockpluschrome? This is of course also a question for libadblockplus and all of its clients. The more things stay in Core the better it is for everyone.
https://codereview.adblockplus.org/29998564/diff/29998578/lib/domain.js File lib/domain.js (right): https://codereview.adblockplus.org/29998564/diff/29998578/lib/domain.js#newco... lib/domain.js:117: * @param {URL|string} url The request URL. On 2019/02/05 05:42:23, Manish Jethani wrote: > On 2019/02/05 05:21:16, Sebastian Noack wrote: > > On 2019/02/05 05:07:28, Manish Jethani wrote: > > > On 2019/02/05 04:32:53, Sebastian Noack wrote: > > > > Do we even still need to support the case where an URL object is passed > in? > > > > > > Actually the default is a URL object now, which is created in > > > lib/requestBlocker.js in adblockpluschrome. It makes it all the way to here. > > > > So why did you add support for passing in strings then? > > It's there for backwards compatibility with (1) libadblockplus, (2) unit tests, > (3) lib/popUpBlocker.js in adblockpluschrome. > > Also I'd like to try to avoid trying to create a new URL object if possible > (hostname can be extracted using a regular expression), but that would be a > separate issue. I don't get it, you can just make the unit tsts create an URL object, I don't see any code left in adblockpluschrome (after your related changes there) that would still call isThirdParty(), and libadblockplus never used a version of adblockpluscore that provided an isThirdParty() function, and if it did the code needs to be changed anyway since the matcher no longer let's you pass in a third-party value. https://codereview.adblockplus.org/29998564/diff/29998578/lib/matcher.js File lib/matcher.js (right): https://codereview.adblockplus.org/29998564/diff/29998578/lib/matcher.js#newc... lib/matcher.js:395: let thirdParty = docDomain && isThirdParty(location, docDomain); On 2019/02/05 05:42:23, Manish Jethani wrote: > On 2019/02/05 05:21:16, Sebastian Noack wrote: > > On 2019/02/05 05:07:28, Manish Jethani wrote: > > > On 2019/02/05 04:32:53, Sebastian Noack wrote: > > > > As discussed on IRC, how about only calling isThirdParty() if we found a > > > filter > > > > with a non-null `thirdParty` property? > > > > > > I tried this but it actually seemed to be more expensive. This makes sense: > > only > > > about 5% of all requests are blocked. I have not analyzed why it was more > > > expensive (maybe there is a single filter in EasyList with the flag set that > > > applies to most requests?), and maybe we can look into it later. It does > seem > > > like a good idea on the surface. > > > > Sure, if you just call isThridParty() as we perform third-party checks > > currently, this might cause isThirdParty() being called several times while > > matching a single URL: > > > > matches(location, typeMask, docDomain, thirdParty, sitekey) > > { > > return (this.contentType & typeMask) != 0 && > > (this.thirdParty == null || this.thirdParty == thirdParty) && > > this.isActiveOnDomain(docDomain, sitekey) && > > this.matchesLocation(location); > > } > > > > Did you try moving the third party check to the bottom? > > Yes, I know what you mean. I put the isThirdParty call in checkEntryMatch, which > means it would get called for every keyword in the URL. > > Anyway, this is a lot of work to find out how to minimize the calls to > isThirdParty. First we need to look at the filter list data to see if it makes > sense. Then we need to think about where to position that call. Some refactoring > may be involved. I really don't want to go there right now, at least for this > release. > > > > Since more than 80% of the requests are going to be in the 10,000-entry > cache > > > now, and since isThirdParty is super optimized anyway, it's not worth much > > > trying avoid calling it. > > > > I'm not too happy about this change, and while it might help to cheat your > > benchmark, I wonder how great an improvement this is in practice. IMO, a > proper > > LRU cache would be rather in order, as I commented on the issue, or even > better > > improving the matching as much that caching becomes redundant. > > The purpose of this change should be not to optimize but rather to move the > calls into Core now so they can be optimized in whatever way later. > > Leaving aside performance (let's say there is zero improvement here), would you > still rather leave the check for third-party request status in > adblockpluschrome? > > This is of course also a question for libadblockplus and all of its clients. The > more things stay in Core the better it is for everyone. Fair enough, for not further optimizing this here. But I'd still want to revisit https://issues.adblockplus.org/ticket/7254#comment:11 before updating the dependency for the release, this however, is unrelated of the changes here.
I just did some tests and I'm not seeing any significant performance improvement. I think we should put this on hold for now. It's a good idea, but it can wait until after ABP 3.5.
Since there have been significant changes to adblockpluscore by now, I have reimplemented this and submitted it for review on GitLab: https://gitlab.com/eyeo/adblockplus/adblockpluscore/merge_requests/36 I am closing this now. |