|
|
Created:
April 24, 2018, 8:31 p.m. by hub Modified:
May 17, 2018, 1:47 p.m. Base URL:
https://hg.adblockplus.org/adblockpluscore/ Visibility:
Public. |
DescriptionIssue 6592 - Implement $rewrite filter option
Patch Set 1 #Patch Set 2 : Added test. The rewrite is now part of BlockingFilter #
Total comments: 11
Patch Set 3 : Use the replace syntax directly #
Total comments: 5
Patch Set 4 : Check origin on rewrite. #
Total comments: 10
Patch Set 5 : More review changes #
Total comments: 2
Patch Set 6 : Update doc comment. #
Total comments: 21
Patch Set 7 : Some comment adjustments. #Patch Set 8 : Relative URL rewrite. #
Total comments: 6
Patch Set 9 : Comment fixes. #
Total comments: 21
Patch Set 10 : Some minor adjustments (comments, formatting) #Patch Set 11 : Properly polyfill URL for NodeJS #
Total comments: 5
Patch Set 12 : Use typeof to check for undefined #Patch Set 13 : Just inject URL into the sandbox globals. #
Total comments: 2
MessagesTotal messages: 62
I need to add some tests.
https://codereview.adblockplus.org/29760704/diff/29760750/lib/filterClasses.js File lib/filterClasses.js (right): https://codereview.adblockplus.org/29760704/diff/29760750/lib/filterClasses.j... lib/filterClasses.js:948: return this.rewrite.replace("$1", matches[0]); This is simplistic though. It does work, but the idea would be that $1, $2, etc are the matches from the RegExp. But I couldn't get a filter with a RegExp to work properly, which reduce versatility of the filter.
https://codereview.adblockplus.org/29760704/diff/29760750/lib/filterClasses.js File lib/filterClasses.js (right): https://codereview.adblockplus.org/29760704/diff/29760750/lib/filterClasses.j... lib/filterClasses.js:948: return this.rewrite.replace("$1", matches[0]); On 2018/04/27 00:48:22, hub wrote: > This is simplistic though. It does work, but the idea would be that $1, $2, etc > are the matches from the RegExp. But I couldn't get a filter with a RegExp to > work properly, which reduce versatility of the filter. What's wrong with the following? doRewrite(urlString) { return urlString.replace(this.regexp, this.rewrite); } So a filter like /\b(social\.y5zone\.sg\/ads\/js)\/element\.write\.js\b/$rewrite=$1/foo.js will rewrite http://social.y5zone.sg/ads/js/element.write.js?ts=5778 as http://social.y5zone.sg/ads/js/foo.js?ts=5778 https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Obje... Also: doRewrite(urlString) { let rewritten = urlString.replace(this.regexp, this.rewrite); if (rewritten != urlString) { try { // Disallow rewriting to a different origin. if (new URL(rewritten).origin != new URL(urlString).origin) rewritten = urlString; } catch (error) { } } return rewritten; }
https://codereview.adblockplus.org/29760704/diff/29760750/lib/filterClasses.js File lib/filterClasses.js (right): https://codereview.adblockplus.org/29760704/diff/29760750/lib/filterClasses.j... lib/filterClasses.js:851: RegExpFilter.typeMap = { We should add REWRITE here. Then we should exclude it below from contentType, similar to CSP.
Updated patch. https://codereview.adblockplus.org/29760704/diff/29760750/lib/filterClasses.js File lib/filterClasses.js (right): https://codereview.adblockplus.org/29760704/diff/29760750/lib/filterClasses.j... lib/filterClasses.js:851: RegExpFilter.typeMap = { On 2018/04/30 20:10:55, Manish Jethani wrote: > We should add REWRITE here. Then we should exclude it below from contentType, > similar to CSP. I don't think it make sense to do so. This is just an option, not a type. https://codereview.adblockplus.org/29760704/diff/29760750/lib/filterClasses.j... lib/filterClasses.js:948: return this.rewrite.replace("$1", matches[0]); On 2018/04/30 20:07:11, Manish Jethani wrote: > On 2018/04/27 00:48:22, hub wrote: > > This is simplistic though. It does work, but the idea would be that $1, $2, > etc > > are the matches from the RegExp. But I couldn't get a filter with a RegExp to > > work properly, which reduce versatility of the filter. > > What's wrong with the following? > > doRewrite(urlString) > { > return urlString.replace(this.regexp, this.rewrite); > } I thought about this initially but then couldn't make it work as it was always adding what wasn't matching. But I see now how the regexp should be written. Fixed. > Also: > > doRewrite(urlString) > { > let rewritten = urlString.replace(this.regexp, this.rewrite); > > if (rewritten != urlString) > { > try > { > // Disallow rewriting to a different origin. > if (new URL(rewritten).origin != new URL(urlString).origin) > rewritten = urlString; > } > catch (error) > { > } > } > > return rewritten; > } On error we should return the original urlString IMHO (ie perform no rewrite). But then I wasn't sure whether we should enforce this restriction or trust the filters.... What if the user really want to rewrite to a different location with a custom filter?
https://codereview.adblockplus.org/29760704/diff/29767580/lib/filterClasses.js File lib/filterClasses.js (right): https://codereview.adblockplus.org/29760704/diff/29767580/lib/filterClasses.j... lib/filterClasses.js:946: let matches = this.regexp.exec(urlString); Why do we need to call exec() first? Can't we call replace() right a way? https://codereview.adblockplus.org/29760704/diff/29767580/lib/filterClasses.j... lib/filterClasses.js:948: return urlString.replace(this.regexp, this.rewrite); As mentioned in the discussion on the issue, we should make sure (for security reasons) that the URL after the substitution is still in the same origin.
https://codereview.adblockplus.org/29760704/diff/29760750/lib/filterClasses.js File lib/filterClasses.js (right): https://codereview.adblockplus.org/29760704/diff/29760750/lib/filterClasses.j... lib/filterClasses.js:851: RegExpFilter.typeMap = { On 2018/05/01 18:32:15, hub wrote: > On 2018/04/30 20:10:55, Manish Jethani wrote: > > We should add REWRITE here. Then we should exclude it below from contentType, > > similar to CSP. > > I don't think it make sense to do so. This is just an option, not a type. Maybe for the sake of consistency? CSP is also not a content type, just an option. https://codereview.adblockplus.org/29760704/diff/29760750/lib/filterClasses.j... lib/filterClasses.js:948: return this.rewrite.replace("$1", matches[0]); On 2018/05/01 18:32:15, hub wrote: > On 2018/04/30 20:07:11, Manish Jethani wrote: > > Also: > > > > doRewrite(urlString) > > { > > let rewritten = urlString.replace(this.regexp, this.rewrite); > > > > if (rewritten != urlString) > > { > > try > > { > > // Disallow rewriting to a different origin. > > if (new URL(rewritten).origin != new URL(urlString).origin) > > rewritten = urlString; > > } > > catch (error) > > { > > } > > } > > > > return rewritten; > > } > > On error we should return the original urlString IMHO (ie perform no rewrite). Agreed. > But then I wasn't sure whether we should enforce this restriction or trust the > filters.... What if the user really want to rewrite to a different location with > a custom filter? Maybe we should allow it only for custom filters if we can do it easily (maybe we should file a separate issue for that). In general I think it's a good idea to have this restriction. I'm not so concerned about "malicious" filters, more about a filter accidentally leaking the user's private data to a different origin. Filter authors shouldn't have to worry about this possibility. https://codereview.adblockplus.org/29760704/diff/29767580/lib/filterClasses.js File lib/filterClasses.js (right): https://codereview.adblockplus.org/29760704/diff/29767580/lib/filterClasses.j... lib/filterClasses.js:946: let matches = this.regexp.exec(urlString); On 2018/05/02 15:10:06, Sebastian Noack wrote: > Why do we need to call exec() first? Can't we call replace() right a way? Yeah exec shouldn't be necessary.
https://codereview.adblockplus.org/29760704/diff/29760750/lib/filterClasses.js File lib/filterClasses.js (right): https://codereview.adblockplus.org/29760704/diff/29760750/lib/filterClasses.j... lib/filterClasses.js:944: doRewrite(urlString) For what it's worth I would rather call this rewriteUrl rather than doRewrite, it's a matter of preference though.
Updated patch https://codereview.adblockplus.org/29760704/diff/29760750/lib/filterClasses.js File lib/filterClasses.js (right): https://codereview.adblockplus.org/29760704/diff/29760750/lib/filterClasses.j... lib/filterClasses.js:851: RegExpFilter.typeMap = { On 2018/05/02 15:31:09, Manish Jethani wrote: > On 2018/05/01 18:32:15, hub wrote: > > On 2018/04/30 20:10:55, Manish Jethani wrote: > > > We should add REWRITE here. Then we should exclude it below from > contentType, > > > similar to CSP. > > > > I don't think it make sense to do so. This is just an option, not a type. > > Maybe for the sake of consistency? CSP is also not a content type, just an > option. I still don't think this is necessary. We don't need to filter rewrite out when matching (unlike CSP). https://codereview.adblockplus.org/29760704/diff/29760750/lib/filterClasses.j... lib/filterClasses.js:944: doRewrite(urlString) On 2018/05/02 17:52:23, Manish Jethani wrote: > For what it's worth I would rather call this rewriteUrl rather than doRewrite, > it's a matter of preference though. Done. https://codereview.adblockplus.org/29760704/diff/29760750/lib/filterClasses.j... lib/filterClasses.js:948: return this.rewrite.replace("$1", matches[0]); On 2018/05/02 15:31:09, Manish Jethani wrote: > On 2018/05/01 18:32:15, hub wrote: > > On 2018/04/30 20:07:11, Manish Jethani wrote: > > > Also: > > > > > > doRewrite(urlString) > > > { > > > let rewritten = urlString.replace(this.regexp, this.rewrite); > > > > > > if (rewritten != urlString) > > > { > > > try > > > { > > > // Disallow rewriting to a different origin. > > > if (new URL(rewritten).origin != new URL(urlString).origin) > > > rewritten = urlString; > > > } > > > catch (error) > > > { > > > } > > > } > > > > > > return rewritten; > > > } > > > > On error we should return the original urlString IMHO (ie perform no rewrite). > > Agreed. > > > But then I wasn't sure whether we should enforce this restriction or trust the > > filters.... What if the user really want to rewrite to a different location > with > > a custom filter? > > Maybe we should allow it only for custom filters if we can do it easily (maybe > we should file a separate issue for that). In general I think it's a good idea > to have this restriction. I'm not so concerned about "malicious" filters, more > about a filter accidentally leaking the user's private data to a different > origin. Filter authors shouldn't have to worry about this possibility. Done. https://codereview.adblockplus.org/29760704/diff/29767580/lib/filterClasses.js File lib/filterClasses.js (right): https://codereview.adblockplus.org/29760704/diff/29767580/lib/filterClasses.j... lib/filterClasses.js:946: let matches = this.regexp.exec(urlString); On 2018/05/02 15:31:09, Manish Jethani wrote: > On 2018/05/02 15:10:06, Sebastian Noack wrote: > > Why do we need to call exec() first? Can't we call replace() right a way? > > Yeah exec shouldn't be necessary. Done. https://codereview.adblockplus.org/29760704/diff/29767580/lib/filterClasses.j... lib/filterClasses.js:948: return urlString.replace(this.regexp, this.rewrite); On 2018/05/02 15:10:05, Sebastian Noack wrote: > As mentioned in the discussion on the issue, we should make sure (for security > reasons) that the URL after the substitution is still in the same origin. Ok, I'll check the origin
https://codereview.adblockplus.org/29760704/diff/29768685/lib/filterClasses.js File lib/filterClasses.js (right): https://codereview.adblockplus.org/29760704/diff/29768685/lib/filterClasses.j... lib/filterClasses.js:947: if (rewritten != urlString) Why do we need that check? Seems like a premature optimization.
https://codereview.adblockplus.org/29760704/diff/29768685/lib/filterClasses.js File lib/filterClasses.js (right): https://codereview.adblockplus.org/29760704/diff/29768685/lib/filterClasses.j... lib/filterClasses.js:947: if (rewritten != urlString) On 2018/05/02 22:25:03, Sebastian Noack wrote: > Why do we need that check? Seems like a premature optimization. I consider new URL() costly (and we do it twice).
https://codereview.adblockplus.org/29760704/diff/29768685/lib/filterClasses.js File lib/filterClasses.js (right): https://codereview.adblockplus.org/29760704/diff/29768685/lib/filterClasses.j... lib/filterClasses.js:947: if (rewritten != urlString) On 2018/05/03 00:57:58, hub wrote: > On 2018/05/02 22:25:03, Sebastian Noack wrote: > > Why do we need that check? Seems like a premature optimization. > > I consider new URL() costly (and we do it twice). This code path is only hit for $rewrite filters and only if they match, right?
https://codereview.adblockplus.org/29760704/diff/29768685/lib/filterClasses.js File lib/filterClasses.js (right): https://codereview.adblockplus.org/29760704/diff/29768685/lib/filterClasses.j... lib/filterClasses.js:822: else if (option == "REWRITE" && value) Note that for the rewrite option an empty string should also be a valid value. For example /foo/$rewrite=,domain=example.com should just replace "foo" with "". So I think the check here should be value != null. https://codereview.adblockplus.org/29760704/diff/29768685/lib/filterClasses.j... lib/filterClasses.js:952: rewritten = urlString; We could return rewritten here and return urlString at the end of the function, then the catch block can be empty.
patch updated https://codereview.adblockplus.org/29760704/diff/29768685/lib/filterClasses.js File lib/filterClasses.js (right): https://codereview.adblockplus.org/29760704/diff/29768685/lib/filterClasses.j... lib/filterClasses.js:822: else if (option == "REWRITE" && value) On 2018/05/03 02:42:36, Manish Jethani wrote: > Note that for the rewrite option an empty string should also be a valid value. > For example /foo/$rewrite=,domain=example.com should just replace "foo" with "". > So I think the check here should be value != null. There can't be an empty string. The `optionsRegExp` will skip the rewrite option because of `=,`. https://codereview.adblockplus.org/29760704/diff/29768685/lib/filterClasses.j... lib/filterClasses.js:947: if (rewritten != urlString) On 2018/05/03 01:14:52, Sebastian Noack wrote: > On 2018/05/03 00:57:58, hub wrote: > > On 2018/05/02 22:25:03, Sebastian Noack wrote: > > > Why do we need that check? Seems like a premature optimization. > > > > I consider new URL() costly (and we do it twice). > > This code path is only hit for $rewrite filters and only if they match, right? yup. Will fix it. https://codereview.adblockplus.org/29760704/diff/29768685/lib/filterClasses.j... lib/filterClasses.js:952: rewritten = urlString; On 2018/05/03 02:42:36, Manish Jethani wrote: > We could return rewritten here and return urlString at the end of the function, > then the catch block can be empty. Done.
https://codereview.adblockplus.org/29760704/diff/29769594/lib/filterClasses.js File lib/filterClasses.js (right): https://codereview.adblockplus.org/29760704/diff/29769594/lib/filterClasses.j... lib/filterClasses.js:942: * @returns {string?} the rewritten URL, or null if it doesn't match. This is no longer true, it never returns null.
Updated pacth https://codereview.adblockplus.org/29760704/diff/29769594/lib/filterClasses.js File lib/filterClasses.js (right): https://codereview.adblockplus.org/29760704/diff/29769594/lib/filterClasses.j... lib/filterClasses.js:942: * @returns {string?} the rewritten URL, or null if it doesn't match. On 2018/05/04 09:56:40, Manish Jethani wrote: > This is no longer true, it never returns null. Done.
https://codereview.adblockplus.org/29760704/diff/29768685/lib/filterClasses.js File lib/filterClasses.js (right): https://codereview.adblockplus.org/29760704/diff/29768685/lib/filterClasses.j... lib/filterClasses.js:822: else if (option == "REWRITE" && value) On 2018/05/03 18:40:04, hub wrote: > On 2018/05/03 02:42:36, Manish Jethani wrote: > > Note that for the rewrite option an empty string should also be a valid value. > > For example /foo/$rewrite=,domain=example.com should just replace "foo" with > "". > > So I think the check here should be value != null. > > There can't be an empty string. The `optionsRegExp` will skip the rewrite option > because of `=,`. Well that's a shame, because otherwise foo$rewrite= would be such a convenient way to strip out just foo from the URL. Now you could use /foo()/$rewrite=$1 for this, but that would be a regex filter which is supposedly slow for lookup. At some point we should reconsider allowing empty values in options. I'll file a separate issue for that. https://codereview.adblockplus.org/29760704/diff/29770573/lib/filterClasses.js File lib/filterClasses.js (right): https://codereview.adblockplus.org/29760704/diff/29770573/lib/filterClasses.j... lib/filterClasses.js:940: * Perform the URL rewrite and check the origin. Does the detail about checking the origin really need to be exposed in this way in the documentation, especially in the title? I think we can just skip this part, just say "Perform the URL rewrite". Also if we follow the rest of the file it should be "Performs the URL rewrite". Since the same filter will be applied to multiple URLs, it should really be "Performs a URL rewrite". In my opinion we can just make it "Rewrites a URL". https://codereview.adblockplus.org/29760704/diff/29770573/lib/filterClasses.j... lib/filterClasses.js:941: * @param {string} urlString the string URL to rewrite We can just say "the URL to rewrite" Also shall we just call this parameter "url"? It's called "urlString" on the extension side because there's already a "url" variable, otherwise it's usually cleaner to just call it "url". Let me know what you think. https://codereview.adblockplus.org/29760704/diff/29770573/lib/filterClasses.j... lib/filterClasses.js:942: * @returns {string?} the rewritten URL, or the orginal in case of failure. Nit: original URL Also: (1) the type should be {string} now (no question mark); (2) let's make it @return rather than @returns to be consistent with the rest of the file; (3) let's drop the period at the end, again for consistency. https://codereview.adblockplus.org/29760704/diff/29770573/lib/filterClasses.j... lib/filterClasses.js:946: let rewritten = urlString.replace(this.regexp, this.rewrite); Does this work with Unicode domains? For example https://i❤️.ws/ and the filter "/(https?:\/\/i❤️\.ws\/js\/)(bundle\.js)$/$rewrite=$1example.js" https://codereview.adblockplus.org/29760704/diff/29770573/lib/filterClasses.j... lib/filterClasses.js:949: if (new URL(rewritten).origin == new URL(urlString).origin) We should probably have a function in lib/common.js: function originsMatch(url1, url2) { try { return new URL(url1).origin == new URL(url2).origin; } catch (e) { } } And then: if (!originsMatch(rewritten, url)) return url; return rewritten;
https://codereview.adblockplus.org/29760704/diff/29770573/lib/filterClasses.js File lib/filterClasses.js (right): https://codereview.adblockplus.org/29760704/diff/29770573/lib/filterClasses.j... lib/filterClasses.js:949: if (new URL(rewritten).origin == new URL(urlString).origin) On 2018/05/04 22:50:48, Manish Jethani wrote: > We should probably have a function in lib/common.js: > > function originsMatch(url1, url2) > { > try > { > return new URL(url1).origin == new URL(url2).origin; > } > catch (e) > { > } > } > > And then: > > if (!originsMatch(rewritten, url)) > return url; > > return rewritten; I ran a test to see if separating it out into its own function has any effect on performance. TL;DR negative. Here's the test: (function () { let randomString = () => Math.random().toString(32).substr(2); let randomURL = () => "https://" + (Math.random() < .5 ? (randomString() + ".") : "") + "example.com/foo"; let array = new Array(1000000); for (let i = 0; i < array.length; i++) array[i] = Math.random() < .5 ? ":" : randomURL(); let count = 0; let start = performance.now(); for (let i = 0; i < array.length; i++) { try { if (new URL("https://example.com/").origin == new URL(array[i]).origin) count++; } catch (error) { } } console.log("time:", performance.now() - start); console.log("count:", count); })(); I ran this in the background page console on both Chrome and Firefox. Then I declared the following function: function originsMatch(url1, url2) { try { return new URL(url1).origin == new URL(url2).origin; } catch (error) { } } And then I modified the test to call this function instead. The result is that it actually slows down a bit (but very slightly). So I think the only benefit of creating a separate function is that it makes the code "cleaner". I don't feel very strongly about this, it's your call.
https://codereview.adblockplus.org/29760704/diff/29770573/lib/filterClasses.js File lib/filterClasses.js (right): https://codereview.adblockplus.org/29760704/diff/29770573/lib/filterClasses.j... lib/filterClasses.js:946: let rewritten = urlString.replace(this.regexp, this.rewrite); On 2018/05/04 22:50:48, Manish Jethani wrote: > Does this work with Unicode domains? For example https://xn--i-7iq.ws/ and the filter > "/(https?:\/\/i❤️\.ws\/js\/)(bundle\.js)$/$rewrite=$1example.js" Never mind, it won't. I've filed Trac #6647 for this. I've confirmed that it does work for normal Unicode characters. In practice this should be enough, at least for now.
https://codereview.adblockplus.org/29760704/diff/29770573/lib/filterClasses.js File lib/filterClasses.js (right): https://codereview.adblockplus.org/29760704/diff/29770573/lib/filterClasses.j... lib/filterClasses.js:949: if (new URL(rewritten).origin == new URL(urlString).origin) On 2018/05/04 23:41:46, Manish Jethani wrote: > On 2018/05/04 22:50:48, Manish Jethani wrote: > > We should probably have a function in lib/common.js: > > > > function originsMatch(url1, url2) > > { > > try > > { > > return new URL(url1).origin == new URL(url2).origin; > > } > > catch (e) > > { > > } > > } > > > > And then: > > > > if (!originsMatch(rewritten, url)) > > return url; > > > > return rewritten; > > I ran a test to see if separating it out into its own function has any effect on > performance. TL;DR negative. > > Here's the test: > > (function () > { > let randomString = () => Math.random().toString(32).substr(2); > let randomURL = () => "https://" + > (Math.random() < .5 ? (randomString() + ".") : "") + > "example.com/foo"; > > let array = new Array(1000000); > for (let i = 0; i < array.length; i++) > array[i] = Math.random() < .5 ? ":" : randomURL(); > > let count = 0; > let start = performance.now(); > > for (let i = 0; i < array.length; i++) > { > try > { > if (new URL("https://example.com/").origin == new URL(array[i]).origin) > count++; > } > catch (error) > { > } > } > > console.log("time:", performance.now() - start); > console.log("count:", count); > })(); > > I ran this in the background page console on both Chrome and Firefox. Then I > declared the following function: > > function originsMatch(url1, url2) > { > try > { > return new URL(url1).origin == new URL(url2).origin; > } > catch (error) > { > } > } > > And then I modified the test to call this function instead. The result is that > it actually slows down a bit (but very slightly). > > So I think the only benefit of creating a separate function is that it makes the > code "cleaner". I don't feel very strongly about this, it's your call. As long as we only need this functionality in one place, I'd rather keep the logic here than moving it into a function (in some other module).
https://codereview.adblockplus.org/29760704/diff/29770573/lib/filterClasses.js File lib/filterClasses.js (right): https://codereview.adblockplus.org/29760704/diff/29770573/lib/filterClasses.j... lib/filterClasses.js:946: let rewritten = urlString.replace(this.regexp, this.rewrite); How about this? let rewritten = new URL(urlString.replace(this.regexp, this.rewrite), urlString); That way if the result of the substitution is a relative URL, it will be resolved relative to the original URL. This would also aid filters like that: ||example.com/foo^$rewrite=/bar What do you think?
https://codereview.adblockplus.org/29760704/diff/29770573/lib/filterClasses.js File lib/filterClasses.js (right): https://codereview.adblockplus.org/29760704/diff/29770573/lib/filterClasses.j... lib/filterClasses.js:946: let rewritten = urlString.replace(this.regexp, this.rewrite); On 2018/05/05 17:07:07, Sebastian Noack wrote: > How about this? > > let rewritten = new URL(urlString.replace(this.regexp, this.rewrite), > urlString); > > That way if the result of the substitution is a relative URL, it will be > resolved relative to the original URL. This would also aid filters like that: > > ||example.com/foo^$rewrite=/bar > > What do you think? Yeah, I like the idea. So it would be something like: let rewritten = urlString.replace(this.regexp, this.rewrite); if (new URL(rewritten, urlString).origin == new URL(urlString).origin) return rewritten;
https://codereview.adblockplus.org/29760704/diff/29770573/lib/filterClasses.js File lib/filterClasses.js (right): https://codereview.adblockplus.org/29760704/diff/29770573/lib/filterClasses.j... lib/filterClasses.js:946: let rewritten = urlString.replace(this.regexp, this.rewrite); On 2018/05/05 23:58:39, Manish Jethani wrote: > On 2018/05/05 17:07:07, Sebastian Noack wrote: > > How about this? > > > > let rewritten = new URL(urlString.replace(this.regexp, this.rewrite), > > urlString); > > > > That way if the result of the substitution is a relative URL, it will be > > resolved relative to the original URL. This would also aid filters like that: > > > > ||example.com/foo^$rewrite=/bar > > > > What do you think? > > Yeah, I like the idea. > > So it would be something like: > > let rewritten = urlString.replace(this.regexp, this.rewrite); > if (new URL(rewritten, urlString).origin == new URL(urlString).origin) > return rewritten; Oh well, redirectUrl can't be relative. It'll have to be more like: let {href, origin} = new URL(urlString.replace(this.regexp, this.rewrite), urlString); if (origin == new URL(urlString).origin) return href; Unfortunately this means we'll be returning an IDN-encoded version of the rewritten URL, and that's the one that'll appear in the DevTools panel. If we're going to switch entirely to IDN encoding anyway [1], we might as well wait until then for consistency. If we make the change later to support relative URLs it'll still be backwards compatible with any existing filters. Also if we're going to match against IDN-encoded URLs then this function can accept the URL object itself (which already exists at the caller's end) so it doesn't have to create a new one. So it's going to be: let {href, origin} = new URL(url.href.replace(this.regexp, this.rewrite), url); if (origin == url.origin) return href; But this can be done later so for now we can just not support relative URLs here. [1]: https://issues.adblockplus.org/ticket/6647#comment:6
https://codereview.adblockplus.org/29760704/diff/29770573/lib/filterClasses.js File lib/filterClasses.js (right): https://codereview.adblockplus.org/29760704/diff/29770573/lib/filterClasses.j... lib/filterClasses.js:946: let rewritten = urlString.replace(this.regexp, this.rewrite); On 2018/05/06 00:45:41, Manish Jethani wrote: > Unfortunately this means we'll be returning an IDN-encoded version of the > rewritten URL, and that's the one that'll appear in the DevTools panel. If we're > going to switch entirely to IDN encoding anyway [1], we might as well wait until > then for consistency. As far as I can tell, if we return an IDN-encoded URL here, everything should still work as expected. If its just for the devtools panel, where the IDN-encoded rewritten URL might appear inconsistent, this is fine for the time being, IMO, if we are going to switch entirely to IDN encoding anyway. Alternatively, we could switch to IDN encoding entirely before introducing the $rewrite filter option. Or we could convert the URL returned here, when it is logged to the devtools panel in the calling code, for the time being.
https://codereview.adblockplus.org/29760704/diff/29770573/lib/filterClasses.js File lib/filterClasses.js (right): https://codereview.adblockplus.org/29760704/diff/29770573/lib/filterClasses.j... lib/filterClasses.js:946: let rewritten = urlString.replace(this.regexp, this.rewrite); On 2018/05/06 00:45:41, Manish Jethani wrote: > [...] > > So it's going to be: > > let {href, origin} = new URL(url.href.replace(this.regexp, this.rewrite), > url); > if (origin == url.origin) > return href; Sorry, here if we accept a URL object as a parameter then we should also return the new URL object. The caller can easily do String() on it or access the href property explicitly. The return value is guaranteed to be non-null. https://codereview.adblockplus.org/29760704/diff/29770573/lib/filterClasses.j... lib/filterClasses.js:946: let rewritten = urlString.replace(this.regexp, this.rewrite); On 2018/05/06 13:04:25, Sebastian Noack wrote: > On 2018/05/06 00:45:41, Manish Jethani wrote: > > Unfortunately this means we'll be returning an IDN-encoded version of the > > rewritten URL, and that's the one that'll appear in the DevTools panel. If > we're > > going to switch entirely to IDN encoding anyway [1], we might as well wait > until > > then for consistency. > > As far as I can tell, if we return an IDN-encoded URL here, everything should > still work as expected. If its just for the devtools panel, where the > IDN-encoded rewritten URL might appear inconsistent, this is fine for the time > being, IMO, if we are going to switch entirely to IDN encoding anyway. > > [...] Yeah, after checking EasyList and EasyList China [1] at least I can say that this would have no impact in practice. I'll see if I can run a script on all the EasyList filter lists and maybe some of the other popular ones. [1]: https://issues.adblockplus.org/ticket/6647#comment:8
Updated patch. https://codereview.adblockplus.org/29760704/diff/29768685/lib/filterClasses.js File lib/filterClasses.js (right): https://codereview.adblockplus.org/29760704/diff/29768685/lib/filterClasses.j... lib/filterClasses.js:822: else if (option == "REWRITE" && value) On 2018/05/04 22:50:48, Manish Jethani wrote: > On 2018/05/03 18:40:04, hub wrote: > > On 2018/05/03 02:42:36, Manish Jethani wrote: > > > Note that for the rewrite option an empty string should also be a valid > value. > > > For example /foo/$rewrite=,domain=example.com should just replace "foo" with > > "". > > > So I think the check here should be value != null. > > > > There can't be an empty string. The `optionsRegExp` will skip the rewrite > option > > because of `=,`. > > Well that's a shame, because otherwise foo$rewrite= would be such a convenient > way to strip out just foo from the URL. Now you could use /foo()/$rewrite=$1 for > this, but that would be a regex filter which is supposedly slow for lookup. > > At some point we should reconsider allowing empty values in options. I'll file a > separate issue for that. Acknowledged. https://codereview.adblockplus.org/29760704/diff/29770573/lib/filterClasses.js File lib/filterClasses.js (right): https://codereview.adblockplus.org/29760704/diff/29770573/lib/filterClasses.j... lib/filterClasses.js:940: * Perform the URL rewrite and check the origin. On 2018/05/04 22:50:48, Manish Jethani wrote: > Does the detail about checking the origin really need to be exposed in this way > in the documentation, especially in the title? I think we can just skip this > part, just say "Perform the URL rewrite". > > Also if we follow the rest of the file it should be "Performs the URL rewrite". > > Since the same filter will be applied to multiple URLs, it should really be > "Performs a URL rewrite". > > In my opinion we can just make it "Rewrites a URL". Done. https://codereview.adblockplus.org/29760704/diff/29770573/lib/filterClasses.j... lib/filterClasses.js:941: * @param {string} urlString the string URL to rewrite On 2018/05/04 22:50:48, Manish Jethani wrote: > We can just say "the URL to rewrite" > > Also shall we just call this parameter "url"? It's called "urlString" on the > extension side because there's already a "url" variable, otherwise it's usually > cleaner to just call it "url". Let me know what you think. "url" is fine by me. Done. https://codereview.adblockplus.org/29760704/diff/29770573/lib/filterClasses.j... lib/filterClasses.js:942: * @returns {string?} the rewritten URL, or the orginal in case of failure. On 2018/05/04 22:50:49, Manish Jethani wrote: > Nit: original URL > > Also: (1) the type should be {string} now (no question mark); (2) let's make it > @return rather than @returns to be consistent with the rest of the file; (3) > let's drop the period at the end, again for consistency. Done. https://codereview.adblockplus.org/29760704/diff/29770573/lib/filterClasses.j... lib/filterClasses.js:946: let rewritten = urlString.replace(this.regexp, this.rewrite); On 2018/05/06 13:59:31, Manish Jethani wrote: > On 2018/05/06 13:04:25, Sebastian Noack wrote: > > On 2018/05/06 00:45:41, Manish Jethani wrote: > > > Unfortunately this means we'll be returning an IDN-encoded version of the > > > rewritten URL, and that's the one that'll appear in the DevTools panel. If > > we're > > > going to switch entirely to IDN encoding anyway [1], we might as well wait > > until > > > then for consistency. > > > > As far as I can tell, if we return an IDN-encoded URL here, everything should > > still work as expected. If its just for the devtools panel, where the > > IDN-encoded rewritten URL might appear inconsistent, this is fine for the time > > being, IMO, if we are going to switch entirely to IDN encoding anyway. > > > > [...] > > Yeah, after checking EasyList and EasyList China [1] at least I can say that > this would have no impact in practice. I'll see if I can run a script on all the > EasyList filter lists and maybe some of the other popular ones. > > [1]: https://issues.adblockplus.org/ticket/6647#comment:8 I'll leave it like that for now. https://codereview.adblockplus.org/29760704/diff/29770573/lib/filterClasses.j... lib/filterClasses.js:949: if (new URL(rewritten).origin == new URL(urlString).origin) On 2018/05/05 13:34:43, Sebastian Noack wrote: > On 2018/05/04 23:41:46, Manish Jethani wrote: > > On 2018/05/04 22:50:48, Manish Jethani wrote: > > > We should probably have a function in lib/common.js: > > > > > > function originsMatch(url1, url2) > > > { > > > try > > > { > > > return new URL(url1).origin == new URL(url2).origin; > > > } > > > catch (e) > > > { > > > } > > > } > > > > > > And then: > > > > > > if (!originsMatch(rewritten, url)) > > > return url; > > > > > > return rewritten; > > > > I ran a test to see if separating it out into its own function has any effect > on > > performance. TL;DR negative. > > > > Here's the test: > > > > (function () > > { > > let randomString = () => Math.random().toString(32).substr(2); > > let randomURL = () => "https://" + > > (Math.random() < .5 ? (randomString() + ".") : "") + > > "example.com/foo"; > > > > let array = new Array(1000000); > > for (let i = 0; i < array.length; i++) > > array[i] = Math.random() < .5 ? ":" : randomURL(); > > > > let count = 0; > > let start = performance.now(); > > > > for (let i = 0; i < array.length; i++) > > { > > try > > { > > if (new URL("https://example.com/").origin == new URL(array[i]).origin) > > count++; > > } > > catch (error) > > { > > } > > } > > > > console.log("time:", performance.now() - start); > > console.log("count:", count); > > })(); > > > > I ran this in the background page console on both Chrome and Firefox. Then I > > declared the following function: > > > > function originsMatch(url1, url2) > > { > > try > > { > > return new URL(url1).origin == new URL(url2).origin; > > } > > catch (error) > > { > > } > > } > > > > And then I modified the test to call this function instead. The result is that > > it actually slows down a bit (but very slightly). > > > > So I think the only benefit of creating a separate function is that it makes > the > > code "cleaner". I don't feel very strongly about this, it's your call. > > As long as we only need this functionality in one place, I'd rather keep the > logic here than moving it into a function (in some other module). agreed.
https://codereview.adblockplus.org/29760704/diff/29770573/lib/filterClasses.js File lib/filterClasses.js (right): https://codereview.adblockplus.org/29760704/diff/29770573/lib/filterClasses.j... lib/filterClasses.js:946: let rewritten = urlString.replace(this.regexp, this.rewrite); On 2018/05/06 13:04:25, Sebastian Noack wrote: > Alternatively, we could switch to IDN encoding entirely before introducing the > $rewrite filter option. Since we are on the verge of landing the change moving to punycode entirely, anything that still holds us back from supporting relative URLs here?
Updated patch. https://codereview.adblockplus.org/29760704/diff/29770573/lib/filterClasses.js File lib/filterClasses.js (right): https://codereview.adblockplus.org/29760704/diff/29770573/lib/filterClasses.j... lib/filterClasses.js:946: let rewritten = urlString.replace(this.regexp, this.rewrite); On 2018/05/08 06:57:30, Sebastian Noack wrote: > On 2018/05/06 13:04:25, Sebastian Noack wrote: > > Alternatively, we could switch to IDN encoding entirely before introducing the > > $rewrite filter option. > > Since we are on the verge of landing the change moving to punycode entirely, > anything that still holds us back from supporting relative URLs here? it is done now.
Other than that the change looks good to me. (I moved myself back to CC, since I won't be available for the rest of the week, and I'm not module owner here anyway.) https://codereview.adblockplus.org/29760704/diff/29774568/lib/filterClasses.js File lib/filterClasses.js (right): https://codereview.adblockplus.org/29760704/diff/29774568/lib/filterClasses.j... lib/filterClasses.js:941: * @param {string} url the URL to rewrite The type annotated here is incorrect. We expect an URL object, not a string.
https://codereview.adblockplus.org/29760704/diff/29774568/lib/filterClasses.js File lib/filterClasses.js (right): https://codereview.adblockplus.org/29760704/diff/29774568/lib/filterClasses.j... lib/filterClasses.js:941: * @param {string} url the URL to rewrite On 2018/05/08 17:09:30, Sebastian Noack wrote: > The type annotated here is incorrect. We expect an URL object, not a string. In the onBeforeRequest listener we pass details.url to rewriteUrl, which is a string. And below we call String.prototype.replace() on it. So we really expect a string.
https://codereview.adblockplus.org/29760704/diff/29774568/lib/filterClasses.js File lib/filterClasses.js (right): https://codereview.adblockplus.org/29760704/diff/29774568/lib/filterClasses.j... lib/filterClasses.js:941: * @param {string} url the URL to rewrite On 2018/05/08 17:14:49, hub wrote: > On 2018/05/08 17:09:30, Sebastian Noack wrote: > > The type annotated here is incorrect. We expect an URL object, not a string. > > In the onBeforeRequest listener we pass details.url to rewriteUrl, which is a > string. And below we call String.prototype.replace() on it. > > So we really expect a string. You are right. But didn't Manish suggest to pass in an URL object instead? (I personally don't mind much either way.)
https://codereview.adblockplus.org/29760704/diff/29774568/lib/filterClasses.js File lib/filterClasses.js (right): https://codereview.adblockplus.org/29760704/diff/29774568/lib/filterClasses.j... lib/filterClasses.js:941: * @param {string} url the URL to rewrite On 2018/05/08 17:25:37, Sebastian Noack wrote: > On 2018/05/08 17:14:49, hub wrote: > > On 2018/05/08 17:09:30, Sebastian Noack wrote: > > > The type annotated here is incorrect. We expect an URL object, not a string. > > > > In the onBeforeRequest listener we pass details.url to rewriteUrl, which is a > > string. And below we call String.prototype.replace() on it. > > > > So we really expect a string. > > You are right. But didn't Manish suggest to pass in an URL object instead? (I > personally don't mind much either way.) I was thinking that we already have a URL object on the calling side, we can just pass it in and then do the rewriting on the href property. This way we don't have to create a new one here. For consistency we could also return rewrittenUrl rather than rewrittenUrl.href. But the change looks good enough to me as it is, I don't mind if we leave it like this. What do you think, Hubert? LGTM
Adding dave@ instead of kzar@
On 2018/05/08 18:52:32, Manish Jethani wrote: > LGTM By the way, I would write the commit message as "Issue 6592 - Implement rewrite filter option" (imperative) [1] [1]: https://news.ycombinator.com/item?id=4270890
On 2018/05/08 18:59:02, Manish Jethani wrote: > On 2018/05/08 18:52:32, Manish Jethani wrote: > > LGTM > > By the way, I would write the commit message as "Issue 6592 - Implement rewrite > filter option" (imperative) [1] Done
On 2018/05/08 19:13:55, hub wrote: > On 2018/05/08 18:59:02, Manish Jethani wrote: > > On 2018/05/08 18:52:32, Manish Jethani wrote: > > > LGTM > > > > By the way, I would write the commit message as "Issue 6592 - Implement > rewrite > > filter option" (imperative) [1] > > Done Strange, somehow it didn't get updated here in the title.
On 2018/05/09 13:42:31, Manish Jethani wrote: > On 2018/05/08 19:13:55, hub wrote: > > On 2018/05/08 18:59:02, Manish Jethani wrote: > > > On 2018/05/08 18:52:32, Manish Jethani wrote: > > > > LGTM > > > > > > By the way, I would write the commit message as "Issue 6592 - Implement > > rewrite > > > filter option" (imperative) [1] > > > > Done > > Strange, somehow it didn't get updated here in the title. I hadn't, but the local patch to push was. updated now.
On 2018/05/09 14:24:48, hub wrote: > On 2018/05/09 13:42:31, Manish Jethani wrote: > > On 2018/05/08 19:13:55, hub wrote: > > > On 2018/05/08 18:59:02, Manish Jethani wrote: > > > > On 2018/05/08 18:52:32, Manish Jethani wrote: > > > > > LGTM > > > > > > > > By the way, I would write the commit message as "Issue 6592 - Implement > > > rewrite > > > > filter option" (imperative) [1] > > > > > > Done > > > > Strange, somehow it didn't get updated here in the title. > > I hadn't, but the local patch to push was. updated now. OK, would be awesome if you could update the title as well. LGTM PS: Dave, would you have time to look at this? Since you're the module owner we need the go-ahead from you.
https://codereview.adblockplus.org/29760704/diff/29774568/lib/filterClasses.js File lib/filterClasses.js (right): https://codereview.adblockplus.org/29760704/diff/29774568/lib/filterClasses.j... lib/filterClasses.js:935: * @type {string} This should be {?string} now after rebasing.
patch updated. https://codereview.adblockplus.org/29760704/diff/29774568/lib/filterClasses.js File lib/filterClasses.js (right): https://codereview.adblockplus.org/29760704/diff/29774568/lib/filterClasses.j... lib/filterClasses.js:935: * @type {string} On 2018/05/11 13:59:29, Manish Jethani wrote: > This should be {?string} now after rebasing. Done.
https://codereview.adblockplus.org/29760704/diff/29778589/lib/filterClasses.js File lib/filterClasses.js (right): https://codereview.adblockplus.org/29760704/diff/29778589/lib/filterClasses.j... lib/filterClasses.js:900: * The rewrite expression Mind making this comment useful? (Same below.) https://codereview.adblockplus.org/29760704/diff/29778589/test/_common.js File test/_common.js (left): https://codereview.adblockplus.org/29760704/diff/29778589/test/_common.js#old... test/_common.js:34: function URL(urlString) What's with this change? https://codereview.adblockplus.org/29760704/diff/29778589/test/filterClasses.js File test/filterClasses.js (right): https://codereview.adblockplus.org/29760704/diff/29778589/test/filterClasses.... test/filterClasses.js:538: "http://content.server/file/foo.txt/disable"); Nit: We normally put the closing parenthesis on the next line if the first argument is on the line below.
https://codereview.adblockplus.org/29760704/diff/29778589/lib/filterClasses.js File lib/filterClasses.js (right): https://codereview.adblockplus.org/29760704/diff/29778589/lib/filterClasses.j... lib/filterClasses.js:955: return url; Why do we return `url` on failure? Shouldn't we return `false` or similar instead? That would be cheaper to check in the onBeforeRequest listener. (Yea, we'd also have to tweak the above logic to check that `url != rewrittenUrl`.)
https://codereview.adblockplus.org/29760704/diff/29778589/lib/filterClasses.js File lib/filterClasses.js (right): https://codereview.adblockplus.org/29760704/diff/29778589/lib/filterClasses.j... lib/filterClasses.js:955: return url; On 2018/05/15 14:24:57, kzar wrote: > Why do we return `url` on failure? Shouldn't we return `false` or similar > instead? That would be cheaper to check in the onBeforeRequest listener. (Yea, > we'd also have to tweak the above logic to check that `url != rewrittenUrl`.) Even if no error is thrown and no origin change was detected, the URL might be the same after the substitution. If we distinguish between those scenarios here, we'd also have to handle them separately in the calling code.
https://codereview.adblockplus.org/29760704/diff/29778589/lib/filterClasses.js File lib/filterClasses.js (right): https://codereview.adblockplus.org/29760704/diff/29778589/lib/filterClasses.j... lib/filterClasses.js:955: return url; On 2018/05/15 14:40:34, Sebastian Noack wrote: > ...we'd also have to handle them separately in the calling code. Not if we tweak the above logic, so we don't return the rewritten URL if either the origins don't match or if the URLs are identical.
https://codereview.adblockplus.org/29760704/diff/29778589/lib/filterClasses.js File lib/filterClasses.js (right): https://codereview.adblockplus.org/29760704/diff/29778589/lib/filterClasses.j... lib/filterClasses.js:955: return url; On 2018/05/15 14:44:23, kzar wrote: > On 2018/05/15 14:40:34, Sebastian Noack wrote: > > ...we'd also have to handle them separately in the calling code. > > Not if we tweak the above logic, so we don't return the rewritten URL if either > the origins don't match or if the URLs are identical. Sure, we could add another check here, in order to only return a string if it differs from the original URL, and then in the calling code check for null instead of checking whether the value changed. But I don't see how that makes anything simpler (or faster).
https://codereview.adblockplus.org/29760704/diff/29778589/lib/filterClasses.js File lib/filterClasses.js (right): https://codereview.adblockplus.org/29760704/diff/29778589/lib/filterClasses.j... lib/filterClasses.js:955: return url; On 2018/05/15 14:51:02, Sebastian Noack wrote: > On 2018/05/15 14:44:23, kzar wrote: > > On 2018/05/15 14:40:34, Sebastian Noack wrote: > > > ...we'd also have to handle them separately in the calling code. > > > > Not if we tweak the above logic, so we don't return the rewritten URL if > either > > the origins don't match or if the URLs are identical. > > Sure, we could add another check here, in order to only return a string if it > differs from the original URL, and then in the calling code check for null > instead of checking whether the value changed. But I don't see how that makes > anything simpler (or faster). Well it seems pointless to return the existing URL if the origin didn't match (or replace threw) only to then compare that URL with the existing one afterwards. But I guess it's not expected to be too common a case so I guess I don't insist.
https://codereview.adblockplus.org/29760704/diff/29778589/lib/filterClasses.js File lib/filterClasses.js (right): https://codereview.adblockplus.org/29760704/diff/29778589/lib/filterClasses.j... lib/filterClasses.js:955: return url; On 2018/05/15 14:53:58, kzar wrote: > On 2018/05/15 14:51:02, Sebastian Noack wrote: > > On 2018/05/15 14:44:23, kzar wrote: > > > On 2018/05/15 14:40:34, Sebastian Noack wrote: > > > > ...we'd also have to handle them separately in the calling code. > > > > > > Not if we tweak the above logic, so we don't return the rewritten URL if > > either > > > the origins don't match or if the URLs are identical. > > > > Sure, we could add another check here, in order to only return a string if it > > differs from the original URL, and then in the calling code check for null > > instead of checking whether the value changed. But I don't see how that makes > > anything simpler (or faster). > > Well it seems pointless to return the existing URL if the origin didn't match > (or replace threw) only to then compare that URL with the existing one > afterwards. But I guess it's not expected to be too common a case so I guess I > don't insist. These are not the only cases in which the rewrite doesn't happen (the above examples are failures), it could also be that there's a filter like /(trackingId=)\d+/$rewrite=$10 (replace any tracking ID with 0), in which case it would "fail" for "trackingId=0" (technically successful but nothing changed). I think explicitly comparing the output with the input is a bit weird for this case.
Updated patch. https://codereview.adblockplus.org/29760704/diff/29778589/lib/filterClasses.js File lib/filterClasses.js (right): https://codereview.adblockplus.org/29760704/diff/29778589/lib/filterClasses.j... lib/filterClasses.js:900: * The rewrite expression On 2018/05/15 14:08:58, kzar wrote: > Mind making this comment useful? (Same below.) Done. https://codereview.adblockplus.org/29760704/diff/29778589/lib/filterClasses.j... lib/filterClasses.js:955: return url; On 2018/05/15 14:24:57, kzar wrote: > Why do we return `url` on failure? Shouldn't we return `false` or similar > instead? That would be cheaper to check in the onBeforeRequest listener. (Yea, > we'd also have to tweak the above logic to check that `url != rewrittenUrl`.) The original idea is to not block the request in case of error. A gentle failure. https://codereview.adblockplus.org/29760704/diff/29778589/test/_common.js File test/_common.js (left): https://codereview.adblockplus.org/29760704/diff/29778589/test/_common.js#old... test/_common.js:34: function URL(urlString) On 2018/05/15 14:08:58, kzar wrote: > What's with this change? This one is required: it was a polyfill for the URL API that NodeJS didn't have. And right now it breaks if left in since it is incomplete, and no longer needed. https://codereview.adblockplus.org/29760704/diff/29778589/test/filterClasses.js File test/filterClasses.js (right): https://codereview.adblockplus.org/29760704/diff/29778589/test/filterClasses.... test/filterClasses.js:538: "http://content.server/file/foo.txt/disable"); On 2018/05/15 14:08:58, kzar wrote: > Nit: We normally put the closing parenthesis on the next line if the first > argument is on the line below. Done. But the linter doesn't catch this.
https://codereview.adblockplus.org/29760704/diff/29778589/test/_common.js File test/_common.js (left): https://codereview.adblockplus.org/29760704/diff/29778589/test/_common.js#old... test/_common.js:34: function URL(urlString) On 2018/05/15 16:16:31, hub wrote: > On 2018/05/15 14:08:58, kzar wrote: > > What's with this change? > > This one is required: it was a polyfill for the URL API that NodeJS didn't have. > And right now it breaks if left in since it is incomplete, and no longer needed. And I must add: this is only used in the tests.
https://codereview.adblockplus.org/29760704/diff/29778589/lib/filterClasses.js File lib/filterClasses.js (right): https://codereview.adblockplus.org/29760704/diff/29778589/lib/filterClasses.j... lib/filterClasses.js:900: * The rewrite expression On 2018/05/15 16:16:31, hub wrote: > On 2018/05/15 14:08:58, kzar wrote: > > Mind making this comment useful? (Same below.) > > Done. Thanks, looking way better now. https://codereview.adblockplus.org/29760704/diff/29778589/test/_common.js File test/_common.js (left): https://codereview.adblockplus.org/29760704/diff/29778589/test/_common.js#old... test/_common.js:34: function URL(urlString) On 2018/05/15 16:17:53, hub wrote: > On 2018/05/15 16:16:31, hub wrote: > > On 2018/05/15 14:08:58, kzar wrote: > > > What's with this change? > > > > This one is required: it was a polyfill for the URL API that NodeJS didn't > have. > > And right now it breaks if left in since it is incomplete, and no longer > needed. > > And I must add: this is only used in the tests. I think this was for Node.js 7 compatability which (according to the README in adblockpluschrome and adblockpluscore) we still support so far. I'm not against making Node.js 8 the minimum version, so long as you update the READMEs too, what do you think Sebastian? https://codereview.adblockplus.org/29760704/diff/29778589/test/filterClasses.js File test/filterClasses.js (right): https://codereview.adblockplus.org/29760704/diff/29778589/test/filterClasses.... test/filterClasses.js:538: "http://content.server/file/foo.txt/disable"); On 2018/05/15 16:16:31, hub wrote: > On 2018/05/15 14:08:58, kzar wrote: > > Nit: We normally put the closing parenthesis on the next line if the first > > argument is on the line below. > > Done. > > But the linter doesn't catch this. Yea, we made the linting rules as strict as possible without adding false positives, but it doesn't catch everything. I don't think this is a rule anyway, just a convention I've noticed.
https://codereview.adblockplus.org/29760704/diff/29778589/test/_common.js File test/_common.js (left): https://codereview.adblockplus.org/29760704/diff/29778589/test/_common.js#old... test/_common.js:34: function URL(urlString) On 2018/05/16 09:22:40, kzar wrote: > On 2018/05/15 16:17:53, hub wrote: > > On 2018/05/15 16:16:31, hub wrote: > > > On 2018/05/15 14:08:58, kzar wrote: > > > > What's with this change? > > > > > > This one is required: it was a polyfill for the URL API that NodeJS didn't > > have. > > > And right now it breaks if left in since it is incomplete, and no longer > > needed. > > > > And I must add: this is only used in the tests. > > I think this was for Node.js 7 compatability which (according to the README in > adblockpluschrome and adblockpluscore) we still support so far. > > I'm not against making Node.js 8 the minimum version, so long as you update the > READMEs too, what do you think Sebastian? I'm on Node.js 8.5, and there is no global URL object either.
https://codereview.adblockplus.org/29760704/diff/29778589/test/_common.js File test/_common.js (left): https://codereview.adblockplus.org/29760704/diff/29778589/test/_common.js#old... test/_common.js:34: function URL(urlString) On 2018/05/16 10:53:29, Sebastian Noack wrote: > On 2018/05/16 09:22:40, kzar wrote: > > On 2018/05/15 16:17:53, hub wrote: > > > On 2018/05/15 16:16:31, hub wrote: > > > > On 2018/05/15 14:08:58, kzar wrote: > > > > > What's with this change? > > > > > > > > This one is required: it was a polyfill for the URL API that NodeJS didn't > > > have. > > > > And right now it breaks if left in since it is incomplete, and no longer > > > needed. > > > > > > And I must add: this is only used in the tests. > > > > I think this was for Node.js 7 compatability which (according to the README in > > adblockpluschrome and adblockpluscore) we still support so far. > > > > I'm not against making Node.js 8 the minimum version, so long as you update > the > > READMEs too, what do you think Sebastian? > > I'm on Node.js 8.5, and there is no global URL object either. According to the documentation, it was added in Node 7 https://nodejs.org/dist/latest-v8.x/docs/api/url.html#url_the_whatwg_url_api
https://codereview.adblockplus.org/29760704/diff/29778589/test/_common.js File test/_common.js (left): https://codereview.adblockplus.org/29760704/diff/29778589/test/_common.js#old... test/_common.js:34: function URL(urlString) On 2018/05/16 12:08:11, hub wrote: > On 2018/05/16 10:53:29, Sebastian Noack wrote: > > On 2018/05/16 09:22:40, kzar wrote: > > > On 2018/05/15 16:17:53, hub wrote: > > > > On 2018/05/15 16:16:31, hub wrote: > > > > > On 2018/05/15 14:08:58, kzar wrote: > > > > > > What's with this change? > > > > > > > > > > This one is required: it was a polyfill for the URL API that NodeJS > didn't > > > > have. > > > > > And right now it breaks if left in since it is incomplete, and no longer > > > > needed. > > > > > > > > And I must add: this is only used in the tests. > > > > > > I think this was for Node.js 7 compatability which (according to the README > in > > > adblockpluschrome and adblockpluscore) we still support so far. > > > > > > I'm not against making Node.js 8 the minimum version, so long as you update > > the > > > READMEs too, what do you think Sebastian? > > > > I'm on Node.js 8.5, and there is no global URL object either. > > According to the documentation, it was added in Node 7 > > https://nodejs.org/dist/latest-v8.x/docs/api/url.html#url_the_whatwg_url_api Yes, the code here could be changed to: const {URL} = require("url"); And it seems that this should even work with Node.js 7. What I meant, is merely that there is no global URL object in Node.js
On 2018/05/16 12:13:23, Sebastian Noack wrote: > And it seems that this should even work with Node.js 7. I'm not sure that it will IIRC the URL module changed in Node.js a while back, for example see this comment[1] (yes about Node.js 6 rather than 7). Could you at least test and verify your changes work with Node.js 7 Hubert? Then we'll know where we stand. [1] - https://stackoverflow.com/a/45615942/1226469
Updated polyfill for URL, tested with NodeJS 7.10.1, NodeJS 8.5 and 8.11.2 and NodeJS 10.1 https://codereview.adblockplus.org/29760704/diff/29778589/test/_common.js File test/_common.js (left): https://codereview.adblockplus.org/29760704/diff/29778589/test/_common.js#old... test/_common.js:34: function URL(urlString) On 2018/05/16 12:13:22, Sebastian Noack wrote: > On 2018/05/16 12:08:11, hub wrote: > > On 2018/05/16 10:53:29, Sebastian Noack wrote: > > > On 2018/05/16 09:22:40, kzar wrote: > > > > On 2018/05/15 16:17:53, hub wrote: > > > > > On 2018/05/15 16:16:31, hub wrote: > > > > > > On 2018/05/15 14:08:58, kzar wrote: > > > > > > > What's with this change? > > > > > > > > > > > > This one is required: it was a polyfill for the URL API that NodeJS > > didn't > > > > > have. > > > > > > And right now it breaks if left in since it is incomplete, and no > longer > > > > > needed. > > > > > > > > > > And I must add: this is only used in the tests. > > > > > > > > I think this was for Node.js 7 compatability which (according to the > README > > in > > > > adblockpluschrome and adblockpluscore) we still support so far. > > > > > > > > I'm not against making Node.js 8 the minimum version, so long as you > update > > > the > > > > READMEs too, what do you think Sebastian? > > > > > > I'm on Node.js 8.5, and there is no global URL object either. > > > > According to the documentation, it was added in Node 7 > > > > https://nodejs.org/dist/latest-v8.x/docs/api/url.html#url_the_whatwg_url_api > > Yes, the code here could be changed to: > > const {URL} = require("url"); > > And it seems that this should even work with Node.js 7. What I meant, is merely > that there is no global URL object in Node.js It is global in NodeJS 10. See https://nodejs.org/dist/latest-v10.x/docs/api/url.html#url_the_whatwg_url_api
https://codereview.adblockplus.org/29760704/diff/29783599/test/_common.js File test/_common.js (right): https://codereview.adblockplus.org/29760704/diff/29783599/test/_common.js#new... test/_common.js:34: // URL is global in Node 10. In Node 7+ it must be imported. (Thanks for noting the versions involved here.) https://codereview.adblockplus.org/29760704/diff/29783599/test/_common.js#new... test/_common.js:35: if (global.URL == undefined) I wonder why we put it in the global Object here, then from there put it in the globals Object later. Also don't we normally do something like `typeof foo == "undefined"` which checking for undefined? So I wonder if this would work instead: let globals = { ... URL: typeof "URL" == undefined ? require("url").URL : URL; };
updated patch https://codereview.adblockplus.org/29760704/diff/29783599/test/_common.js File test/_common.js (right): https://codereview.adblockplus.org/29760704/diff/29783599/test/_common.js#new... test/_common.js:35: if (global.URL == undefined) On 2018/05/16 13:24:31, kzar wrote: > I wonder why we put it in the global Object here, then from there put it in the > globals Object later. Also don't we normally do something like `typeof foo == > "undefined"` which checking for undefined? > > So I wonder if this would work instead: > > let globals = { > ... > URL: typeof "URL" == undefined ? require("url").URL : URL; > }; The previous polyfill was to define `function URL()`. Similarly here we just import the symbol into the global space. And I think that's the right approach. Whether we add it to `globals` or not is irrelevant, that part is for the sandboxing. BTW `typeof "URL"` is "String" ;-)
https://codereview.adblockplus.org/29760704/diff/29783599/test/_common.js File test/_common.js (right): https://codereview.adblockplus.org/29760704/diff/29783599/test/_common.js#new... test/_common.js:35: if (global.URL == undefined) On 2018/05/16 18:33:23, hub wrote: > The previous polyfill was to define `function URL()`. Similarly here we > just import the symbol into the global space. And I think that's the > right approach. Well for some longer functions we declare them earlier on in this file (not in the global space as far as I can tell) and then pass the reference through into `globals` so that they can be used by the (sandboxed) tests. Some shorter values / functions we just inline in the `globals` Object, for example `atob`. > Whether we add it to `globals` or not is irrelevant, that part is for > the sandboxing. That is the only relevant part surely, it's the part that makes the tests pass. > BTW `typeof "URL"` is "String" ;-) Sure, sorry that was a typo. Anyway I've just double checked (Node.js 8) and my suggestion does work. It's shorter, less confusing (global vs gloals) and doesn't pointlessly declare the function in the global scope. So without the typo it looks something like this: let globals = { ... // URL is global in Node 10. In Node 7+ it must be imported. URL: typeof URL == "undefined" ? require("url").URL : URL };
updated patch https://codereview.adblockplus.org/29760704/diff/29783599/test/_common.js File test/_common.js (right): https://codereview.adblockplus.org/29760704/diff/29783599/test/_common.js#new... test/_common.js:35: if (global.URL == undefined) On 2018/05/17 09:49:01, kzar wrote: > On 2018/05/16 18:33:23, hub wrote: > > The previous polyfill was to define `function URL()`. Similarly here we > > just import the symbol into the global space. And I think that's the > > right approach. > > Well for some longer functions we declare them earlier on in this file (not in > the global space as far as I can tell) and then pass the reference through into > `globals` so that they can be used by the (sandboxed) tests. Some shorter values > / functions we just inline in the `globals` Object, for example `atob`. > > > Whether we add it to `globals` or not is irrelevant, that part is for > > the sandboxing. > > That is the only relevant part surely, it's the part that makes the tests pass. > > > BTW `typeof "URL"` is "String" ;-) > > Sure, sorry that was a typo. > > Anyway I've just double checked (Node.js 8) and my suggestion does work. It's > shorter, less confusing (global vs gloals) and doesn't pointlessly declare the > function in the global scope. So without the typo it looks something like this: > > let globals = { > ... > // URL is global in Node 10. In Node 7+ it must be imported. > URL: typeof URL == "undefined" ? require("url").URL : URL > }; I had no doubt this would be working. Done.
LGTM https://codereview.adblockplus.org/29760704/diff/29784572/lib/filterClasses.js File lib/filterClasses.js (right): https://codereview.adblockplus.org/29760704/diff/29784572/lib/filterClasses.j... lib/filterClasses.js:949: let rewrittenUrl = new URL(url.replace(this.regexp, this.rewrite), url); I forgot to mention before, I thought how you did this part was clever. Just reusing the existing generated regexp.
https://codereview.adblockplus.org/29760704/diff/29784572/lib/filterClasses.js File lib/filterClasses.js (right): https://codereview.adblockplus.org/29760704/diff/29784572/lib/filterClasses.j... lib/filterClasses.js:949: let rewrittenUrl = new URL(url.replace(this.regexp, this.rewrite), url); On 2018/05/17 12:56:06, kzar wrote: > I forgot to mention before, I thought how you did this part was clever. Just > reusing the existing generated regexp. thanks! |