|
|
Created:
Nov. 3, 2016, 11:40 a.m. by Felix Dahlke Modified:
Nov. 10, 2016, 1:50 p.m. Base URL:
https://bitbucket.org/fhd/adblockpluscore Visibility:
Public. |
DescriptionIssue 4593 - Support regular expressions for CSS property matching
Patch Set 1 #
Total comments: 14
Patch Set 2 : Use String.prototype.startsWith/.endsWith #
Total comments: 3
Patch Set 3 : Don't detect '/' as a regexp and speed up regexp detection #
Total comments: 1
MessagesTotal messages: 24
Really cool idea, mostly looks good. Did you already test that it works for real by hiding an element? https://codereview.adblockplus.org/29361640/diff/29361641/lib/filterClasses.js File lib/filterClasses.js (right): https://codereview.adblockplus.org/29361640/diff/29361641/lib/filterClasses.j... lib/filterClasses.js:1046: if (this.regexpSource.charAt(0) == "/" && Nit: How about using startsWith and endsWith instead? https://developer.mozilla.org/en/docs/Web/JavaScript/Reference/Global_Objects... https://codereview.adblockplus.org/29361640/diff/29361641/lib/filterClasses.j... lib/filterClasses.js:1048: regexp = this.regexpSource.slice(1, -1); What about flags, for example for case-insensitive matching? `/whatever/i` wouldn't match your logic here.
I guess as well we should update the documentation if this lands. (I can update the blog post too.)
I just thought what will older versions of Adblock Plus would do with these filters? I guess "/abc/" would be converted to /\/abc\// which could then match something like "/abc/". Perhaps that matters?
On 2016/11/03 12:18:16, kzar wrote: > Really cool idea, mostly looks good. Did you already test that it works for real > by hiding an element? Yeah sure, works fine in Chrome, all it takes is a dependency update (filter validation is unaffected by this since it ignores this part anyway). On 2016/11/03 12:21:26, kzar wrote: > I guess as well we should update the documentation if this lands. (I can update > the blog post too.) I was thinking of doing a new devbuild announcement, and yeah, updating your original one would make sense too. Wouldn't add it to the official documentation yet, since this stuff is still considered experimental and should be avoided. On 2016/11/03 13:21:57, kzar wrote: > I just thought what will older versions of Adblock Plus would do with these > filters? I guess "/abc/" would be converted to /\/abc\// which could then match > something like "/abc/". Perhaps that matters? Yup, that's what would happen. We did explicitly say that this syntax is experimental and subject to change, so I guess we can live with that... https://codereview.adblockplus.org/29361640/diff/29361641/lib/filterClasses.js File lib/filterClasses.js (right): https://codereview.adblockplus.org/29361640/diff/29361641/lib/filterClasses.j... lib/filterClasses.js:1046: if (this.regexpSource.charAt(0) == "/" && On 2016/11/03 12:18:16, kzar wrote: > Nit: How about using startsWith and endsWith instead? > https://developer.mozilla.org/en/docs/Web/JavaScript/Reference/Global_Objects... Wouldn't work on Safari <9, would it? Maybe this is a good time to start having Safari support in a branch, because if it wasn't for Safari, we could use this now that we dropped support for Chrome <41. https://codereview.adblockplus.org/29361640/diff/29361641/lib/filterClasses.j... lib/filterClasses.js:1048: regexp = this.regexpSource.slice(1, -1); On 2016/11/03 12:18:16, kzar wrote: > What about flags, for example for case-insensitive matching? `/whatever/i` > wouldn't match your logic here. Hm, good point. How I see it, `i` is actually the only useful flag, and maybe we should just always do case insensitive matching for CSS properties? Note that URL filters (which use the same expression language as non-regexp property selectors) are also case insensitive unless you specify the `match-case` option. Given how CSS properties and values are case insensitive, I don't think we will need case sensitivity here, hm?
> Yup, that's what would happen. We did explicitly say that > this syntax is experimental and subject to change, so I > guess we can live with that... Fair enough. https://codereview.adblockplus.org/29361640/diff/29361641/lib/filterClasses.js File lib/filterClasses.js (right): https://codereview.adblockplus.org/29361640/diff/29361641/lib/filterClasses.j... lib/filterClasses.js:1046: if (this.regexpSource.charAt(0) == "/" && On 2016/11/03 13:47:51, Felix Dahlke wrote: > On 2016/11/03 12:18:16, kzar wrote: > > Nit: How about using startsWith and endsWith instead? > > > https://developer.mozilla.org/en/docs/Web/JavaScript/Reference/Global_Objects... > > Wouldn't work on Safari <9, would it? Maybe this is a good time to start having > Safari support in a branch, because if it wasn't for Safari, we could use this > now that we dropped support for Chrome <41. Right, we wouldn't be able to update the adblockpluscore dependency until https://issues.adblockplus.org/ticket/4597 is done if we use `{starts,ends}With`. Thing is we don't want to anyway though since we don't want any more development builds of Safari to be made from the master bookmark. (We're aiming for the current release 1.12.4 as being the last release that includes Safari from master.) https://codereview.adblockplus.org/29361640/diff/29361641/lib/filterClasses.j... lib/filterClasses.js:1048: regexp = this.regexpSource.slice(1, -1); On 2016/11/03 13:47:51, Felix Dahlke wrote: > Hm, good point. How I see it, `i` is actually the only useful flag Yea, I think you're right there. I thought perhaps multiline support might be useful but I checked and the styles are all on one line at the point they're tested. > should just always do case insensitive matching for CSS properties? > Note that URL filters (which use the same expression language as > non-regexp property selectors) are also case insensitive unless you > specify the `match-case` option. > Given how CSS properties and values are case insensitive, I don't think > we will need case sensitivity here, hm? I suppose that would slightly change the behaviour of non-regexp CSS property filters, since before they used case sensitive matching. But if what you say is right that CSS property names and values are case-insensitive (I had trouble verifying that either way) then I guess using insensitive matching would be better if anything. So if you're sure about that then it sounds good to me.
New patch set is up. https://codereview.adblockplus.org/29361640/diff/29361641/lib/filterClasses.js File lib/filterClasses.js (right): https://codereview.adblockplus.org/29361640/diff/29361641/lib/filterClasses.j... lib/filterClasses.js:1046: if (this.regexpSource.charAt(0) == "/" && On 2016/11/03 16:03:19, kzar wrote: > On 2016/11/03 13:47:51, Felix Dahlke wrote: > > On 2016/11/03 12:18:16, kzar wrote: > > > Nit: How about using startsWith and endsWith instead? > > > > > > https://developer.mozilla.org/en/docs/Web/JavaScript/Reference/Global_Objects... > > > > Wouldn't work on Safari <9, would it? Maybe this is a good time to start > having > > Safari support in a branch, because if it wasn't for Safari, we could use this > > now that we dropped support for Chrome <41. > > Right, we wouldn't be able to update the adblockpluscore dependency until > https://issues.adblockplus.org/ticket/4597 is done if we use > `{starts,ends}With`. > > Thing is we don't want to anyway though since we don't want any more development > builds of Safari to be made from the master bookmark. (We're aiming for the > current release 1.12.4 as being the last release that includes Safari from > master.) Alright, seems safe to break Safari <9 then - done. https://codereview.adblockplus.org/29361640/diff/29361641/lib/filterClasses.j... lib/filterClasses.js:1048: regexp = this.regexpSource.slice(1, -1); On 2016/11/03 16:03:19, kzar wrote: > On 2016/11/03 13:47:51, Felix Dahlke wrote: > > should just always do case insensitive matching for CSS properties? > > Note that URL filters (which use the same expression language as > > non-regexp property selectors) are also case insensitive unless you > > specify the `match-case` option. > > Given how CSS properties and values are case insensitive, I don't think > > we will need case sensitivity here, hm? > > I suppose that would slightly change the behaviour of non-regexp CSS property > filters, since before they used case sensitive matching. But if what you say is > right that CSS property names and values are case-insensitive (I had trouble > verifying that either way) then I guess using insensitive matching would be > better if anything. So if you're sure about that then it sounds good to me. I dug a bit deeper and while I was apparently right that property names and values are in general case insensitive (see https://www.w3.org/TR/CSS2/syndata.html#characters), some property values (e.g. url and content) are actually case sensitive. I can't quite come up with a use case for case sensitive matching in those cases, but I find the issue a bit too complex to tackle it as part of #4593, so I created a followup: https://issues.adblockplus.org/ticket/4600 I suggest we stick to case sensitive matching without support for additional flags for #4593.
https://codereview.adblockplus.org/29361640/diff/29361641/lib/filterClasses.js File lib/filterClasses.js (right): https://codereview.adblockplus.org/29361640/diff/29361641/lib/filterClasses.j... lib/filterClasses.js:1048: regexp = this.regexpSource.slice(1, -1); On 2016/11/03 19:08:02, Felix Dahlke wrote: > I suggest we stick to case sensitive matching without support for > additional flags for #4593. I would prefer to switch to case insensitive matching now if we're planning to do it anyway, we warned that CSS property filters might change in the future but no point making it harder for filter authors than necessary. Unless I'm mistaken `pattern.regexp` will always be a string initially for both adblockplus and adblockpluschrome since both send a "filters.get" message to adblockplusui/messageResponder.js to obtain the filters. The relevant messageResponder code there looks like this: filters = CSSRules.getRulesForDomain(sender.frame.url.hostname); filters = filters.map(function(filter) { return { prefix: filter.selectorPrefix, suffix: filter.selectorSuffix, regexp: filter.regexpString, text: filter.text }; }); callback(filters); I think you can therefore just change the line in adblockpluscore/chrome/content/cssProperties.js to be something like this: regexp = pattern.regexp = new RegExp(regexp, "i");
https://codereview.adblockplus.org/29361640/diff/29361641/lib/filterClasses.js File lib/filterClasses.js (right): https://codereview.adblockplus.org/29361640/diff/29361641/lib/filterClasses.j... lib/filterClasses.js:1048: regexp = this.regexpSource.slice(1, -1); On 2016/11/04 10:06:56, kzar wrote: > On 2016/11/03 19:08:02, Felix Dahlke wrote: > > I suggest we stick to case sensitive matching without support for > > additional flags for #4593. > > I would prefer to switch to case insensitive matching now if we're planning to > do it anyway, we warned that CSS property filters might change in the future but > no point making it harder for filter authors than necessary. > > Unless I'm mistaken `pattern.regexp` will always be a string initially for both > adblockplus and adblockpluschrome since both send a "filters.get" message to > adblockplusui/messageResponder.js to obtain the filters. The relevant > messageResponder code there looks like this: > > filters = CSSRules.getRulesForDomain(sender.frame.url.hostname); > filters = filters.map(function(filter) > { > return { > prefix: filter.selectorPrefix, > suffix: filter.selectorSuffix, > regexp: filter.regexpString, > text: filter.text > }; > }); > callback(filters); > > I think you can therefore just change the line in > adblockpluscore/chrome/content/cssProperties.js to be something like this: > > regexp = pattern.regexp = new RegExp(regexp, "i"); I know it'd be a simple change and I agree we shouldn't wait with this. At the same time, it's really unrelated to #4593, which is why I created #4600. There might be some more discussion, too - note that Wladimir hasn't chipped in yet. But yeah, to speed things up, I'll upload a review for #4600 as soon as #4593 lands.
https://codereview.adblockplus.org/29361640/diff/29361641/lib/filterClasses.js File lib/filterClasses.js (right): https://codereview.adblockplus.org/29361640/diff/29361641/lib/filterClasses.j... lib/filterClasses.js:1048: regexp = this.regexpSource.slice(1, -1); On 2016/11/04 13:43:56, Felix Dahlke wrote: > I know it'd be a simple change and I agree we shouldn't wait with this. At the > same time, it's really unrelated to #4593, which is why I created #4600. There > might be some more discussion, too - note that Wladimir hasn't chipped in yet. It's not really an unrelated change IMO, since like you say part of the reason for changing to case-insensitive matching is for consistency with how our other regular expression filters match.
https://codereview.adblockplus.org/29361640/diff/29361641/lib/filterClasses.js File lib/filterClasses.js (right): https://codereview.adblockplus.org/29361640/diff/29361641/lib/filterClasses.j... lib/filterClasses.js:1048: regexp = this.regexpSource.slice(1, -1); On 2016/11/04 14:29:50, kzar wrote: > On 2016/11/04 13:43:56, Felix Dahlke wrote: > > I know it'd be a simple change and I agree we shouldn't wait with this. At the > > same time, it's really unrelated to #4593, which is why I created #4600. There > > might be some more discussion, too - note that Wladimir hasn't chipped in yet. > > It's not really an unrelated change IMO, since like you say part of the reason > for changing to case-insensitive matching is for consistency with how our other > regular expression filters match. Well, it's mainly for consistency with how our other filter expressions match (although regexp filters also match case insensitively unless you specify match-case I presume). This is 100% unrelated to the new regular expression matching for selectors. I could of course change issue #4593 to cover this as well, into something like "Support regular expressions for CSS property matching and switch to case insensitive matching" but that title alone sounds like it should be two issues, which is why I created #4600 instead.
https://codereview.adblockplus.org/29361640/diff/29361641/lib/filterClasses.js File lib/filterClasses.js (right): https://codereview.adblockplus.org/29361640/diff/29361641/lib/filterClasses.j... lib/filterClasses.js:1048: regexp = this.regexpSource.slice(1, -1); On 2016/11/04 16:11:44, Felix Dahlke wrote: > (although regexp filters also match case insensitively unless you specify > match-case I presume). Yea they do, I checked that. > This is 100% unrelated to the new regular expression matching for selectors. Well since the existing regular expression matching is case insensitive, it makes sense to make this new regular expression matching case insensitive too. I don't understand how that's unrelated.
Here's another try, but if we can't manage to agree let's just wait for Wladimir... https://codereview.adblockplus.org/29361640/diff/29361641/lib/filterClasses.js File lib/filterClasses.js (right): https://codereview.adblockplus.org/29361640/diff/29361641/lib/filterClasses.j... lib/filterClasses.js:1048: regexp = this.regexpSource.slice(1, -1); On 2016/11/04 16:18:27, kzar wrote: > On 2016/11/04 16:11:44, Felix Dahlke wrote: > > (although regexp filters also match case insensitively unless you specify > > match-case I presume). > > Yea they do, I checked that. > > > This is 100% unrelated to the new regular expression matching for selectors. > > Well since the existing regular expression matching is case insensitive, it > makes sense to make this new regular expression matching case insensitive too. I > don't understand how that's unrelated. URL matching is _always_ case insensitive, whether it's a regular expression or not. So it's completely unrelated to whether we support regular expressions or not, those are two separate issues. They are just two things we want to do, two separate things.
On 2016/11/04 16:51:16, Felix Dahlke wrote: > Here's another try, but if we can't manage to agree let's just wait for > Wladimir... Yea, sounds good to me, we're going in circles.
On 2016/11/04 16:53:20, kzar wrote: > Yea, sounds good to me, we're going in circles. (Shame really since it's over four extra characters `, "i"` that we both agree should be added at some point anyway.)
On 2016/11/04 16:58:46, kzar wrote: > On 2016/11/04 16:53:20, kzar wrote: > > Yea, sounds good to me, we're going in circles. > > (Shame really since it's over four extra characters `, "i"` that we both agree > should be added at some point anyway.) (Yeah, would be slightly faster this way. At the same time, we generally try hard not to commingle unrelated changes, I've had to go on way too many bug hunts that were made insanely hard by this in my life...)
(So other than the case insensitive stuff this LGTM, I don't want to block this landing once Wladimir makes the call there.)
On 2016/11/04 17:21:52, kzar wrote: > (So other than the case insensitive stuff this LGTM, I don't want to block this > landing once Wladimir makes the call there.) Thanks. But it just hit me: The CSS we match against here is not actually the CSS code as it was written verbatim, it's a text representation of the CSSOM. Should have realised that earlier, it's evident from colours always being written in rgb() syntax no matter how they were written in the source CSS. I quickly verified that CSS property names and values and up being all lower case when we compare against them. I have yet to check what happens to case sensitive properties such as content and url. But I guess the necessity for case insensitive matching is not as big as we thought.
On 2016/11/04 21:42:01, Felix Dahlke wrote: > On 2016/11/04 17:21:52, kzar wrote: > > (So other than the case insensitive stuff this LGTM, I don't want to block > this > > landing once Wladimir makes the call there.) > > Thanks. But it just hit me: The CSS we match against here is not actually the > CSS code as it was written verbatim, it's a text representation of the CSSOM. > Should have realised that earlier, it's evident from colours always being > written in rgb() syntax no matter how they were written in the source CSS. I > quickly verified that CSS property names and values and up being all lower case > when we compare against them. I have yet to check what happens to case sensitive > properties such as content and url. But I guess the necessity for case > insensitive matching is not as big as we thought. Played around with this some more and yes, all CSS property names and almost all property values end up being lower case when read from document.styleSheets. URLs and the value of the content property are case sensitive however. So I suppose case insensitive matching still makes sense. Let's see what Wladimir thinks.
Concerning case-insensitive matching: I agree with Felix that this is an entirely unrelated change and needs a separate issue with discussion to happen there. I can list the pros and cons there but at the moment I am leaning towards case-insensitive being a better choice. https://codereview.adblockplus.org/29361640/diff/29361726/lib/filterClasses.js File lib/filterClasses.js (right): https://codereview.adblockplus.org/29361640/diff/29361726/lib/filterClasses.j... lib/filterClasses.js:1046: if (this.regexpSource.startsWith("/") && this.regexpSource.endsWith("/")) This is a potentially performance-sensitive code path. We shouldn't be using startsWith() and endsWith() unnecessarily, these are slow compared to the `source.length >= 2 && source[0] == "/" && source[source.length - 1] == "/"` check we use in RegExpFilter. For reference, https://jsperf.com/startswith-vs-direct-access gives me 46% slower in Firefox and 39% slower in Chrome. Besides, this check will currently consider "/" to be a regular expression - it shouldn't.
https://codereview.adblockplus.org/29361640/diff/29361726/lib/filterClasses.js File lib/filterClasses.js (right): https://codereview.adblockplus.org/29361640/diff/29361726/lib/filterClasses.j... lib/filterClasses.js:1046: if (this.regexpSource.startsWith("/") && this.regexpSource.endsWith("/")) On 2016/11/08 09:56:41, Wladimir Palant wrote: > This is a potentially performance-sensitive code path. We shouldn't be using > startsWith() and endsWith() unnecessarily, these are slow compared to the > `source.length >= 2 && source[0] == "/" && source[source.length - 1] == "/"` > check we use in RegExpFilter. For reference, > https://jsperf.com/startswith-vs-direct-access gives me 46% slower in Firefox > and 39% slower in Chrome. > > Besides, this check will currently consider "/" to be a regular expression - it > shouldn't. I'm not so sure about the performance impact of this being very relevant - this is being called from the CSS property filters content script, and only for filters that match on the current domain. But anyway, you're definitely right about the semantic difference, "/" should not be considered to be a regexp. So one way to fix that with my code would be `this.regexpSource.length >= 2 && this.regexpSource.startsWith("/") and this.regexpSource.endsWidth("/")`, from which it isn't far to your suggestion which is also faster. So I'll change that.
New patch set is up. https://codereview.adblockplus.org/29361640/diff/29361726/lib/filterClasses.js File lib/filterClasses.js (right): https://codereview.adblockplus.org/29361640/diff/29361726/lib/filterClasses.j... lib/filterClasses.js:1046: if (this.regexpSource.startsWith("/") && this.regexpSource.endsWith("/")) On 2016/11/08 15:46:36, Felix Dahlke wrote: > On 2016/11/08 09:56:41, Wladimir Palant wrote: > > This is a potentially performance-sensitive code path. We shouldn't be using > > startsWith() and endsWith() unnecessarily, these are slow compared to the > > `source.length >= 2 && source[0] == "/" && source[source.length - 1] == "/"` > > check we use in RegExpFilter. For reference, > > https://jsperf.com/startswith-vs-direct-access gives me 46% slower in Firefox > > and 39% slower in Chrome. > > > > Besides, this check will currently consider "/" to be a regular expression - > it > > shouldn't. > > I'm not so sure about the performance impact of this being very relevant - this > is being called from the CSS property filters content script, and only for > filters that match on the current domain. > > But anyway, you're definitely right about the semantic difference, "/" should > not be considered to be a regexp. So one way to fix that with my code would be > `this.regexpSource.length >= 2 && this.regexpSource.startsWith("/") and > this.regexpSource.endsWidth("/")`, from which it isn't far to your suggestion > which is also faster. So I'll change that. Done.
LGTM https://codereview.adblockplus.org/29361640/diff/29362058/lib/filterClasses.js File lib/filterClasses.js (right): https://codereview.adblockplus.org/29361640/diff/29362058/lib/filterClasses.j... lib/filterClasses.js:1047: this.regexpSource[this.regexpSource.length - 1] == "/") Well, accessing this.regexpSource four times is obviously not perfect either but whatever - I guess that by the time this code path becomes relevant performance-wise we'll have it replaced by C++ anyway.
Fair enough, LGTM |