|
|
Created:
Jan. 14, 2015, 10:27 a.m. by Sebastian Noack Modified:
Jan. 22, 2015, 9:57 a.m. Visibility:
Public. |
DescriptionIssue 1802 - Fix element hiding filters with multiple CSS selectors when using shadow DOM
Patch Set 1 : #Patch Set 2 : Added comment explaining regex #
Total comments: 3
Patch Set 3 : Replaced regex with state machine #
Total comments: 5
Patch Set 4 : Addressed comments #MessagesTotal messages: 9
On 2015/01/14 10:32:28, Sebastian Noack wrote: Thanks for adding the comment, I at least understand the idea about what should happen now. To be honest though I'm not really sure how to review this regexp is correct, maybe a better candidate for unit tests?
On 2015/01/15 13:18:22, kzar wrote: > On 2015/01/14 10:32:28, Sebastian Noack wrote: > > Thanks for adding the comment, I at least understand the idea about what should > happen now. To be honest though I'm not really sure how to review this regexp is > correct, maybe a better candidate for unit tests? We can't easily unit test this piece of code. Well, we could test this regex separately, but I don't think that would make sense. And you should understand the regex regardless. Unfortunately you can't spread regular expressions over multiple lines adding comment in JavaScript, unlinke in Python. So let me explain the regex belw in more detail: Preserve leading whitespaces: (\s*) A substring consisting neither of quotes nor a delimiter (comma in this case): [^,"'] A substring quoted with double quotes, i.e it starts and ends with a double quote, but no quote in between, unless there is a preceding backslash escaping the quote: "(?:\\"|[^"])*" Same as above just with single quotes: '(?:\\'|[^'])*' A sequence consisting of the above sequences. That is the simplest way to make sure that commas aren't considered delimiters if quoted: ((?:[^,"']|"(?:\\"|[^"])*"|'(?:\\'|[^'])*')+)
http://codereview.adblockplus.org/5504296699297792/diff/5750085036015616/incl... File include.preload.js (right): http://codereview.adblockplus.org/5504296699297792/diff/5750085036015616/incl... include.preload.js:190: selectors[i] = selectors[i].replace(/(\s*)((?:[^,"']|"(?:\\"|[^"])*"|'(?:\\'|[^'])*')+)/g, "$1::content $2"); Trying to parse CSS selectors via regular expressions seems to be a lost cause to me, in the best scenario we will merely end up with code that is unmaintainable. In addition to being barely readable, this particular regular expression has at least the following issues: * It treats backslashes in strings incorrectly (e.g. foo['as\\'], bar['']). * It doesn't consider escaping outside of strings (e.g. "foo[bar\\,bas]"). People have written huge CSS selector parsers, CSS really isn't simple. Granted, for our use case this complexity might not be required. In fact, we don't even need to recognize CSS escaping properly - it's enough to consider the next character to be escaped. I have a strong suspicion that a state machine would be more readable and maintainable here. We wouldn't need too many states: looks like SELECTOR and STRING would be sufficient (the latter either with ' or " as end character). A backslash should simply make the state machine skip the next character. Note that http://dev.w3.org/csswg/css-syntax-3/ specifies a state machine. The hard part is figuring out which parts of it we need.
http://codereview.adblockplus.org/5504296699297792/diff/5750085036015616/incl... File include.preload.js (right): http://codereview.adblockplus.org/5504296699297792/diff/5750085036015616/incl... include.preload.js:190: selectors[i] = selectors[i].replace(/(\s*)((?:[^,"']|"(?:\\"|[^"])*"|'(?:\\'|[^'])*')+)/g, "$1::content $2"); Just my quick attempt at writing that state machine: function getSelectors(str) { var result = []; var start = 0; var separator = ""; for (var i = 0; i < str.length; i++) { var char = str[i]; if (char == "\\") i++; // Escaped char else if (char == separator) separator = ""; // String ended else if (char == "'" || char == '"') separator = char; // String started else if (separator == "" && char == ",") { // Selector ended result.push(str.substring(start, i)); start = i + 1; } } if (start < str.length) result.push(str.substring(start, str.length)); return result; }
http://codereview.adblockplus.org/5504296699297792/diff/5750085036015616/incl... File include.preload.js (right): http://codereview.adblockplus.org/5504296699297792/diff/5750085036015616/incl... include.preload.js:190: selectors[i] = selectors[i].replace(/(\s*)((?:[^,"']|"(?:\\"|[^"])*"|'(?:\\'|[^'])*')+)/g, "$1::content $2"); On 2015/01/19 20:58:15, Wladimir Palant wrote: > Just my quick attempt at writing that state machine: > > function getSelectors(str) > { > var result = []; > var start = 0; > var separator = ""; > for (var i = 0; i < str.length; i++) > { > var char = str[i]; > if (char == "\\") > i++; // Escaped char > else if (char == separator) > separator = ""; // String ended > else if (char == "'" || char == '"') > separator = char; // String started > else if (separator == "" && char == ",") > { // Selector ended > result.push(str.substring(start, i)); > start = i + 1; > } > } > if (start < str.length) > result.push(str.substring(start, str.length)); > return result; > } Unfortunately it seems we in fact need a state machine here. There you go.
http://codereview.adblockplus.org/5504296699297792/diff/5700735861784576/incl... File include.preload.js (right): http://codereview.adblockplus.org/5504296699297792/diff/5700735861784576/incl... include.preload.js:180: break; Note that technically there might be more separators: ( expects ), [ expects ]. However, I didn't find a scenario where an unescaped comma would be valid inside parentheses, and that would also complicate things given that there could be quoted strings inside. http://codereview.adblockplus.org/5504296699297792/diff/5700735861784576/incl... include.preload.js:186: } Nit: please add a break statement here, for consistency. Otherwise, if somebody adds a new case statement they will need to remember doing that - and typically people won't notice a missing break statement. http://codereview.adblockplus.org/5504296699297792/diff/5700735861784576/incl... include.preload.js:190: result.push(prefix + selector.substring(start, j)); The second parameter is unnecessary here, can be simply selector.substring(start). Then you won't need to use the loop variable outside the loop.
http://codereview.adblockplus.org/5504296699297792/diff/5700735861784576/incl... File include.preload.js (right): http://codereview.adblockplus.org/5504296699297792/diff/5700735861784576/incl... include.preload.js:186: } On 2015/01/22 08:15:52, Wladimir Palant wrote: > Nit: please add a break statement here, for consistency. Otherwise, if somebody > adds a new case statement they will need to remember doing that - and typically > people won't notice a missing break statement. Ah, let's go with if/else then. It's more compact, and seems to perform exactly the same on Chrome. http://codereview.adblockplus.org/5504296699297792/diff/5700735861784576/incl... include.preload.js:190: result.push(prefix + selector.substring(start, j)); On 2015/01/22 08:15:52, Wladimir Palant wrote: > The second parameter is unnecessary here, can be simply > selector.substring(start). Then you won't need to use the loop variable outside > the loop. Oh, you are right.
LGTM |