|
|
Created:
April 25, 2018, 5:25 p.m. by Manish Jethani Modified:
July 17, 2018, 3:25 p.m. CC:
sergei, hub, erikvvold Base URL:
https://hg.adblockplus.org/adblockpluscore/ Visibility:
Public. |
DescriptionThis builds on https://codereview.adblockplus.org/29737558/
Patch Set 1 #
Total comments: 4
Patch Set 2 : Move parseScript to top level #Patch Set 3 : Support single character escape sequences #Patch Set 4 : Support Unicode escape sequences #
Total comments: 1
Patch Set 5 : Rebase #
Total comments: 13
Patch Set 6 : Move singleCharacterEscapes to outer scope #Patch Set 7 : Rename literal to withinQuotes #Patch Set 8 : Add some comments #
Total comments: 3
Patch Set 9 : Revert Patch Set 8 #MessagesTotal messages: 20
Patch Set 1 This builds on https://codereview.adblockplus.org/29737558/ See issue description for specification.
https://codereview.adblockplus.org/29761597/diff/29761598/lib/snippets.js File lib/snippets.js (right): https://codereview.adblockplus.org/29761597/diff/29761598/lib/snippets.js#new... lib/snippets.js:83: * @return {Array.<string[]>} I had to use Array.<string[]> instead of string[][] because ESLint complained (even though it is valid JSDoc and actually better, oh I hate ESLint). https://codereview.adblockplus.org/29761597/diff/29761598/lib/snippets.js#new... lib/snippets.js:95: for (let character of [...script.trim(), ";"]) The parsing here is as per the specification in Trac #6538. Also see unit tests. https://codereview.adblockplus.org/29761597/diff/29761598/test/snippets.js File test/snippets.js (right): https://codereview.adblockplus.org/29761597/diff/29761598/test/snippets.js#ne... test/snippets.js:172: checkParsedScript( ESLint would not let me format this like the other statements here, because the last argument is an array that spans multiple lines. Uhhh, I hate ESLint.
https://codereview.adblockplus.org/29761597/diff/29761598/test/snippets.js File test/snippets.js (right): https://codereview.adblockplus.org/29761597/diff/29761598/test/snippets.js#ne... test/snippets.js:152: checkParsedScript("Script with argument containing an escaped space", Also note that a single backslash in fact has to be written as two backslashes in a JavaScript string literal, so when you read "\\" you should interpret it as "\" in your mind.
Patch Set 3: Support single character escape sequences Since filters can't contain non-space whitespace because of normalization, we need a way to specify tabs, newlines, and such, since the snippet may be trying to look for text containing these in the document. This is easy to do. I've added support for the most common single character escape sequences: newlines, carriages returns, and tabs. We can add more later if needed.
On 2018/04/26 15:36:39, Manish Jethani wrote: > Patch Set 3: Support single character escape sequences > > Since filters can't contain non-space whitespace because of normalization, we > need a way to specify tabs, newlines, and such, since the snippet may be trying > to look for text containing these in the document. This is easy to do. I've > added support for the most common single character escape sequences: newlines, > carriages returns, and tabs. We can add more later if needed. For what it's worth JavaScript has support for a few more single character escape sequences: https://mathiasbynens.be/notes/javascript-escapes We don't have to support all, just the most common ones, because I'm also going to add a generic way to specify any character using an escape sequence (most like Unicode, like "\u0041" for "A").
Patch Set 4: Support Unicode escape sequences Now you can write a filter like "example.com#$#foo b\u0041r" and it'll correctly get interpreted as "foo bar". The "\u" must be followed by four hexadecimal digits specifying the 16-bit Unicode code point. If the four characters do not parse correctly as a hexadecimal number, they will be ignored and the parsing will continue. I do this because the filter may need to contain a Unicode character but it may not be possible to include it literally, for example because the transmission medium supports only ASCII encoding. https://codereview.adblockplus.org/29761597/diff/29762690/test/snippets.js File test/snippets.js (right): https://codereview.adblockplus.org/29761597/diff/29762690/test/snippets.js#ne... test/snippets.js:182: checkParsedScript("Script with multiple commands and multiple " + Unrelated indentation update.
On 2018/04/26 16:22:09, Manish Jethani wrote: > Patch Set 4: Support Unicode escape sequences > > Now you can write a filter like "example.com#$#foo b\u0041r" and it'll correctly > get interpreted as "foo bar". Sorry, that would have to be "foo b\u0061r" (small A).
Patch Set 5: Rebase
Copying in Erik and Sebastian since I think as many people as possible should review the parseScript function. https://codereview.adblockplus.org/29761597/diff/29787583/lib/snippets.js File lib/snippets.js (right): https://codereview.adblockplus.org/29761597/diff/29787583/lib/snippets.js#new... lib/snippets.js:84: function parseScript(script) I found this terminology confusing, "script" is the body of the snippet filter which in turns calls a "snippet" script. How about `parseSnippetCalls`? https://codereview.adblockplus.org/29761597/diff/29787583/lib/snippets.js#new... lib/snippets.js:86: const singleCharacterEscapes = new Map([ Might be better to declare this once at the top of the file, instead of every time the function is called? https://codereview.adblockplus.org/29761597/diff/29787583/lib/snippets.js#new... lib/snippets.js:100: for (let character of [...script.trim(), ";"]) Why not just do `script = script.trim() + ";";` above and then `for (let character of script)`? https://codereview.adblockplus.org/29761597/diff/29787583/lib/snippets.js#new... lib/snippets.js:106: if (unicodeEscape.length == 4) What if unicodeEscape never reaches this length? I guess if the script ends with "\u123" those last characters are lost? https://codereview.adblockplus.org/29761597/diff/29787583/lib/snippets.js#new... lib/snippets.js:132: else if (literal || character != ";" && !/\s/u.test(character)) So the literal flag has no effect on escaped characters? For example, for the literal string "\n" I'd have to use "\\n" or "'\\n'".
On 2018/07/10 13:55:21, kzar wrote: > Copying in Erik and Sebastian since I think as many people as possible should > review the parseScript function. Yes, this makes sense. I would like to address the comments here once patch #29737558 has passed review.
Patch Set 6: Move singleCharacterEscapes to outer scope https://codereview.adblockplus.org/29761597/diff/29787583/lib/snippets.js File lib/snippets.js (right): https://codereview.adblockplus.org/29761597/diff/29787583/lib/snippets.js#new... lib/snippets.js:84: function parseScript(script) On 2018/07/10 13:55:20, kzar wrote: > I found this terminology confusing, "script" is the body of the snippet filter > which in turns calls a "snippet" script. How about `parseSnippetCalls`? The argument to this function is the SnippetFilter.script property, I think we agrees on that terminology. In that case parseScript sounds like an appropriate name to me. What do you think? https://codereview.adblockplus.org/29761597/diff/29787583/lib/snippets.js#new... lib/snippets.js:86: const singleCharacterEscapes = new Map([ On 2018/07/10 13:55:20, kzar wrote: > Might be better to declare this once at the top of the file, instead of every > time the function is called? Done. https://codereview.adblockplus.org/29761597/diff/29787583/lib/snippets.js#new... lib/snippets.js:100: for (let character of [...script.trim(), ";"]) On 2018/07/10 13:55:20, kzar wrote: > Why not just do `script = script.trim() + ";";` above and then `for (let > character of script)`? Done. https://codereview.adblockplus.org/29761597/diff/29787583/lib/snippets.js#new... lib/snippets.js:106: if (unicodeEscape.length == 4) On 2018/07/10 13:55:20, kzar wrote: > What if unicodeEscape never reaches this length? I guess if the script ends with > "\u123" those last characters are lost? Yes, they would be lost. If it's not at the end of the script, then it'll eat up the 4 characters following \u and then discard them because they don't parse as a number. https://codereview.adblockplus.org/29761597/diff/29787583/lib/snippets.js#new... lib/snippets.js:132: else if (literal || character != ";" && !/\s/u.test(character)) On 2018/07/10 13:55:20, kzar wrote: > So the literal flag has no effect on escaped characters? For example, for the > literal string "\n" I'd have to use "\\n" or "'\\n'". No, mainly because we need a way to escape the single quote within single quotes. i.e. '\''. The single quotes in a way only group the characters within as a single argument. Literally what it's doing here is that if literal is true then it ignores semicolons and whitespace and takes the character literally.
Patch Set 7: Rename literal to withinQuotes Patch Set 8: Add some comments
Once you remove those extra comments this LGTM. I would like someone (ideally Sebastian) to take a second look at the parseScript function however. We really don't want to fuck that one up. https://codereview.adblockplus.org/29761597/diff/29787583/lib/snippets.js File lib/snippets.js (right): https://codereview.adblockplus.org/29761597/diff/29787583/lib/snippets.js#new... lib/snippets.js:84: function parseScript(script) On 2018/07/11 19:04:41, Manish Jethani wrote: > On 2018/07/10 13:55:20, kzar wrote: > > I found this terminology confusing, "script" is the body of the snippet filter > > which in turns calls a "snippet" script. How about `parseSnippetCalls`? > > The argument to this function is the SnippetFilter.script property, I think we > agrees on that terminology. In that case parseScript sounds like an appropriate > name to me. What do you think? Fair enough. https://codereview.adblockplus.org/29761597/diff/29787583/lib/snippets.js#new... lib/snippets.js:106: if (unicodeEscape.length == 4) On 2018/07/11 19:04:41, Manish Jethani wrote: > On 2018/07/10 13:55:20, kzar wrote: > > What if unicodeEscape never reaches this length? I guess if the script ends > with > > "\u123" those last characters are lost? > > Yes, they would be lost. > > If it's not at the end of the script, then it'll eat up the 4 characters > following \u and then discard them because they don't parse as a number. Acknowledged. https://codereview.adblockplus.org/29761597/diff/29787583/lib/snippets.js#new... lib/snippets.js:132: else if (literal || character != ";" && !/\s/u.test(character)) On 2018/07/11 19:04:41, Manish Jethani wrote: > On 2018/07/10 13:55:20, kzar wrote: > > So the literal flag has no effect on escaped characters? For example, for the > > literal string "\n" I'd have to use "\\n" or "'\\n'". > > No, mainly because we need a way to escape the single quote within single > quotes. i.e. '\''. The single quotes in a way only group the characters within > as a single argument. > > Literally what it's doing here is that if literal is true then it ignores > semicolons and whitespace and takes the character literally. Acknowledged. https://codereview.adblockplus.org/29761597/diff/29828558/lib/snippets.js File lib/snippets.js (right): https://codereview.adblockplus.org/29761597/diff/29828558/lib/snippets.js#new... lib/snippets.js:92: // Whether the next character should be treated as an escape sequence. I don't these new comments add much, and they are kind of patronising. I'd prefer to remove them. https://codereview.adblockplus.org/29761597/diff/29828558/lib/snippets.js#new... lib/snippets.js:137: withinQuotes = !withinQuotes; Nice, I think this variable name is an improvement.
Patch Set 9: Revert Patch Set 8 On 2018/07/12 10:30:37, kzar wrote: > Once you remove those extra comments this LGTM. Done. > I would like someone (ideally Sebastian) to take a second look at the > parseScript function however. We really don't want to fuck that one up. Yes, it would be nice if Sebastian took a look. I'll wait for his comments. https://codereview.adblockplus.org/29761597/diff/29828558/lib/snippets.js File lib/snippets.js (right): https://codereview.adblockplus.org/29761597/diff/29828558/lib/snippets.js#new... lib/snippets.js:92: // Whether the next character should be treated as an escape sequence. On 2018/07/12 10:30:37, kzar wrote: > I don't these new comments add much, and they are kind of patronising. I'd > prefer to remove them. Well I'm glad you think so, because I don't like to repeat the logic in the comments, it's harder to maintain (the comments need to be kept up to date). Done.
I wonder if the implementation would be simpler (and more robust) if we'd just handle quotes here, and then pass the individual tokens to JSON.parse() to handle other escapes.
On 2018/07/16 16:28:05, Sebastian Noack wrote: > I wonder if the implementation would be simpler (and more robust) if we'd just > handle quotes here, and then pass the individual tokens to JSON.parse() to > handle other escapes. Well then we would be implementing the JSON spec and not the Snippets spec. I don't think that's the way to do it.
On 2018/07/17 10:47:23, Manish Jethani wrote: > On 2018/07/16 16:28:05, Sebastian Noack wrote: > > I wonder if the implementation would be simpler (and more robust) if we'd just > > handle quotes here, and then pass the individual tokens to JSON.parse() to > > handle other escapes. > > Well then we would be implementing the JSON spec and not the Snippets spec. I > don't think that's the way to do it. Isn't this essentially the same in the case here? Or if you want to be more explicit I guess the Snippets spec could just reference JSON.
On 2018/07/17 12:24:13, Sebastian Noack wrote: > On 2018/07/17 10:47:23, Manish Jethani wrote: > > On 2018/07/16 16:28:05, Sebastian Noack wrote: > > > I wonder if the implementation would be simpler (and more robust) if we'd > just > > > handle quotes here, and then pass the individual tokens to JSON.parse() to > > > handle other escapes. > > > > Well then we would be implementing the JSON spec and not the Snippets spec. I > > don't think that's the way to do it. > > Isn't this essentially the same in the case here? Or if you want to be more > explicit I guess the Snippets spec could just reference JSON. If you're suggesting that every argument, after adding the surrounding quotes, could be parsed as a JSON string, that somewhat makes sense. But in order to extract the arguments we would still have to go through the entire script, character by character. If we're going to do that, we might as well do the parsing of the arguments based on our own spec? There is more to JSON strings [1] than what we need in a snippet filter. JSON is more strict. For example, a sole "\x" is an error, whereas in our spec we simply treat it as an "x". We are more forgiving in what input we accept, which is how it should be, since this is not a programming language and is going to be used by non-programmers (so it's more like HTML than JavaScript). One more good reason not to implement JSON is that we're going to have to port this to C++ at some point and it would be really nice if we did not have the overhead of JSON. [1]: https://mathiasbynens.be/notes/javascript-escapes
On 2018/07/17 12:50:10, Manish Jethani wrote: > If you're suggesting that every argument, after adding the surrounding quotes, > could be parsed as a JSON string, that somewhat makes sense. Yes, this what I'm suggesting. > But in order to > extract the arguments we would still have to go through the entire script, > character by character. If we're going to do that, we might as well do the > parsing of the arguments based on our own spec? > > There is more to JSON strings [1] than what we need in a snippet filter. JSON is > more strict. For example, a sole "\x" is an error, whereas in our spec we simply > treat it as an "x". We are more forgiving in what input we accept, which is how > it should be, since this is not a programming language and is going to be used > by non-programmers (so it's more like HTML than JavaScript). > > One more good reason not to implement JSON is that we're going to have to port > this to C++ at some point and it would be really nice if we did not have the > overhead of JSON. Fair enough, LGTM. |