Issue 29761597: Issue 6538, 6781 - Implement script parsing for snippets

Files changed (+142 lines, -1 line):
M lib/snippets.js: 2 chunks, +78 lines, -0 lines, 0 comments
M test/snippets.js: 2 chunks, +64 lines, -1 line, 0 comments

Messages

Total messages: 20

Manish Jethani

Patch Set 1 This builds on https://codereview.adblockplus.org/29737558/ See issue description for specification.

April 26, 2018, 12:22 p.m. (2018-04-26 12:22:53 UTC) #2

Manish Jethani

https://codereview.adblockplus.org/29761597/diff/29761598/lib/snippets.js File lib/snippets.js (right): https://codereview.adblockplus.org/29761597/diff/29761598/lib/snippets.js#newcode83 lib/snippets.js:83: * @return {Array.<string[]>} I had to use Array.<string[]> instead ...

April 26, 2018, 1:25 p.m. (2018-04-26 13:25:17 UTC) #3

Manish Jethani

https://codereview.adblockplus.org/29761597/diff/29761598/test/snippets.js File test/snippets.js (right): https://codereview.adblockplus.org/29761597/diff/29761598/test/snippets.js#newcode152 test/snippets.js:152: checkParsedScript("Script with argument containing an escaped space", Also note ...

April 26, 2018, 1:27 p.m. (2018-04-26 13:27:34 UTC) #4

Manish Jethani

Patch Set 3: Support single character escape sequences Since filters can't contain non-space whitespace because ...

April 26, 2018, 3:36 p.m. (2018-04-26 15:36:39 UTC) #5

Manish Jethani

On 2018/04/26 15:36:39, Manish Jethani wrote: > Patch Set 3: Support single character escape sequences ...

April 26, 2018, 3:45 p.m. (2018-04-26 15:45:12 UTC) #6

Manish Jethani

Patch Set 4: Support Unicode escape sequences Now you can write a filter like "example.com#$#foo ...

April 26, 2018, 4:22 p.m. (2018-04-26 16:22:09 UTC) #7

Manish Jethani

On 2018/04/26 16:22:09, Manish Jethani wrote: > Patch Set 4: Support Unicode escape sequences > ...

April 26, 2018, 4:23 p.m. (2018-04-26 16:23:16 UTC) #8

kzar

Copying in Erik and Sebastian since I think as many people as possible should review ...

July 10, 2018, 1:55 p.m. (2018-07-10 13:55:21 UTC) #10

Manish Jethani

On 2018/07/10 13:55:21, kzar wrote: > Copying in Erik and Sebastian since I think as ...

July 11, 2018, 1:19 p.m. (2018-07-11 13:19:30 UTC) #11

Manish Jethani

July 11, 2018, 7:04 p.m. (2018-07-11 19:04:42 UTC) #12

Patch Set 6: Move singleCharacterEscapes to outer scope

https://codereview.adblockplus.org/29761597/diff/29787583/lib/snippets.js
File lib/snippets.js (right):

https://codereview.adblockplus.org/29761597/diff/29787583/lib/snippets.js#new...
lib/snippets.js:84: function parseScript(script)
On 2018/07/10 13:55:20, kzar wrote:
> I found this terminology confusing, "script" is the body of the snippet filter
> which in turn calls a "snippet" script. How about `parseSnippetCalls`?

The argument to this function is the SnippetFilter.script property, I think we
agree on that terminology. In that case parseScript sounds like an appropriate
name to me. What do you think?

https://codereview.adblockplus.org/29761597/diff/29787583/lib/snippets.js#new...
lib/snippets.js:86: const singleCharacterEscapes = new Map([
On 2018/07/10 13:55:20, kzar wrote:
> Might be better to declare this once at the top of the file, instead of every
> time the function is called?

Done.

https://codereview.adblockplus.org/29761597/diff/29787583/lib/snippets.js#new...
lib/snippets.js:100: for (let character of [...script.trim(), ";"])
On 2018/07/10 13:55:20, kzar wrote:
> Why not just do `script = script.trim() + ";";` above and then `for (let
> character of script)`?

Done.

https://codereview.adblockplus.org/29761597/diff/29787583/lib/snippets.js#new...
lib/snippets.js:106: if (unicodeEscape.length == 4)
On 2018/07/10 13:55:20, kzar wrote:
> What if unicodeEscape never reaches this length? I guess if the script ends
> with "\u123" those last characters are lost?

Yes, they would be lost.

If it's not at the end of the script, then it'll eat up the 4 characters
following \u and then discard them because they don't parse as a number.
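
To make the two cases above concrete, here is a minimal sketch of that behavior. It is not the code in the patch: decodeUnicodeEscapes is a hypothetical helper that handles only \uXXXX escapes, written to mirror the unicodeEscape.length == 4 check under review.

```javascript
// Hypothetical helper for illustration only; it is not part of the patch.
// It decodes \uXXXX escapes and passes every other character through.
function decodeUnicodeEscapes(text)
{
  let result = "";
  let escape = false;
  let unicodeEscape = null;

  for (let character of text)
  {
    if (unicodeEscape != null)
    {
      // Collect the four characters that follow "\u".
      unicodeEscape += character;

      if (unicodeEscape.length == 4)
      {
        let codePoint = parseInt(unicodeEscape, 16);

        // If the four characters do not parse as a number, drop them.
        if (!isNaN(codePoint))
          result += String.fromCodePoint(codePoint);

        unicodeEscape = null;
      }
    }
    else if (escape)
    {
      escape = false;

      if (character == "u")
        unicodeEscape = "";
      else
        result += character;
    }
    else if (character == "\\")
    {
      escape = true;
    }
    else
    {
      result += character;
    }
  }

  // An escape that is still unfinished at the end of the input is lost.
  return result;
}

console.log(decodeUnicodeEscapes("a\\u0041b"));  // complete escape
console.log(decodeUnicodeEscapes("abc\\u123"));  // unfinished escape at the end
console.log(decodeUnicodeEscapes("x\\uzzzzy"));  // four non-hex characters
```

One caveat with this sketch: parseInt accepts a valid hexadecimal prefix, so an input like "\u12zz" is read as U+0012 rather than being discarded. Only sequences whose first character is not a hex digit yield NaN.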

https://codereview.adblockplus.org/29761597/diff/29787583/lib/snippets.js#new...
lib/snippets.js:132: else if (literal || character != ";" && !/\s/u.test(character))
On 2018/07/10 13:55:20, kzar wrote:
> So the literal flag has no effect on escaped characters? For example, for the
> literal string "\n" I'd have to use "\\n" or "'\\n'".

No, mainly because we need a way to escape the single quote within single
quotes. i.e. '\''. The single quotes in a way only group the characters within
as a single argument.

Literally what it's doing here is that if literal is true then it ignores
semicolons and whitespace and takes the character literally.
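
As a rough illustration of the rule described above, the following sketch parses a script into calls and arguments. It is an assumption-laden approximation, not the patch itself: parseScriptSketch is a hypothetical name, the contents of singleCharacterEscapes are guessed (the review only shows that the Map exists), Unicode escapes are left out, and the flag uses the later name withinQuotes.

```javascript
// Assumed contents; the review only shows that such a Map is declared.
const singleCharacterEscapes = new Map([
  ["n", "\n"], ["r", "\r"], ["t", "\t"]
]);

// Hypothetical sketch of the quoting and escaping rules discussed here.
function parseScriptSketch(script)
{
  let tree = [];
  let call = [];
  let argument = "";
  let escape = false;
  let withinQuotes = false;
  let quotesClosed = false;

  for (let character of script.trim() + ";")
  {
    // Remember whether the previous character closed a quoted argument,
    // so that an empty '' still counts as an argument.
    let afterQuotesClosed = quotesClosed;
    quotesClosed = false;

    if (escape)
    {
      // Escapes apply inside and outside quotes alike.
      escape = false;
      argument += singleCharacterEscapes.get(character) || character;
    }
    else if (character == "\\")
    {
      escape = true;
    }
    else if (character == "'")
    {
      withinQuotes = !withinQuotes;
      quotesClosed = !withinQuotes;
    }
    else if (withinQuotes || character != ";" && !/\s/u.test(character))
    {
      // Within quotes, semicolons and whitespace are taken literally.
      argument += character;
    }
    else
    {
      if (argument || afterQuotesClosed)
      {
        call.push(argument);
        argument = "";
      }

      if (character == ";" && call.length > 0)
      {
        tree.push(call);
        call = [];
      }
    }
  }

  return tree;
}

console.log(JSON.stringify(
  parseScriptSketch("log 'a b; c' d\\ e; hide '\\''")
));
```

In this sketch the quotes only group characters into one argument, so '\n' still produces a newline and the two literal characters need '\\n', which matches the answer given above.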

Manish Jethani

Patch Set 7: Rename literal to withinQuotes Patch Set 8: Add some comments

July 12, 2018, 8:37 a.m. (2018-07-12 08:37:36 UTC) #13

kzar

July 12, 2018, 10:30 a.m. (2018-07-12 10:30:37 UTC) #14

Once you remove those extra comments this LGTM.

I would like someone (ideally Sebastian) to take a second look at the
parseScript function however. We really don't want to fuck that one up.

https://codereview.adblockplus.org/29761597/diff/29787583/lib/snippets.js
File lib/snippets.js (right):

https://codereview.adblockplus.org/29761597/diff/29787583/lib/snippets.js#new...
lib/snippets.js:84: function parseScript(script)
On 2018/07/11 19:04:41, Manish Jethani wrote:
> On 2018/07/10 13:55:20, kzar wrote:
> > I found this terminology confusing, "script" is the body of the snippet filter
> > which in turn calls a "snippet" script. How about `parseSnippetCalls`?
> 
> The argument to this function is the SnippetFilter.script property, I think we
> agree on that terminology. In that case parseScript sounds like an appropriate
> name to me. What do you think?

Fair enough.

https://codereview.adblockplus.org/29761597/diff/29787583/lib/snippets.js#new...
lib/snippets.js:106: if (unicodeEscape.length == 4)
On 2018/07/11 19:04:41, Manish Jethani wrote:
> On 2018/07/10 13:55:20, kzar wrote:
> > What if unicodeEscape never reaches this length? I guess if the script ends
> > with "\u123" those last characters are lost?
> 
> Yes, they would be lost.
> 
> If it's not at the end of the script, then it'll eat up the 4 characters
> following \u and then discard them because they don't parse as a number.

Acknowledged.

https://codereview.adblockplus.org/29761597/diff/29787583/lib/snippets.js#new...
lib/snippets.js:132: else if (literal || character != ";" && !/\s/u.test(character))
On 2018/07/11 19:04:41, Manish Jethani wrote:
> On 2018/07/10 13:55:20, kzar wrote:
> > So the literal flag has no effect on escaped characters? For example, for the
> > literal string "\n" I'd have to use "\\n" or "'\\n'".
> 
> No, mainly because we need a way to escape the single quote within single
> quotes. i.e. '\''. The single quotes in a way only group the characters within
> as a single argument.
> 
> Literally what it's doing here is that if literal is true then it ignores
> semicolons and whitespace and takes the character literally.

Acknowledged.

https://codereview.adblockplus.org/29761597/diff/29828558/lib/snippets.js
File lib/snippets.js (right):

https://codereview.adblockplus.org/29761597/diff/29828558/lib/snippets.js#new...
lib/snippets.js:92: // Whether the next character should be treated as an escape sequence.
I don't think these new comments add much, and they are kind of patronising. I'd
prefer to remove them.

https://codereview.adblockplus.org/29761597/diff/29828558/lib/snippets.js#new...
lib/snippets.js:137: withinQuotes = !withinQuotes;
Nice, I think this variable name is an improvement.

Manish Jethani

Patch Set 9: Revert Patch Set 8 On 2018/07/12 10:30:37, kzar wrote: > Once you ...

July 12, 2018, 10:57 a.m. (2018-07-12 10:57:49 UTC) #15

Sebastian Noack

I wonder if the implementation would be simpler (and more robust) if we'd just handle ...

July 16, 2018, 4:28 p.m. (2018-07-16 16:28:05 UTC) #16

Manish Jethani

On 2018/07/16 16:28:05, Sebastian Noack wrote: > I wonder if the implementation would be simpler ...

July 17, 2018, 10:47 a.m. (2018-07-17 10:47:23 UTC) #17

Sebastian Noack

On 2018/07/17 10:47:23, Manish Jethani wrote: > On 2018/07/16 16:28:05, Sebastian Noack wrote: > > ...

July 17, 2018, 12:24 p.m. (2018-07-17 12:24:13 UTC) #18

Manish Jethani

On 2018/07/17 12:24:13, Sebastian Noack wrote: > On 2018/07/17 10:47:23, Manish Jethani wrote: > > ...

July 17, 2018, 12:50 p.m. (2018-07-17 12:50:10 UTC) #19

Sebastian Noack

July 17, 2018, 2:56 p.m. (2018-07-17 14:56:26 UTC) #20

On 2018/07/17 12:50:10, Manish Jethani wrote:
> If you're suggesting that every argument, after adding the surrounding quotes,
> could be parsed as a JSON string, that somewhat makes sense.

Yes, this is what I'm suggesting.

> But in order to
> extract the arguments we would still have to go through the entire script,
> character by character. If we're going to do that, we might as well do the
> parsing of the arguments based on our own spec?
> 
> There is more to JSON strings [1] than what we need in a snippet filter. JSON is
> more strict. For example, a sole "\x" is an error, whereas in our spec we simply
> treat it as an "x". We are more forgiving in what input we accept, which is how
> it should be, since this is not a programming language and is going to be used
> by non-programmers (so it's more like HTML than JavaScript).
> 
> One more good reason not to implement JSON is that we're going to have to port
> this to C++ at some point and it would be really nice if we did not have the
> overhead of JSON.

Fair enough, LGTM.
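
For reference, the difference in strictness mentioned in the quoted text can be checked with a small sketch. Both helpers are hypothetical and written only for this illustration: isValidJSONString wraps an argument in double quotes and hands it to JSON.parse, and lenientUnescape approximates the forgiving rule by replacing any backslash-escaped character with the character itself (it ignores the \n, \r, \t and \u cases).

```javascript
// Hypothetical helper: returns true if the text, once wrapped in double
// quotes, parses as a JSON string.
function isValidJSONString(text)
{
  try
  {
    return typeof JSON.parse("\"" + text + "\"") == "string";
  }
  catch (error)
  {
    return false;
  }
}

// Hypothetical helper: a deliberately simplified version of the forgiving
// rule, where an unknown escape such as "\x" just becomes "x".
function lenientUnescape(text)
{
  return text.replace(/\\([\s\S])/g, (match, character) => character);
}

console.log(isValidJSONString("\\n"));  // "\n" is a valid JSON escape
console.log(isValidJSONString("\\x"));  // "\x" is rejected by JSON.parse
console.log(lenientUnescape("\\x"));    // the lenient rule keeps the "x"
```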

Issue 29761597: Issue 6538, 6781 - Implement script parsing for snippets (Closed)

Patch Set 1
Patch Set 2: Move parseScript to top level
Patch Set 3: Support single character escape sequences
Patch Set 4: Support Unicode escape sequences
Patch Set 5: Rebase
Patch Set 6: Move singleCharacterEscapes to outer scope
Patch Set 7: Rename literal to withinQuotes
Patch Set 8: Add some comments
Patch Set 9: Revert Patch Set 8
