Issue 29680689: [$csp2 adblockpluscore] Issue 6329 - Add the CSP filter type

kzar

Patch Set 1

Jan. 26, 2018, 6:11 p.m. (2018-01-26 18:11:53 UTC) #1

Manish Jethani

https://codereview.adblockplus.org/29680689/diff/29680690/lib/filterClasses.js File lib/filterClasses.js (right): https://codereview.adblockplus.org/29680689/diff/29680690/lib/filterClasses.js#newcode758 lib/filterClasses.js:758: csp = value.toLowerCase(); We could avoid uppercasing and then ...

Feb. 6, 2018, 5:54 a.m. (2018-02-06 05:54:20 UTC) #2

kzar

Patch Set 2 : Rebased, brought back Filter.normalize, other changes Patch Set 3 : Check ...

March 6, 2018, 3:37 p.m. (2018-03-06 15:37:53 UTC) #3

Manish Jethani

https://codereview.adblockplus.org/29680689/diff/29715744/lib/filterClasses.js File lib/filterClasses.js (right): https://codereview.adblockplus.org/29680689/diff/29715744/lib/filterClasses.js#newcode746 lib/filterClasses.js:746: if (option == "CSP") I've left a comment on ...

March 7, 2018, 12:22 a.m. (2018-03-07 00:22:13 UTC) #4

kzar

Patch Set 4 : Fully normalise whitespace, avoid strict equality https://codereview.adblockplus.org/29680689/diff/29715744/lib/filterClasses.js File lib/filterClasses.js (right): https://codereview.adblockplus.org/29680689/diff/29715744/lib/filterClasses.js#newcode746 ...

March 12, 2018, 1:35 p.m. (2018-03-12 13:35:59 UTC) #5

Manish Jethani

https://codereview.adblockplus.org/29680689/diff/29720596/lib/filterClasses.js File lib/filterClasses.js (right): https://codereview.adblockplus.org/29680689/diff/29720596/lib/filterClasses.js#newcode192 lib/filterClasses.js:192: let strippedText = text.replace(/\s/g, ""); Since we have already ...

March 12, 2018, 4:34 p.m. (2018-03-12 16:34:53 UTC) #6

Manish Jethani

https://codereview.adblockplus.org/29680689/diff/29720596/lib/filterClasses.js File lib/filterClasses.js (right): https://codereview.adblockplus.org/29680689/diff/29720596/lib/filterClasses.js#newcode190 lib/filterClasses.js:190: // For most regexp filters we strip all whitespace, ...

March 12, 2018, 6:53 p.m. (2018-03-12 18:53:59 UTC) #7

Sebastian Noack

https://codereview.adblockplus.org/29680689/diff/29720596/lib/filterClasses.js File lib/filterClasses.js (right): https://codereview.adblockplus.org/29680689/diff/29720596/lib/filterClasses.js#newcode190 lib/filterClasses.js:190: // For most regexp filters we strip all whitespace, ...

March 12, 2018, 11:19 p.m. (2018-03-12 23:19:02 UTC) #8

Manish Jethani

https://codereview.adblockplus.org/29680689/diff/29720596/lib/filterClasses.js File lib/filterClasses.js (right): https://codereview.adblockplus.org/29680689/diff/29720596/lib/filterClasses.js#newcode190 lib/filterClasses.js:190: // For most regexp filters we strip all whitespace, ...

March 13, 2018, 6:59 a.m. (2018-03-13 06:59:30 UTC) #9

Manish Jethani

https://codereview.adblockplus.org/29680689/diff/29720596/lib/filterClasses.js File lib/filterClasses.js (right): https://codereview.adblockplus.org/29680689/diff/29720596/lib/filterClasses.js#newcode190 lib/filterClasses.js:190: // For most regexp filters we strip all whitespace, ...

March 13, 2018, 7:35 a.m. (2018-03-13 07:35:00 UTC) #10

kzar

Patch Set 5 : New approach, added more unit tests Patch Set 6 : Use ...

March 14, 2018, 1:54 p.m. (2018-03-14 13:54:38 UTC) #11

Patch Set 5 : New approach, added more unit tests
Patch Set 6 : Use space literal instead of \s where possible

https://codereview.adblockplus.org/29680689/diff/29720596/lib/filterClasses.js
File lib/filterClasses.js (right):

https://codereview.adblockplus.org/29680689/diff/29720596/lib/filterClasses.j...
lib/filterClasses.js:190: // For most regexp filters we strip all whitespace,
but the values of $csp
On 2018/03/12 18:53:59, Manish Jethani wrote:
> So this doesn't work for the following cases:
> 
>  1.  foo?csp=b a r$csp=script-src  'self'
>  2.  foo$bar=c s p = ba z,cs p = script-src  'self'

Well spotted, I've added some unit tests for those cases.

https://codereview.adblockplus.org/29680689/diff/29720596/lib/filterClasses.j...
lib/filterClasses.js:192: let strippedText = text.replace(/\s/g, "");
On 2018/03/12 16:34:53, Manish Jethani wrote:
> Since we have already stripped out all non-space whitespace, we can just use "
"
> (space) instead of \s here?
> 
>   let strippedText = text.replace(/ /g, "");

I think you're right, Done.

https://codereview.adblockplus.org/29680689/diff/29720596/lib/filterClasses.j...
lib/filterClasses.js:196: let optionsMatch =
Filter.optionsRegExp.exec(strippedText);
On 2018/03/12 16:34:53, Manish Jethani wrote:
> We can just look for "$" here and if none is present then there are no
options, right? 

No, that's not good enough unfortunately. There can be a $ (or multiple) with no
options part. I added a test case for that as well FWIW.

https://codereview.adblockplus.org/29680689/diff/29720596/lib/filterClasses.j...
lib/filterClasses.js:203: let optionsPosition = 0;
On 2018/03/12 16:34:53, Manish Jethani wrote:
> This part is not being used at all.

Whoops, you're right. I've fixed that now.

https://codereview.adblockplus.org/29680689/diff/29720596/lib/filterClasses.j...
lib/filterClasses.js:219: parts.push(
On 2018/03/12 16:34:53, Manish Jethani wrote:
> We've already stripped whitespace once, now we're stripping it again. This
only
> affects filters with a $csp option, but I wonder if we shouldn't just check
for
> $csp before stripping whitespace the first time.
> 
> i.e. above:
> 
>   if (text.indexOf("$") == -1 || !/[$,] *c *s *p *=/i.test(text))
>     return text.replace(/ /g, "");

No because we're expecting thousands of non-csp filters and maybe 5 csp filters.
We should optimise for the common case.

Manish Jethani

https://codereview.adblockplus.org/29680689/diff/29722732/lib/filterClasses.js File lib/filterClasses.js (right): https://codereview.adblockplus.org/29680689/diff/29722732/lib/filterClasses.js#newcode187 lib/filterClasses.js:187: return domain.replace(/ /g, "") + separator + selector.trim(); By ...

March 14, 2018, 3:27 p.m. (2018-03-14 15:27:31 UTC) #12

kzar

Patch Set 7 : Strip whitespace in batches where possible https://codereview.adblockplus.org/29680689/diff/29722732/lib/filterClasses.js File lib/filterClasses.js (right): https://codereview.adblockplus.org/29680689/diff/29722732/lib/filterClasses.js#newcode187 ...

March 14, 2018, 5:28 p.m. (2018-03-14 17:28:59 UTC) #13

Manish Jethani

https://codereview.adblockplus.org/29680689/diff/29720596/lib/filterClasses.js File lib/filterClasses.js (right): https://codereview.adblockplus.org/29680689/diff/29720596/lib/filterClasses.js#newcode196 lib/filterClasses.js:196: let optionsMatch = Filter.optionsRegExp.exec(strippedText); On 2018/03/14 13:54:38, kzar wrote: ...

March 14, 2018, 6:10 p.m. (2018-03-14 18:10:50 UTC) #14

https://codereview.adblockplus.org/29680689/diff/29720596/lib/filterClasses.js
File lib/filterClasses.js (right):

https://codereview.adblockplus.org/29680689/diff/29720596/lib/filterClasses.j...
lib/filterClasses.js:196: let optionsMatch =
Filter.optionsRegExp.exec(strippedText);
On 2018/03/14 13:54:38, kzar wrote:
> On 2018/03/12 16:34:53, Manish Jethani wrote:
> > We can just look for "$" here and if none is present then there are no
> options, right? 
> 
> No, that's not good enough unfortunately. There can be a $ (or multiple) with
no
> options part. I added a test case for that as well FWIW.

OK, but I wonder if we shouldn't just look for a "$" first anyway before running
the regular expression. I see the same thing being done in
RegExpFilter.fromText:

  let match = (text.indexOf("$") >= 0 ? Filter.optionsRegExp.exec(text) : null);
  if (match)
  {
    ...
  }

https://codereview.adblockplus.org/29680689/diff/29722732/lib/filterClasses.js
File lib/filterClasses.js (right):

https://codereview.adblockplus.org/29680689/diff/29722732/lib/filterClasses.j...
lib/filterClasses.js:192: if (!/csp=/i.test(strippedText))
So how about adding another check here:

  if (strippedText.indexOf("$") == -1 || !/csp=/i.test(strippedText))
    return strippedText;

Because EasyList starts off with a ton of filters with no "$"

Even better:

  let dollarIndex = strippedText.lastIndexOf("$");
  if (dollarIndex == -1 || !/csp=/i.test(strippedText.substring(dollarIndex +
1)))
    return strippedText;

We could probably reuse dollarIndex below (see my other comments).

Also I would change /csp=/i to /\bcsp=/i

https://codereview.adblockplus.org/29680689/diff/29722732/lib/filterClasses.j...
lib/filterClasses.js:203: for (let i = beforeOptions.split("$").length; i > 0;
i--)
Could you explain this part? I'm not sure what this is supposed to do. If the
idea is to get the part after the last "$" in the original text, we could just
use String.lastIndexOf here.

  let optionsText = text.substring(text.lastIndexOf("$") + 1);

For what it's worth neither the original code nor my suggestion works with the
following case:

  /foo$/$ csp = script-src  http://example.com/?$1=1&$2=2&$3=3

"$" is a valid character in the query string part of the URL:

https://stackoverflow.com/questions/1547899/which-characters-make-a-url-invalid

Thankfully there's no case for having a query string or indeed even the path
part of the URL in the $csp filter. Not yet anyway, but things may change in the
future with the CSP standard. Further the filter author can always encode the
"$" (recommended anyway) so it should be OK.

https://codereview.adblockplus.org/29680689/diff/29722732/lib/filterClasses.j...
lib/filterClasses.js:206: let options = [];
Why create a separate array for the stripped version? We could just do this:

  let options = optionsText.split(",");
  for (let i = 0; i < options.length; i++)
    options[i] = options[i].replace(...);

https://codereview.adblockplus.org/29680689/diff/29722732/lib/filterClasses.j...
lib/filterClasses.js:209: let cspMatch = /^( *c *s *p *=)([^,]+)/i.exec(option);
The regular expression here could be /^( *c *s *p *=)(.+)/i because we already
know that there are no commas there.

And why do we need the capturing groups?

  let cspMatch = /^ *c *s *p *=/.exec(option);
  if (cspMatch)
  {
    options.push(
      cspMatch[0].replace(/ /g, "") +
      option.substring(cspMatch[0].length).trim().replace(/ +/g, " ")
    );
  }

I have a feeling that this might actually be faster, though I haven't tested
this.

kzar

Patch Set 8 : Addressed some of Manish's feedback https://codereview.adblockplus.org/29680689/diff/29722732/lib/filterClasses.js File lib/filterClasses.js (right): https://codereview.adblockplus.org/29680689/diff/29722732/lib/filterClasses.js#newcode192 lib/filterClasses.js:192: ...

March 14, 2018, 7:48 p.m. (2018-03-14 19:48:25 UTC) #15

Patch Set 8 : Addressed some of Manish's feedback

https://codereview.adblockplus.org/29680689/diff/29722732/lib/filterClasses.js
File lib/filterClasses.js (right):

https://codereview.adblockplus.org/29680689/diff/29722732/lib/filterClasses.j...
lib/filterClasses.js:192: if (!/csp=/i.test(strippedText))
On 2018/03/14 18:10:50, Manish Jethani wrote:
> So how about adding another check here:
> 
>   if (strippedText.indexOf("$") == -1 || !/csp=/i.test(strippedText))
>     return strippedText;
> 
> Because EasyList starts off with a ton of filters with no "$"

Seems you're right, of 65k filters EasyList has 47k with no options part. Done.

https://codereview.adblockplus.org/29680689/diff/29722732/lib/filterClasses.j...
lib/filterClasses.js:203: for (let i = beforeOptions.split("$").length; i > 0;
i--)
On 2018/03/14 18:10:50, Manish Jethani wrote:
> Could you explain this part? I'm not sure what this is supposed to do.

This code uses the index of the options part of the stripped filter text to
figure out what the index of the options part is of the original filter text. It
does that by counting the number of '$'s before the options start in the
stripped string and skipping past that many in the non stripped string.

> For what it's worth neither the original code nor my suggestion works with the
> following case:
> 
>   /foo$/$ csp = script-src  http://example.com/?$1=1&$2=2&$3=3

I've added a test case for that and it worked OK, have I missed something?

https://codereview.adblockplus.org/29680689/diff/29722732/lib/filterClasses.j...
lib/filterClasses.js:206: let options = [];
On 2018/03/14 18:10:50, Manish Jethani wrote:
> Why create a separate array for the stripped version? We could just do this:
> 
>   let options = optionsText.split(",");
>   for (let i = 0; i < options.length; i++)
>     options[i] = options[i].replace(...);

Done.

https://codereview.adblockplus.org/29680689/diff/29722732/lib/filterClasses.j...
lib/filterClasses.js:209: let cspMatch = /^( *c *s *p *=)([^,]+)/i.exec(option);
On 2018/03/14 18:10:50, Manish Jethani wrote:
> The regular expression here could be /^( *c *s *p *=)(.+)/i because we already
> know that there are no commas there.
> 
> And why do we need the capturing groups?
> 
>   let cspMatch = /^ *c *s *p *=/.exec(option);
>   if (cspMatch)
>   {
>     options.push(
>       cspMatch[0].replace(/ /g, "") +
>       option.substring(cspMatch[0].length).trim().replace(/ +/g, " ")
>     );
>   }
> 
> I have a feeling that this might actually be faster, though I haven't tested
> this.

Done.

Sebastian Noack

https://codereview.adblockplus.org/29680689/diff/29720596/lib/filterClasses.js File lib/filterClasses.js (right): https://codereview.adblockplus.org/29680689/diff/29720596/lib/filterClasses.js#newcode190 lib/filterClasses.js:190: // For most regexp filters we strip all whitespace, ...

March 14, 2018, 8:59 p.m. (2018-03-14 20:59:21 UTC) #16

https://codereview.adblockplus.org/29680689/diff/29720596/lib/filterClasses.js
File lib/filterClasses.js (right):

https://codereview.adblockplus.org/29680689/diff/29720596/lib/filterClasses.j...
lib/filterClasses.js:190: // For most regexp filters we strip all whitespace,
but the values of $csp
On 2018/03/13 07:35:00, Manish Jethani wrote:
> On 2018/03/13 06:59:30, Manish Jethani wrote:
> >
> > If we want to avoid creating that array we could try with generators:
> >
> > [...]
> 
> OK, so I inlined that bit, now this seems to perform well:
> 
>   let index = text.indexOf("$");
>   if (index >= 0)
>   {
>     let options = text.substring(index + 1);
>     text = text.substring(0, index + 1).replace(/ +/g, "");
> 
>     for (let start = 0, end; end != options.length; start = end + 1)
>     {
>       end = options.indexOf(",", start);
>       if (end == -1)
>         end = options.length;
> 
>       let fragment = options.substring(start, end);
>       let csp = /^ *c *s *p *=/.exec(fragment);
>       if (csp)
>       {
>         text += csp[0].replace(/ +/g, "") +
>                 fragment.substring(csp[0].length).replace(/ +/g, " ").trim();
>       }
>       else
>       {
>         text += fragment.replace(/ +/g, "");
>       }
>     }
> 
>     return text;
>   }

Any reason, why this suggestion was ignored, without a comment?

https://codereview.adblockplus.org/29680689/diff/29722910/lib/filterClasses.js
File lib/filterClasses.js (right):

https://codereview.adblockplus.org/29680689/diff/29722910/lib/filterClasses.j...
lib/filterClasses.js:179: if (/^ *!/.test(text))
It seems previously, comments were allowed do be preceded by any kind of withe
space (\s), now you are explicitly checking for the space character. Why?

https://codereview.adblockplus.org/29680689/diff/29722910/lib/filterClasses.j...
lib/filterClasses.js:187: return domain.replace(/ +/g, "") + separator +
selector.trim();
Similarly, here.

https://codereview.adblockplus.org/29680689/diff/29722910/lib/filterClasses.j...
lib/filterClasses.js:204: optionsText =
optionsText.substr(optionsText.indexOf("$") + 1);
It seems the same can be achieved with less code and without creating an array:

  for (let i; (i = optionsText.indexOf("$")) != -1;)
    optionsText = optionsText.substr(i + 1);

Manish Jethani

https://codereview.adblockplus.org/29680689/diff/29722910/lib/filterClasses.js File lib/filterClasses.js (right): https://codereview.adblockplus.org/29680689/diff/29722910/lib/filterClasses.js#newcode179 lib/filterClasses.js:179: if (/^ *!/.test(text)) On 2018/03/14 20:59:20, Sebastian Noack wrote: ...

March 15, 2018, 8:09 a.m. (2018-03-15 08:09:59 UTC) #17

Manish Jethani

https://codereview.adblockplus.org/29680689/diff/29722910/lib/filterClasses.js File lib/filterClasses.js (right): https://codereview.adblockplus.org/29680689/diff/29722910/lib/filterClasses.js#newcode192 lib/filterClasses.js:192: if (!strippedText.includes("$") || !/csp=/i.test(strippedText)) Can we make this /\bcsp=/i ...

March 15, 2018, 8:23 a.m. (2018-03-15 08:23:17 UTC) #18

Manish Jethani

https://codereview.adblockplus.org/29680689/diff/29722910/lib/filterClasses.js File lib/filterClasses.js (right): https://codereview.adblockplus.org/29680689/diff/29722910/lib/filterClasses.js#newcode204 lib/filterClasses.js:204: optionsText = optionsText.substr(optionsText.indexOf("$") + 1); On 2018/03/15 08:09:59, Manish ...

March 15, 2018, 9:11 a.m. (2018-03-15 09:11:07 UTC) #19

https://codereview.adblockplus.org/29680689/diff/29722910/lib/filterClasses.js
File lib/filterClasses.js (right):

https://codereview.adblockplus.org/29680689/diff/29722910/lib/filterClasses.j...
lib/filterClasses.js:204: optionsText =
optionsText.substr(optionsText.indexOf("$") + 1);
On 2018/03/15 08:09:59, Manish Jethani wrote:
> On 2018/03/14 20:59:20, Sebastian Noack wrote:
> > It seems the same can be achieved with less code and without creating an
> array:
> > 
> >   for (let i; (i = optionsText.indexOf("$")) != -1;)
> >     optionsText = optionsText.substr(i + 1);
> 
> This only looks for the last "$" in the string, which can be achieved using
> String.lastIndexOf, but as Dave points out it's not correct. There could be
> legit "$" within the options text as well. This is not actually true right
now,
> but may be in the future; however, since we already don't support commas in
the
> value of the $csp option, we should also not support "$" in the value. If the
> filter author needs to use a "$" they can encode it as "%24" [1]. Am I right?
> 
> I can't think of an actual reason why we'd want to support "$" in the options
> part of the filter. The most complex option is in fact $csp, but even here
> there's no instance of needing to support "$" in the value [2]
> 
> [1]:
>
https://stackoverflow.com/questions/1547899/which-characters-make-a-url-invalid
> [2]:
>
https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Content-Security-Po...

Actually I still don't get why we have to do this. We should run the options
regular expression on the unstripped text.

  if (!text.includes("$") || !/\bc *s *p *=/i.test(text))
    return text.replace(/ +/g, "");

  let optionsMatch = Filter.optionsRegExp.exec(text);
  if (!optionsMatch)
    return text.replace(/ +/g, "");

  ...

  return beforeOptions.replace(/ +/g, "") + "$" + options.join();

optionsMatch already points to the correct location, plus we have skipped one
unnecessary cycle of stripping spaces out of the options text.

kzar

Patch Set 9 : Addressed some more nits https://codereview.adblockplus.org/29680689/diff/29720596/lib/filterClasses.js File lib/filterClasses.js (right): https://codereview.adblockplus.org/29680689/diff/29720596/lib/filterClasses.js#newcode190 lib/filterClasses.js:190: // ...

March 15, 2018, 10:26 a.m. (2018-03-15 10:26:26 UTC) #20

Patch Set 9 : Addressed some more nits

https://codereview.adblockplus.org/29680689/diff/29720596/lib/filterClasses.js
File lib/filterClasses.js (right):

https://codereview.adblockplus.org/29680689/diff/29720596/lib/filterClasses.j...
lib/filterClasses.js:190: // For most regexp filters we strip all whitespace,
but the values of $csp
On 2018/03/14 20:59:20, Sebastian Noack wrote:
> On 2018/03/13 07:35:00, Manish Jethani wrote:
> > On 2018/03/13 06:59:30, Manish Jethani wrote:
> > >
> > > If we want to avoid creating that array we could try with generators:
> > >
> > > [...]
> > 
> > OK, so I inlined that bit, now this seems to perform well:
> > 
> >   let index = text.indexOf("$");
> >   if (index >= 0)
> >   {
> >     let options = text.substring(index + 1);
> >     text = text.substring(0, index + 1).replace(/ +/g, "");
> > 
> >     for (let start = 0, end; end != options.length; start = end + 1)
> >     {
> >       end = options.indexOf(",", start);
> >       if (end == -1)
> >         end = options.length;
> > 
> >       let fragment = options.substring(start, end);
> >       let csp = /^ *c *s *p *=/.exec(fragment);
> >       if (csp)
> >       {
> >         text += csp[0].replace(/ +/g, "") +
> >                 fragment.substring(csp[0].length).replace(/ +/g, "
").trim();
> >       }
> >       else
> >       {
> >         text += fragment.replace(/ +/g, "");
> >       }
> >     }
> > 
> >     return text;
> >   }
> 
> Any reason, why this suggestion was ignored, without a comment?

I ignored that since it assumed the first '$' was the start of the options
string, which I pointed out elsewhere is not necessarily the case. Sorry I
thought that was obvious but I guess not.

https://codereview.adblockplus.org/29680689/diff/29722910/lib/filterClasses.js
File lib/filterClasses.js (right):

https://codereview.adblockplus.org/29680689/diff/29722910/lib/filterClasses.j...
lib/filterClasses.js:179: if (/^ *!/.test(text))
On 2018/03/15 08:09:59, Manish Jethani wrote:
> On 2018/03/14 20:59:20, Sebastian Noack wrote:
> > It seems previously, comments were allowed do be preceded by any kind of
withe
> > space (\s), now you are explicitly checking for the space character. Why?
> 
> In the preceding code we're stripping out all non-space whitespace characters
> (e.g. "\n") using /[^\S ]/ so we end up with only spaces and non-whitespace
> characters. Now we only need to strip out spaces.

(See this comment as well
https://codereview.adblockplus.org/29680689/diff/29720596/lib/filterClasses.j...
)

https://codereview.adblockplus.org/29680689/diff/29722910/lib/filterClasses.j...
lib/filterClasses.js:187: return domain.replace(/ +/g, "") + separator +
selector.trim();
On 2018/03/14 20:59:20, Sebastian Noack wrote:
> Similarly, here.

See our responses above.

https://codereview.adblockplus.org/29680689/diff/29722910/lib/filterClasses.j...
lib/filterClasses.js:192: if (!strippedText.includes("$") ||
!/csp=/i.test(strippedText))
On 2018/03/15 08:23:17, Manish Jethani wrote:
> Can we make this /\bcsp=/i instead?

Done.

https://codereview.adblockplus.org/29680689/diff/29722910/lib/filterClasses.j...
lib/filterClasses.js:204: optionsText =
optionsText.substr(optionsText.indexOf("$") + 1);
On 2018/03/15 08:09:59, Manish Jethani wrote:
> ... however, since we already don't support commas in the
> value of the $csp option, we should also not support "$" in the value.

I can't tell if you're trolling or not at this point. If we don't care about
slightly changing how existing filters are parsed why didn't we go with the
original suggestion of doing something like `text.trim().replace(/ +/, " ")`?

https://codereview.adblockplus.org/29680689/diff/29722910/lib/filterClasses.j...
lib/filterClasses.js:204: optionsText =
optionsText.substr(optionsText.indexOf("$") + 1);
On 2018/03/15 09:11:06, Manish Jethani wrote:
> Actually I still don't get why we have to do this. We should run the
> options regular expression on the unstripped text.

Then the options regexp might not match depending on the whitespace. Yes we
could change the optionsRegexp to add an optional whitespace inbetween every
character, but I don't want to do that since it's already huge and also it's
used later in the common case for parsing regexp filters.

https://codereview.adblockplus.org/29680689/diff/29722910/lib/filterClasses.j...
lib/filterClasses.js:213: options[i] = option.substring(0,
cspMatch[0].length).replace(/ +/g, "") +
On 2018/03/15 08:23:17, Manish Jethani wrote:
> Well option.substring(0, cspMatch[0].length) is the same as cspMatch[0] so can
> we skip the unnecessary substring here? We already have the string we need.

Done.

https://codereview.adblockplus.org/29680689/diff/29722910/lib/filterClasses.j...
lib/filterClasses.js:214: option.substr(cspMatch[0].length).trim().replace(/
+/g, " ");
On 2018/03/15 08:23:17, Manish Jethani wrote:
> Any reason why this uses substr instead of substring? We need to pick a
favorite
> and stick with it unless necessary otherwise, and I think substring is the
best
> choice. substr wasn't even standardized until much later.

substr and substring are the same unless the second parameter is passed, since
the second parameter wasn't passed here I went with the shorter option, which
seems consistent with the rest of this file.

I'm more worried about this use of substr[1] since it passes the second
parameter as an index, it just happens to work since the offset is always 0. But
you already disagreed that I should change that...

[1] -
https://codereview.adblockplus.org/29680684/diff/29680685/lib/filterClasses.j...

Manish Jethani

https://codereview.adblockplus.org/29680689/diff/29720596/lib/filterClasses.js File lib/filterClasses.js (right): https://codereview.adblockplus.org/29680689/diff/29720596/lib/filterClasses.js#newcode190 lib/filterClasses.js:190: // For most regexp filters we strip all whitespace, ...

March 15, 2018, 10:42 a.m. (2018-03-15 10:42:42 UTC) #21

https://codereview.adblockplus.org/29680689/diff/29720596/lib/filterClasses.js
File lib/filterClasses.js (right):

https://codereview.adblockplus.org/29680689/diff/29720596/lib/filterClasses.j...
lib/filterClasses.js:190: // For most regexp filters we strip all whitespace,
but the values of $csp
On 2018/03/15 10:26:24, kzar wrote:
> On 2018/03/14 20:59:20, Sebastian Noack wrote:
> > On 2018/03/13 07:35:00, Manish Jethani wrote:
> > > On 2018/03/13 06:59:30, Manish Jethani wrote:
> > > >
> > > > If we want to avoid creating that array we could try with generators:
> > > >
> > > > [...]
> > > 
> > > OK, so I inlined that bit, now this seems to perform well:
> > > 
> > >   let index = text.indexOf("$");
> > >   if (index >= 0)
> > >   {
> > >     let options = text.substring(index + 1);
> > >     text = text.substring(0, index + 1).replace(/ +/g, "");
> > > 
> > >     for (let start = 0, end; end != options.length; start = end + 1)
> > >     {
> > >       end = options.indexOf(",", start);
> > >       if (end == -1)
> > >         end = options.length;
> > > 
> > >       let fragment = options.substring(start, end);
> > >       let csp = /^ *c *s *p *=/.exec(fragment);
> > >       if (csp)
> > >       {
> > >         text += csp[0].replace(/ +/g, "") +
> > >                 fragment.substring(csp[0].length).replace(/ +/g, "
> ").trim();
> > >       }
> > >       else
> > >       {
> > >         text += fragment.replace(/ +/g, "");
> > >       }
> > >     }
> > > 
> > >     return text;
> > >   }
> > 
> > Any reason, why this suggestion was ignored, without a comment?
> 
> I ignored that since it assumed the first '$' was the start of the options
> string, which I pointed out elsewhere is not necessarily the case. Sorry I
> thought that was obvious but I guess not.

I think Sebastian is referring to the rest of the code, i.e. the part where we
already have the correct index of the options part.

So basically starting with this:

  let options = text.substring(index + 1);
  text = text.substring(0, index + 1).replace(/ +/g, "");

  for (let start = 0, end; end != options.length; start = end + 1)
  ...

In my tests I found that this was much faster than using String.split, but
arguably it makes the code harder to read.

https://codereview.adblockplus.org/29680689/diff/29722910/lib/filterClasses.js
File lib/filterClasses.js (right):

https://codereview.adblockplus.org/29680689/diff/29722910/lib/filterClasses.j...
lib/filterClasses.js:204: optionsText =
optionsText.substr(optionsText.indexOf("$") + 1);
On 2018/03/15 10:26:24, kzar wrote:
> On 2018/03/15 08:09:59, Manish Jethani wrote:
> > ... however, since we already don't support commas in the
> > value of the $csp option, we should also not support "$" in the value.
> 
> I can't tell if you're trolling or not at this point. If we don't care about
> slightly changing how existing filters are parsed why didn't we go with the
> original suggestion of doing something like `text.trim().replace(/ +/, " ")`?

Sorry, then I must have misunderstood. Are "$" supported in option values right
now? If they are not already supported, why do we care about supporting them
now? We already know that the value of the $csp option doesn't need to have "$"
in it.

https://codereview.adblockplus.org/29680689/diff/29722910/lib/filterClasses.j...
lib/filterClasses.js:204: optionsText =
optionsText.substr(optionsText.indexOf("$") + 1);
On 2018/03/15 10:26:24, kzar wrote:
> On 2018/03/15 09:11:06, Manish Jethani wrote:
> > Actually I still don't get why we have to do this. We should run the
> > options regular expression on the unstripped text.
> 
> Then the options regexp might not match depending on the whitespace. Yes we
> could change the optionsRegexp to add an optional whitespace inbetween every
> character, but I don't want to do that since it's already huge and also it's
> used later in the common case for parsing regexp filters.

Sorry, yeah, I see what you mean. It wouldn't work.

Manish Jethani

https://codereview.adblockplus.org/29680689/diff/29723593/lib/filterClasses.js File lib/filterClasses.js (right): https://codereview.adblockplus.org/29680689/diff/29723593/lib/filterClasses.js#newcode202 lib/filterClasses.js:202: let optionsText = text; By the way, I was ...

March 15, 2018, 11:13 a.m. (2018-03-15 11:13:26 UTC) #22

Manish Jethani

https://codereview.adblockplus.org/29680689/diff/29722910/lib/filterClasses.js File lib/filterClasses.js (right): https://codereview.adblockplus.org/29680689/diff/29722910/lib/filterClasses.js#newcode204 lib/filterClasses.js:204: optionsText = optionsText.substr(optionsText.indexOf("$") + 1); On 2018/03/15 10:42:42, Manish ...

March 15, 2018, 11:25 a.m. (2018-03-15 11:25:48 UTC) #23

https://codereview.adblockplus.org/29680689/diff/29722910/lib/filterClasses.js
File lib/filterClasses.js (right):

https://codereview.adblockplus.org/29680689/diff/29722910/lib/filterClasses.j...
lib/filterClasses.js:204: optionsText =
optionsText.substr(optionsText.indexOf("$") + 1);
On 2018/03/15 10:42:42, Manish Jethani wrote:
> On 2018/03/15 10:26:24, kzar wrote:
> > On 2018/03/15 08:09:59, Manish Jethani wrote:
> > > ... however, since we already don't support commas in the
> > > value of the $csp option, we should also not support "$" in the value.
> > 
> > I can't tell if you're trolling or not at this point. If we don't care about
> > slightly changing how existing filters are parsed why didn't we go with the
> > original suggestion of doing something like `text.trim().replace(/ +/, "
")`?
> 
> Sorry, then I must have misunderstood. Are "$" supported in option values
right
> now? If they are not already supported, why do we care about supporting them
> now? We already know that the value of the $csp option doesn't need to have
"$"
> in it.

OK, I think I understand what you mean.

  normalize("foo$domain=bar.com$")

This currently works, and it will continue to work because there's no "csp=" in
there.

  normalize("foo$csp=script-src 'self',domain=bar.com$")

This currently "works" (the $csp option isn't normalized correctly), and it'll
break if we only take the last "$" to be the starting point of the options text.
Well, so it's a "breaking" change, I agree, but what does it actually break?
There are no actual filters that would break because of this. It's only a
breaking change in theory. We can "officially" not support "$" in option values
from now on, and nobody would be affected by it. So if we are to be practical
I'd say let's just take the index of the last "$". If in the future there's a
case where an option value needs to have an actual "$", then we can switch to
more complicated code.

Let me know what you think.

kzar

https://codereview.adblockplus.org/29680689/diff/29720596/lib/filterClasses.js File lib/filterClasses.js (right): https://codereview.adblockplus.org/29680689/diff/29720596/lib/filterClasses.js#newcode190 lib/filterClasses.js:190: // For most regexp filters we strip all whitespace, ...

March 15, 2018, 11:38 a.m. (2018-03-15 11:38:50 UTC) #24

https://codereview.adblockplus.org/29680689/diff/29720596/lib/filterClasses.js
File lib/filterClasses.js (right):

https://codereview.adblockplus.org/29680689/diff/29720596/lib/filterClasses.j...
lib/filterClasses.js:190: // For most regexp filters we strip all whitespace,
but the values of $csp
On 2018/03/15 10:42:41, Manish Jethani wrote:
> I think Sebastian is referring to the rest of the code, i.e. the part where we
> already have the correct index of the options part.
> ...

So instead of splitting the options string by "," instead using substring and
indexOf(",") to pick out each option?

> In my tests I found that this was much faster than using 
> String.split, but arguably it makes the code harder to read.

Well fair enough, but I feel like optimising this rarely used code path at the
cost of code readability isn't worth it.

I would see the utility of moving the logic out into a generator function
*parseOptions which yields each option part however, since that could also be
used in the parsing code which is worth optimising. However if I got it right
using a generator here would make performance even worse than simply splitting
the string.

https://codereview.adblockplus.org/29680689/diff/29722910/lib/filterClasses.js
File lib/filterClasses.js (right):

https://codereview.adblockplus.org/29680689/diff/29722910/lib/filterClasses.j...
lib/filterClasses.js:204: optionsText =
optionsText.substr(optionsText.indexOf("$") + 1);
On 2018/03/15 10:42:42, Manish Jethani wrote:
> Sorry, then I must have misunderstood. Are "$" supported in option values
right
> now? If they are not already supported, why do we care about supporting them
> now? We already know that the value of the $csp option doesn't need to have
"$"
> in it.

Well the part of the filter following a "$" is considered to be the option
string if it matches the options regexp, otherwise it's not and that part is
considered to be a part of the filter's regexp. Take this example:

  somead$domain=valid.com|$invalid.com

So far it blocks requests matching /somead/ for the domain valid.com. With your
suggestion it would instead block requests matching
/somead\$domain=valid\.com\|\$invalid\.com/ for all domains.

FWIW I agree that this probably doesn't matter much in practice, but I have had
questions about these types of edge cases from users (vai Mapx) before. Take
this filter

  /$file/analytics.

IIRC they were confused why it worked, but one like /$file didn't. Well that's
since "$file" is considered a valid options string and not used for the regexp
where as $file/analytics" isn't.

kzar

https://codereview.adblockplus.org/29680689/diff/29722910/lib/filterClasses.js File lib/filterClasses.js (right): https://codereview.adblockplus.org/29680689/diff/29722910/lib/filterClasses.js#newcode204 lib/filterClasses.js:204: optionsText = optionsText.substr(optionsText.indexOf("$") + 1); On 2018/03/15 11:25:47, Manish ...

March 15, 2018, 11:43 a.m. (2018-03-15 11:43:53 UTC) #25

Manish Jethani

https://codereview.adblockplus.org/29680689/diff/29720596/lib/filterClasses.js File lib/filterClasses.js (right): https://codereview.adblockplus.org/29680689/diff/29720596/lib/filterClasses.js#newcode190 lib/filterClasses.js:190: // For most regexp filters we strip all whitespace, ...

March 15, 2018, noon (2018-03-15 12:00:12 UTC) #26

https://codereview.adblockplus.org/29680689/diff/29720596/lib/filterClasses.js
File lib/filterClasses.js (right):

https://codereview.adblockplus.org/29680689/diff/29720596/lib/filterClasses.j...
lib/filterClasses.js:190: // For most regexp filters we strip all whitespace,
but the values of $csp
On 2018/03/15 11:38:50, kzar wrote:
> On 2018/03/15 10:42:41, Manish Jethani wrote:
> > I think Sebastian is referring to the rest of the code, i.e. the part where
we
> > already have the correct index of the options part.
> > ...
> 
> So instead of splitting the options string by "," instead using substring and
> indexOf(",") to pick out each option?

Yes, it just turned out to be faster.

> > In my tests I found that this was much faster than using 
> > String.split, but arguably it makes the code harder to read.
> 
> Well fair enough, but I feel like optimising this rarely used code path at the
> cost of code readability isn't worth it.

I'm cool with using String.split here in the interested of
readability/maintainability.

> I would see the utility of moving the logic out into a generator function
> *parseOptions which yields each option part however, since that could also be
> used in the parsing code which is worth optimising. However if I got it right
> using a generator here would make performance even worse than simply splitting
> the string.

Yeah, I tested with the generator, it performs worse. String.split is faster.

https://codereview.adblockplus.org/29680689/diff/29722910/lib/filterClasses.js
File lib/filterClasses.js (right):

https://codereview.adblockplus.org/29680689/diff/29722910/lib/filterClasses.j...
lib/filterClasses.js:204: optionsText =
optionsText.substr(optionsText.indexOf("$") + 1);
On 2018/03/15 11:43:52, kzar wrote:
> On 2018/03/15 11:25:47, Manish Jethani wrote:
> > We can "officially" not support "$" in option values
> > from now on... So if we are to be practical
> > I'd say let's just take the index of the last "$".
> 
> Sure or we could "officially" not support filters with random spaces inside
just
> the same. So we just do `.trim().replace(/ /+, " ")`, if the user gets lucky

But the two are not comparable, are they? If we don't support spaces, we break
filters that used to work; if we don't support "$" in a filter that _also_
contains "csp=" we break nothing.

A filter can contain spaces because the person who wrote it thought they could
include a space after the "$" or after each ",", I mean how do they know better?
That's not the case with "$", there's no reason to include a "$" in an option
value (can you think of any?). You would have to very deliberately add a "$" for
who-knows-why. (Note: a domain name can't contain "$")

> their filter still works, but if the space was in an option value name (or
> similar) things break. I think it's too late to talk about being practical in
> this codereview to be honest.

I'm not aware of the history of this normalize function, but I wouldn't expect
option names or value to contain spaces in practice. I suspect the code strips
out _all_ spaces instead of just spaces between option names and value is
because it's easier and faster to do text.replace(/ /g, "") than to split the
string up and remove spaces around each name and each value. If that's the case,
by all means let's just not care about spaces within option names and non-CSP
option values.

It's totally legit though to write a filter like "foo $ domain = bar.com " and
expect it to work.

Anyway, so if you feel strongly about this then I don't mind reverting to the
original text.replace(/ +/g, " ") though you'd be in a better position to say if
it would break anything (literally we'd be changing the output of the function
for potentially lots of actual filters out there).

> > ... and nobody would be affected by it.
> 
> Well that's speculation, hopefully few users would be anyway. But that also
> applies to my above point.

Manish Jethani

https://codereview.adblockplus.org/29680689/diff/29722910/lib/filterClasses.js File lib/filterClasses.js (right): https://codereview.adblockplus.org/29680689/diff/29722910/lib/filterClasses.js#newcode204 lib/filterClasses.js:204: optionsText = optionsText.substr(optionsText.indexOf("$") + 1); On 2018/03/15 12:00:11, Manish ...

March 15, 2018, 12:14 p.m. (2018-03-15 12:14:21 UTC) #27

kzar

https://codereview.adblockplus.org/29680689/diff/29722910/lib/filterClasses.js File lib/filterClasses.js (right): https://codereview.adblockplus.org/29680689/diff/29722910/lib/filterClasses.js#newcode204 lib/filterClasses.js:204: optionsText = optionsText.substr(optionsText.indexOf("$") + 1); On 2018/03/15 12:14:21, Manish ...

March 15, 2018, 1:13 p.m. (2018-03-15 13:13:34 UTC) #28

Manish Jethani

https://codereview.adblockplus.org/29680689/diff/29723593/lib/filterClasses.js File lib/filterClasses.js (right): https://codereview.adblockplus.org/29680689/diff/29723593/lib/filterClasses.js#newcode829 lib/filterClasses.js:829: if (/(;|^)\s*report-(to|uri)\b/.test(csp)) Since we allow single spaces in the ...

March 15, 2018, 1:24 p.m. (2018-03-15 13:24:45 UTC) #29

https://codereview.adblockplus.org/29680689/diff/29723593/lib/filterClasses.js
File lib/filterClasses.js (right):

https://codereview.adblockplus.org/29680689/diff/29723593/lib/filterClasses.j...
lib/filterClasses.js:829: if (/(;|^)\s*report-(to|uri)\b/.test(csp))
Since we allow single spaces in the CSP, we should strip them here before
testing the regular expression:

  if (/(;|^)report-(to|uri)\b/.test(csp.replace(/ /g, "")))

Reason: even though the browsers that we know today might ignore something like
"r e p o r t-uri=http://example.com", we should cover the case where either by
design or by accident the browser starts accepting spaces here (browsers do tend
to be liberal in the kinds of inputs they'll accept and try to interpret
correctly). It would be a shame if because of a browser change somebody was able
to violate our users' privacy.

From the security and privacy point of view I have the following
recommendations:

 1.  It seems there's an outdated "referrer" directive [1]. This is bad for
privacy if exploited, similar to report-uri/report-to, in particular if it is
set to "unsafe-url". We should strip it out as well.

 2.  Since we don't know which directives in the future that would be introduced
in browser would be bad for privacy and security, this should be a whitelist
rather than a blacklist. We should allow the following directives only:
connect-src, default-src, font-src, frame-src, img-src, media-src, object-src,
script-src, style-src, base-uri, plugin-types, sandbox, form-action,
frame-ancestors, navigation-to, block-all-mixed-content, require-sri-for, and
upgrade-insecure-requests (note: no manifest-src, no worker-src).

 3.  'unsafe-eval' should be stripped out from the policy. I can't think of a
single reason why a filter author should be able to add 'unsafe-eval' to the
policy.

Further, none of the directives that are actually injected into the headers
should make the policy more liberal than what it would be otherwise. i.e. if the
response says "Content-Security-Policy: sandbox;" and we are about to inject
"Content-Security-Policy: sandbox allow-scripts;", we should not inject it. This
should be taken care of on the adblockpluschrome side.

[1]:
https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Content-Security-Po...

kzar

https://codereview.adblockplus.org/29680689/diff/29723593/lib/filterClasses.js File lib/filterClasses.js (right): https://codereview.adblockplus.org/29680689/diff/29723593/lib/filterClasses.js#newcode829 lib/filterClasses.js:829: if (/(;|^)\s*report-(to|uri)\b/.test(csp)) On 2018/03/15 13:24:44, Manish Jethani wrote: > ...

March 15, 2018, 1:44 p.m. (2018-03-15 13:44:46 UTC) #30

Manish Jethani

https://codereview.adblockplus.org/29680689/diff/29722910/lib/filterClasses.js File lib/filterClasses.js (right): https://codereview.adblockplus.org/29680689/diff/29722910/lib/filterClasses.js#newcode204 lib/filterClasses.js:204: optionsText = optionsText.substr(optionsText.indexOf("$") + 1); On 2018/03/15 13:13:33, kzar ...

March 15, 2018, 1:47 p.m. (2018-03-15 13:47:34 UTC) #31

https://codereview.adblockplus.org/29680689/diff/29722910/lib/filterClasses.js
File lib/filterClasses.js (right):

https://codereview.adblockplus.org/29680689/diff/29722910/lib/filterClasses.j...
lib/filterClasses.js:204: optionsText =
optionsText.substr(optionsText.indexOf("$") + 1);
On 2018/03/15 13:13:33, kzar wrote:
> On 2018/03/15 12:14:21, Manish Jethani wrote:
> > To sum this up, the normalize function looks good to me as it is, though we
> > could optimize further, but that's your call.
> > 
> > I'm looking at the other parts of this patch now.
> 
> Thanks OK. (If you can see how to optimise the common code path further I'm
> interested, but not for the csp code path unless it doesn't hurt readability
> further.)

Acknowledged.

https://codereview.adblockplus.org/29680689/diff/29723593/lib/filterClasses.js
File lib/filterClasses.js (right):

https://codereview.adblockplus.org/29680689/diff/29723593/lib/filterClasses.j...
lib/filterClasses.js:829: if (/(;|^)\s*report-(to|uri)\b/.test(csp))
On 2018/03/15 13:44:45, kzar wrote:
> On 2018/03/15 13:24:44, Manish Jethani wrote:
> > Further, none of the directives that are actually injected into the headers
> > should make the policy more liberal than what it would be otherwise. i.e. if
> the
> > response says "Content-Security-Policy: sandbox;" and we are about to inject
> > "Content-Security-Policy: sandbox allow-scripts;", we should not inject it.
> This
> > should be taken care of on the adblockpluschrome side.
> 
> I don't think that's a problem, Sebastian noticed previously[1] that CSPs are
> combined exclusively[2] and so if one policy blocks something that trumps the
> other that allows the thing.

Thanks, I was just wondering about this. In that case we're OK.

By the way I wonder if whitelisting the directives (1) should be part of the
normalization step, or (2) should be here, or (3) should be one level up in the
stack at the time of header injection (to allow more flexibility).

Manish Jethani

https://codereview.adblockplus.org/29680689/diff/29723593/lib/filterClasses.js File lib/filterClasses.js (right): https://codereview.adblockplus.org/29680689/diff/29723593/lib/filterClasses.js#newcode829 lib/filterClasses.js:829: if (/(;|^)\s*report-(to|uri)\b/.test(csp)) On 2018/03/15 13:47:33, Manish Jethani wrote: > ...

March 15, 2018, 1:50 p.m. (2018-03-15 13:50:48 UTC) #32

kzar

https://codereview.adblockplus.org/29680689/diff/29723593/lib/filterClasses.js File lib/filterClasses.js (right): https://codereview.adblockplus.org/29680689/diff/29723593/lib/filterClasses.js#newcode829 lib/filterClasses.js:829: if (/(;|^)\s*report-(to|uri)\b/.test(csp)) On 2018/03/15 13:47:33, Manish Jethani wrote: > ...

March 15, 2018, 2:13 p.m. (2018-03-15 14:13:38 UTC) #33

https://codereview.adblockplus.org/29680689/diff/29723593/lib/filterClasses.js
File lib/filterClasses.js (right):

https://codereview.adblockplus.org/29680689/diff/29723593/lib/filterClasses.j...
lib/filterClasses.js:829: if (/(;|^)\s*report-(to|uri)\b/.test(csp))
On 2018/03/15 13:47:33, Manish Jethani wrote:
> By the way I wonder if whitelisting the directives (1) should be part of the
> normalization step, or (2) should be here, or (3) should be one level up in
the
> stack at the time of header injection (to allow more flexibility).

It should be here at parse time IMO.

https://codereview.adblockplus.org/29680689/diff/29723593/lib/filterClasses.j...
lib/filterClasses.js:829: if (/(;|^)\s*report-(to|uri)\b/.test(csp))
On 2018/03/15 13:24:44, Manish Jethani wrote:
>  1.  It seems there's an outdated "referrer" directive [1]. This is bad for
> privacy if exploited, similar to report-uri/report-to, in particular if it is
> set to "unsafe-url". We should strip it out as well.

Well spotted, there's a non-outdated version called Referrer-Policy too[1]. I
think we should only allow "no-referrer" for both, since while omitting the
referrer information might be useful for adblocking anything else could be a
privacy concern.

> Since we allow single spaces in the CSP, we should strip them here before
> testing the regular expression:
> 
>   if (/(;|^)report-(to|uri)\b/.test(csp.replace(/ /g, "")))
> 
> Reason: even though the browsers that we know today might ignore something
like
> "r e p o r t-uri=http://example.com", we should cover the case where either by
> design or by accident the browser starts accepting spaces here (browsers do
tend
> to be liberal in the kinds of inputs they'll accept and try to interpret
> correctly). It would be a shame if because of a browser change somebody was
able
> to violate our users' privacy.
> 
>  2.  Since we don't know which directives in the future that would be
introduced
> in browser would be bad for privacy and security, this should be a whitelist
> rather than a blacklist. We should allow the following directives only:
> connect-src, default-src, font-src, frame-src, img-src, media-src, object-src,
> script-src, style-src, base-uri, plugin-types, sandbox, form-action,
> frame-ancestors, navigation-to, block-all-mixed-content, require-sri-for, and
> upgrade-insecure-requests

I disagree about stripping spaces in the off-chance that a browser introduces a
bug in the future. As for the directive whitelist I see your point, but I can
see arguments against. Do you have an opinion here Sebastian?

>  (note: no manifest-src, no worker-src).

I'm curious, why did you omit those? worker-src in particular could be useful
for preventing circumvention.

>  3.  'unsafe-eval' should be stripped out from the policy. I can't think of a
> single reason why a filter author should be able to add 'unsafe-eval' to the
> policy.

True there's not much point, but also there's not much harm. If our policy can't
allow anything that's already forbidden then who cares right?

[1] - https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Referrer-Policy

https://codereview.adblockplus.org/29680689/diff/29723593/lib/filterClasses.j...
lib/filterClasses.js:829: if (/(;|^)\s*report-(to|uri)\b/.test(csp))
On 2018/03/15 13:50:48, Manish Jethani wrote:
> On 2018/03/15 13:47:33, Manish Jethani wrote:
> > By the way I wonder if whitelisting the directives (1) should be part of the
> > normalization step, or (2) should be here, or (3) should be one level up in
> the
> > stack at the time of header injection (to allow more flexibility).
> 
> Also, in the interest of landing this (so we can start playing with it), we
> could do the whitelisting/blacklisting part as a separate patch. Then we can
> take our time to decide where, what, and how. We'll just have to be careful
> about not accidentally shipping it as it is.

I'd prefer not to land this without the security / privacy concerns addressed.
If you want to play with it all I try to keep my feature branches up to date in
GitHub[1][2][3].

[1] - https://github.com/kzar/adblockpluschrome/tree/5241-csp-option
[2] - https://github.com/kzar/adblockpluscore/tree/6329-csp-option
[3] - https://github.com/kzar/adblockplusui/tree/6451-csp-filters

Manish Jethani

https://codereview.adblockplus.org/29680689/diff/29723593/lib/filterClasses.js File lib/filterClasses.js (right): https://codereview.adblockplus.org/29680689/diff/29723593/lib/filterClasses.js#newcode829 lib/filterClasses.js:829: if (/(;|^)\s*report-(to|uri)\b/.test(csp)) On 2018/03/15 14:13:38, kzar wrote: > On ...

March 15, 2018, 2:28 p.m. (2018-03-15 14:28:32 UTC) #34

https://codereview.adblockplus.org/29680689/diff/29723593/lib/filterClasses.js
File lib/filterClasses.js (right):

https://codereview.adblockplus.org/29680689/diff/29723593/lib/filterClasses.j...
lib/filterClasses.js:829: if (/(;|^)\s*report-(to|uri)\b/.test(csp))
On 2018/03/15 14:13:38, kzar wrote:
> On 2018/03/15 13:24:44, Manish Jethani wrote:
>
> > [...]
> >  (note: no manifest-src, no worker-src).
> 
> I'm curious, why did you omit those? worker-src in particular could be useful
> for preventing circumvention.

My only concern was to not allow filter authors to make a document's CSP more
permissive than it already is. Since that's apparently not possible, I'm OK with
these.

> >  3.  'unsafe-eval' should be stripped out from the policy. I can't think of
a
> > single reason why a filter author should be able to add 'unsafe-eval' to the
> > policy.
> 
> True there's not much point, but also there's not much harm. If our policy
can't
> allow anything that's already forbidden then who cares right?

Same as above. It's not an issue, we can allow it.

https://codereview.adblockplus.org/29680689/diff/29723593/lib/filterClasses.j...
lib/filterClasses.js:829: if (/(;|^)\s*report-(to|uri)\b/.test(csp))
On 2018/03/15 14:13:38, kzar wrote:
> On 2018/03/15 13:24:44, Manish Jethani wrote:
> >  1.  It seems there's an outdated "referrer" directive [1]. This is bad for
> > privacy if exploited, similar to report-uri/report-to, in particular if it
is
> > set to "unsafe-url". We should strip it out as well.
> 
> Well spotted, there's a non-outdated version called Referrer-Policy too[1]. I
> think we should only allow "no-referrer" for both, since while omitting the
> referrer information might be useful for adblocking anything else could be a
> privacy concern.

Referrer-Policy doesn't really concern us since we're not setting that header,
right? If I understand correctly, the $csp filter option is only supposed to
affect the Content-Security-Policy header. if the site sets its own
Referrer-Policy header then so be it, it's not on us.

kzar

On 2018/03/15 14:28:32, Manish Jethani wrote: > Referrer-Policy doesn't really concern us since we're not ...

March 15, 2018, 3:55 p.m. (2018-03-15 15:55:23 UTC) #35

kzar

Patch Set 10 : Prevent the referrer directive https://codereview.adblockplus.org/29680689/diff/29723593/lib/filterClasses.js File lib/filterClasses.js (right): https://codereview.adblockplus.org/29680689/diff/29723593/lib/filterClasses.js#newcode828 lib/filterClasses.js:828: // ...

March 15, 2018, 4:09 p.m. (2018-03-15 16:09:47 UTC) #36

Manish Jethani

Looks good so far provided we decide to blacklist the three directives rather than whitelist ...

March 15, 2018, 4:55 p.m. (2018-03-15 16:55:34 UTC) #37

Sebastian Noack

https://codereview.adblockplus.org/29680689/diff/29720596/lib/filterClasses.js File lib/filterClasses.js (right): https://codereview.adblockplus.org/29680689/diff/29720596/lib/filterClasses.js#newcode190 lib/filterClasses.js:190: // For most regexp filters we strip all whitespace, ...

March 15, 2018, 5:25 p.m. (2018-03-15 17:25:41 UTC) #38

Sebastian Noack

https://codereview.adblockplus.org/29680689/diff/29723593/lib/filterClasses.js File lib/filterClasses.js (right): https://codereview.adblockplus.org/29680689/diff/29723593/lib/filterClasses.js#newcode829 lib/filterClasses.js:829: if (/(;|^)\s*report-(to|uri)\b/.test(csp)) On 2018/03/15 14:13:38, kzar wrote: > I ...

March 15, 2018, 7:03 p.m. (2018-03-15 19:03:05 UTC) #39

https://codereview.adblockplus.org/29680689/diff/29723593/lib/filterClasses.js
File lib/filterClasses.js (right):

https://codereview.adblockplus.org/29680689/diff/29723593/lib/filterClasses.j...
lib/filterClasses.js:829: if (/(;|^)\s*report-(to|uri)\b/.test(csp))
On 2018/03/15 14:13:38, kzar wrote:
> I disagree about stripping spaces in the off-chance that a browser introduces
a
> bug in the future.

Agreed. In the end, there is nothing we can effectively do to account for any
possibly future change in browser's behavior. And this particular change seems
especially obscure/unlikely. On the other hand, as more complexity we add, as
more likely we introduce bugs effecting current browser behavior.

> As for the directive whitelist I see your point, but I can
> see arguments against. Do you have an opinion here Sebastian?

The list of directives (relevant to ad blocking) will likely keep growing. So
maintaining a whitelist will be a permanent effort. Also it will keep filter
list authors from using the newest directives, as they have to wait for us to
release a new version with the directive added to the whitelist, and for users
to update their ad blocker.

Directives that could leak information are quite a special case and (unless
they change the semantics of report-uri/report-to again, which seems unlikely),
I don't see a case to add yet another one in the future. Though its still
possible. In the same way, it is possible that CSS selectors could leak
information in the future, yet we don't consider whitelisting safe CSS syntax.

Also it isn't that easy for an attacker to get a malicious filter to be used on
a target machine. The filter lists used by default, come from trusted sources
and we are in the control of the mirrors. So an attacker would have to host
their own filter list and then trick the user in subscribing to this filter
list. In the end, it might be more effective, while not much more difficult, to
trick the user into installing malware instead. I don't think that this
invalidates all security concerns about filter lists, but the attack vector
seems smaller than when dealing with input from most other external sources.

So I think a blacklist is appropriate and sufficient.

> I'm curious, why did you omit those? worker-src in particular could be useful
> for preventing circumvention.

Absolutely, we need worker-src. Abuse of SharedWorker/ServiceWorker for
circumvention (since they cannot be associated with the tab/frame they are
related to), was the main motivation for the $csp filter in the first place.

On 2018/03/15 13:24:44, Manish Jethani wrote:
>  3.  'unsafe-eval' should be stripped out from the policy. I can't think of a
> single reason why a filter author should be able to add 'unsafe-eval' to the
> policy.

There is a case for 'unsafe-eval'. If a website relies on dynamic code
evaluation, but you want to block other script sources that cannot be blocked by
other means, e.g. inline scripts, you would inject a policy like "script-src
http: 'unsafe-eval'". On the other hand, as already concluded, there is no harm
that can be caused with 'unsafe-eval' since the policy cannot make the document
more permissive, anyway.

On 2018/03/15 13:47:33, Manish Jethani wrote:
> By the way I wonder if whitelisting the directives (1) should be part of the
> normalization step, or (2) should be here, or (3) should be one level up in
the
> stack at the time of header injection (to allow more flexibility).

I think validation is preferable over normalization here. If we'd just strip
those directives silently, it will be more difficult to figure out why they
don't work, or worse filter list authors won't notice and add them to the filter
list anyway causing potential security issues with other ad blockers that are
less careful. If we mark the filter as invalid, however, this will trigger a
visible error message when adding it as a custom filter (which filter list
authors generally do first to test their filters).

Manish Jethani

https://codereview.adblockplus.org/29680689/diff/29723593/lib/filterClasses.js File lib/filterClasses.js (right): https://codereview.adblockplus.org/29680689/diff/29723593/lib/filterClasses.js#newcode202 lib/filterClasses.js:202: let optionsText = text; On 2018/03/15 11:13:25, Manish Jethani ...

March 16, 2018, 6:47 a.m. (2018-03-16 06:47:10 UTC) #40

Manish Jethani

The tests look good to me. https://codereview.adblockplus.org/29680689/diff/29723635/test/filterClasses.js File test/filterClasses.js (right): https://codereview.adblockplus.org/29680689/diff/29723635/test/filterClasses.js#newcode317 test/filterClasses.js:317: compareFilter(test, "bla$csp=report-uri", ["type=invalid", ...

March 16, 2018, 8:54 a.m. (2018-03-16 08:54:41 UTC) #41

kzar

Patch Set 11 : Added test case for case insensitive forbidden directives Patch Set 12 ...

March 16, 2018, 11:29 a.m. (2018-03-16 11:29:06 UTC) #42

Manish Jethani

https://codereview.adblockplus.org/29680689/diff/29723593/lib/filterClasses.js File lib/filterClasses.js (right): https://codereview.adblockplus.org/29680689/diff/29723593/lib/filterClasses.js#newcode202 lib/filterClasses.js:202: let optionsText = text; On 2018/03/16 11:29:05, kzar wrote: ...

March 16, 2018, 12:05 p.m. (2018-03-16 12:05:06 UTC) #43

https://codereview.adblockplus.org/29680689/diff/29723593/lib/filterClasses.js
File lib/filterClasses.js (right):

https://codereview.adblockplus.org/29680689/diff/29723593/lib/filterClasses.j...
lib/filterClasses.js:202: let optionsText = text;
On 2018/03/16 11:29:05, kzar wrote:
> On 2018/03/16 06:47:09, Manish Jethani wrote:
> 
> > Out of curiosity, what do you think about this? ...
> 
> Well I think that what we have already is plenty good enough, this is a rarely
> used code-path. While this kind of thing is interesting, the effort would be
> better spent elsewhere (e.g. removing the split when we parse the options
string

But the effort has already been spent, the code has already been written, right?
I mean there's nothing to do here but change a few lines of code.

It's a rarely used code path now, what if there's a new option like $csp that's
actually used very often? Then somebody may remove that /\bcsp=/i check in the
future, but they won't think about optimizing the part that follows because they
would assume it's already optimal (because why would the ABP team write
suboptimal code?).

> later). I mean we're expecting a user to have maybe 5 or 10 of these filters,
so
> with your numbers and being generous and saying 10 csp filters we're talking
1.6
> microseconds vs 1.2 microseconds here, or in other words a saving of 400
> nanoseconds.

This assumption about only 10 $csp filters may not hold in the future.

Honestly, I'm not seeing the downside of choosing code that's both more
efficient and easier to understand, but I'll leave it here.

> Since we're already playing code-golf I thought I'd test it[1], seems my
numbers
> match yours. Your approach is nearly a quarter faster and the difference seems
> to be down to the `split().length` operation rather than the `substr` calls.
> 
> [1] - https://jsperf.com/findoptionstext

The current version is 60% slower here on Chrome 65 and 76% slower on Firefox 58
(both running on macOS Sierra).

Manish Jethani

https://codereview.adblockplus.org/29680689/diff/29723593/lib/filterClasses.js File lib/filterClasses.js (right): https://codereview.adblockplus.org/29680689/diff/29723593/lib/filterClasses.js#newcode829 lib/filterClasses.js:829: if (/(;|^)\s*report-(to|uri)\b/.test(csp)) On 2018/03/15 19:03:05, Sebastian Noack wrote: > ...

March 16, 2018, 12:25 p.m. (2018-03-16 12:25:06 UTC) #44

https://codereview.adblockplus.org/29680689/diff/29723593/lib/filterClasses.js
File lib/filterClasses.js (right):

https://codereview.adblockplus.org/29680689/diff/29723593/lib/filterClasses.j...
lib/filterClasses.js:829: if (/(;|^)\s*report-(to|uri)\b/.test(csp))
On 2018/03/15 19:03:05, Sebastian Noack wrote:

> In the end, there is nothing we can effectively do to account for any
> possibly future change in browser's behavior. And this particular change seems
> especially obscure/unlikely. On the other hand, as more complexity we add, as
> more likely we introduce bugs effecting current browser behavior.

Ack.

> The list of directives (relevant to ad blocking) will likely keep growing. So
> maintaining a whitelist will be a permanent effort. Also it will keep filter
> list authors from using the newest directives, as they have to wait for us to
> release a new version with the directive added to the whitelist, and for users
> to update their ad blocker.

Maintaining a blacklist is also a permanent effort, isn't it? Unless we don't
care as much about security/privacy, which we do (?).

So we have two options:

 1.  Maintain a whitelist, let filter authors report missing directives ("It's
not working for me!"), add the directives as and when, and do a new release
after sufficient testing

 2.  Maintain a blacklist, let somebody report a security/privacy issue
(hopefully found on a beta version of the browser, but possibly later when the
browser is already stable and widely deployed and somebody has found a way to
exploit ABP and made the news), add the directive to the blacklist, do an
emergency patch release

CSP not working for filter authors is far more acceptable to me than a
vulnerability that's already being exploited in the wild (possibly without
anybody's knowledge) because of ABP.

> Directives that could leak information are quite a special case and (unless
> they change the semantics of report-uri/report-to again, which seems
unlikely),
> I don't see a case to add yet another one in the future. Though its still
> possible. In the same way, it is possible that CSS selectors could leak
> information in the future, yet we don't consider whitelisting safe CSS syntax.

The comparison with CSS is not fair, an HTTP header is far more powerful,
especially one that literally has "security" in its name
(Content-Security-Policy).

> Also it isn't that easy for an attacker to get a malicious filter to be used
on
> a target machine. The filter lists used by default, come from trusted sources
> and we are in the control of the mirrors. So an attacker would have to host
> their own filter list and then trick the user in subscribing to this filter
> list. In the end, it might be more effective, while not much more difficult,
to
> trick the user into installing malware instead. I don't think that this
> invalidates all security concerns about filter lists, but the attack vector
> seems smaller than when dealing with input from most other external sources.

A filter doesn't have to be malicious for there to be a security issue, it can
be (and usually is) a genuine oversight.

But OK, it really depends on the tradeoffs, I see your point here.

> There is a case for 'unsafe-eval'. If a website relies on dynamic code
> evaluation, but you want to block other script sources that cannot be blocked
by
> other means, e.g. inline scripts, you would inject a policy like "script-src
> http: 'unsafe-eval'". On the other hand, as already concluded, there is no
harm
> that can be caused with 'unsafe-eval' since the policy cannot make the
document
> more permissive, anyway.

Ack.

> I think validation is preferable over normalization here. If we'd just strip
> those directives silently, it will be more difficult to figure out why they
> don't work, or worse filter list authors won't notice and add them to the
filter
> list anyway causing potential security issues with other ad blockers that are
> less careful. If we mark the filter as invalid, however, this will trigger a
> visible error message when adding it as a custom filter (which filter list
> authors generally do first to test their filters).

Ack.

Manish Jethani

On 2018/03/16 12:53:15, Manish Jethani wrote: > LGTM Sorry, I found another one to blacklist: ...

March 16, 2018, 3:14 p.m. (2018-03-16 15:14:10 UTC) #46

Manish Jethani

On 2018/03/16 15:14:10, Manish Jethani wrote: > On 2018/03/16 12:53:15, Manish Jethani wrote: > > ...

March 16, 2018, 3:28 p.m. (2018-03-16 15:28:04 UTC) #47

sergei

https://codereview.adblockplus.org/29680689/diff/29724561/lib/filterClasses.js File lib/filterClasses.js (right): https://codereview.adblockplus.org/29680689/diff/29724561/lib/filterClasses.js#newcode176 lib/filterClasses.js:176: text = text.replace(/[^\S ]+/g, ""); This change (including the ...

March 16, 2018, 4:49 p.m. (2018-03-16 16:49:26 UTC) #48

https://codereview.adblockplus.org/29680689/diff/29724561/lib/filterClasses.js
File lib/filterClasses.js (right):

https://codereview.adblockplus.org/29680689/diff/29724561/lib/filterClasses.j...
lib/filterClasses.js:176: text = text.replace(/[^\S ]+/g, "");
This change (including the same changes below) seems unrelated and since it
seems to be a good optimization, if it's not difficult I would recommend to make
it in separate commit and it can be landed pretty quickly.

https://codereview.adblockplus.org/29680689/diff/29724561/lib/filterClasses.j...
lib/filterClasses.js:179: if (/^ *!/.test(text))
This change seems unrelated too and to tell the truth it's not clear that it
should be done. It can be a good optimization but IMO it should be confirmed
that there are only spaces (0x20).

https://codereview.adblockplus.org/29680689/diff/29724561/lib/filterClasses.j...
lib/filterClasses.js:196: let optionsMatch =
Filter.optionsRegExp.exec(strippedText);
Since parsing time is very important could you please estimate how many times it
reaches this line and goes below for most popular subscriptions? Perhaps it can
be done as a part of hub#1020.

https://codereview.adblockplus.org/29680689/diff/29724561/lib/filterClasses.j...
lib/filterClasses.js:196: let optionsMatch =
Filter.optionsRegExp.exec(strippedText);
I would like to carefully double check what is happening here and few lines
below, but have to run.

https://codereview.adblockplus.org/29680689/diff/29724561/lib/filterClasses.j...
lib/filterClasses.js:804: domains = value.toUpperCase();
Is this change related to the issue? When I was profiling I discovered that we
are duplicating a lot of strings by changing their case at another stage, so
this change can be just a good optimization. The changing the case here can
result that instead of creating several string objects containing the same data
we can create a string object with the data once and other string objects will
be just a references to that original one.

If it's not related could you please make it in a separate commit?

https://codereview.adblockplus.org/29680689/diff/29724561/test/filterClasses.js
File test/filterClasses.js (right):

https://codereview.adblockplus.org/29680689/diff/29724561/test/filterClasses....
test/filterClasses.js:422: "ads");
Although perhaps it's correct, currently it's not supposed that a filter text is
spread over several lines, though the can be some white spacing symbols like
'\n' either before the actual filter text or after it. If taking it into account
will result in more efficient code then I would recommend to do it because not
fast enough parsing of filters is a pretty serious issue currently on mobile
platforms.

kzar

Patch Set 13 : Added more tests, blacklisted directives and added a tiny optimisation > ...

March 17, 2018, 2:35 p.m. (2018-03-17 14:35:08 UTC) #49

Patch Set 13 : Added more tests, blacklisted directives and added a tiny
optimisation

> Sorry, I found another one to blacklist: base-uri.
> Another one: upgrade-insecure-requests.

Done.

https://codereview.adblockplus.org/29680689/diff/29723593/lib/filterClasses.js
File lib/filterClasses.js (right):

https://codereview.adblockplus.org/29680689/diff/29723593/lib/filterClasses.j...
lib/filterClasses.js:202: let optionsText = text;
On 2018/03/16 12:05:06, Manish Jethani wrote:
> It's a rarely used code path now, what if there's a new option like $csp
that's
> actually used very often?

Then we have bigger problems, e.g. we ran the options regexp on the string twice
for those filters.

> This assumption about only 10 $csp filters may not hold in the future.

Wasn't an assumption, I looked at the uBlock filter list which is where this
filter option is already being used. OK I got it wrong and it's 24 not 10... but
sure perhaps that number will change in the future.

> Honestly, I'm not seeing the downside of choosing code that's both more
> efficient and easier to understand, but I'll leave it here.

You said yourself that it's subjective which is easier to understand and FWIW I
don't agree with you here. Either way the downside is that time spent talking
endlessly about something that *does not matter* is wasted, there's an
opportunity cost associated - we could have spent that time talking about how to
optimise a code path that does matter instead.

> The current version is 60% slower here on Chrome 65 and 76% slower on Firefox
58
> (both running on macOS Sierra).

I can't reproduce that, I see it as almost 25% slower.

https://codereview.adblockplus.org/29680689/diff/29724561/lib/filterClasses.js
File lib/filterClasses.js (right):

https://codereview.adblockplus.org/29680689/diff/29724561/lib/filterClasses.j...
lib/filterClasses.js:176: text = text.replace(/[^\S ]+/g, "");
On 2018/03/16 16:49:25, sergei wrote:
> This change (including the same changes below) seems unrelated and since it
> seems to be a good optimization, if it's not difficult I would recommend to
make
> it in separate commit and it can be landed pretty quickly.

I would rather just land this all together (same below).

https://codereview.adblockplus.org/29680689/diff/29724561/lib/filterClasses.j...
lib/filterClasses.js:179: if (/^ *!/.test(text))
On 2018/03/16 16:49:24, sergei wrote:
> ... it's not clear that it should be done. It can be a good 
> optimization but IMO it should be confirmed that there are only spaces 
> (0x20).

It is the case since we've already replaced other whitespace above.

https://codereview.adblockplus.org/29680689/diff/29724561/lib/filterClasses.j...
lib/filterClasses.js:196: let optionsMatch =
Filter.optionsRegExp.exec(strippedText);
On 2018/03/16 16:49:25, sergei wrote:
> Since parsing time is very important could you please estimate how many times
it
> reaches this line and goes below for most popular subscriptions? Perhaps it
can
> be done as a part of hub#1020.

None in EasyList, 24 in the uBlock filter list[1].

[1] -
https://raw.githubusercontent.com/uBlockOrigin/uAssets/master/filters/filters...

https://codereview.adblockplus.org/29680689/diff/29724561/lib/filterClasses.j...
lib/filterClasses.js:804: domains = value.toUpperCase();
On 2018/03/16 16:49:24, sergei wrote:
> Is this change related to the issue?

Yes.

https://codereview.adblockplus.org/29680689/diff/29725558/lib/filterClasses.js
File lib/filterClasses.js (right):

https://codereview.adblockplus.org/29680689/diff/29725558/lib/filterClasses.j...
lib/filterClasses.js:220: for (let i = -2; i != -1; i =
beforeOptions.indexOf("$", i + 1))
Here you go Manish, I still don't see the point personally.

(There's another possible optimisation, the first optionsText.indexOf("$") could
start at beforeOptions.indexOf("$") or beforeOptions.length if that's -1. I
couldn't see a way to do that in practice that sped things up much however and
it hurt readability further.)

sergei

https://codereview.adblockplus.org/29680689/diff/29723593/lib/filterClasses.js File lib/filterClasses.js (right): https://codereview.adblockplus.org/29680689/diff/29723593/lib/filterClasses.js#newcode202 lib/filterClasses.js:202: let optionsText = text; On 2018/03/17 14:35:06, kzar wrote: ...

March 19, 2018, 3:47 p.m. (2018-03-19 15:47:32 UTC) #50

https://codereview.adblockplus.org/29680689/diff/29723593/lib/filterClasses.js
File lib/filterClasses.js (right):

https://codereview.adblockplus.org/29680689/diff/29723593/lib/filterClasses.j...
lib/filterClasses.js:202: let optionsText = text;
On 2018/03/17 14:35:06, kzar wrote:
> On 2018/03/16 12:05:06, Manish Jethani wrote:
> > It's a rarely used code path now, what if there's a new option like $csp
> that's
> > actually used very often?
> 
> Then we have bigger problems, e.g. we ran the options regexp on the string
twice
> for those filters.
> 
> > This assumption about only 10 $csp filters may not hold in the future.
> 
> Wasn't an assumption, I looked at the uBlock filter list which is where this
> filter option is already being used. OK I got it wrong and it's 24 not 10...
but
> sure perhaps that number will change in the future.
> 
> > Honestly, I'm not seeing the downside of choosing code that's both more
> > efficient and easier to understand, but I'll leave it here.
> 
> You said yourself that it's subjective which is easier to understand and FWIW
I
> don't agree with you here. Either way the downside is that time spent talking
> endlessly about something that *does not matter* is wasted, there's an
> opportunity cost associated - we could have spent that time talking about how
to
> optimise a code path that does matter instead.
> 
> > The current version is 60% slower here on Chrome 65 and 76% slower on
Firefox
> 58
> > (both running on macOS Sierra).
> 
> I can't reproduce that, I see it as almost 25% slower.

I actually agree with Manish here, in both the performance concerns and the
readability. In particular the `split`ing and `substring`s are way less
efficient than just indexes. One has to figure out that the trick with split is
actually to merely count the number of occurrence of the dollar sign and it also
requires to pay attention where the stripped string and the original one.

Since right now it's not an issue on practice I would not insist on anything
what blocks the change from landing. However could someone please create the
issue with the concerns on https://issues.adblockplus.org/? Additionally I'm
recalling hub#7471, perhaps we could measure the performance of emscripten
parsing code as the part of this issue and if it makes sense to bring from the
emscripten merely the parsing functionality instead of full filter classes and
Co. JIC, I would like to remind that not fast enough parsing of the filters is
currently a serious issue for us.

https://codereview.adblockplus.org/29680689/diff/29724561/lib/filterClasses.js
File lib/filterClasses.js (right):

https://codereview.adblockplus.org/29680689/diff/29724561/lib/filterClasses.j...
lib/filterClasses.js:176: text = text.replace(/[^\S ]+/g, "");
On 2018/03/17 14:35:06, kzar wrote:
> On 2018/03/16 16:49:25, sergei wrote:
> > This change (including the same changes below) seems unrelated and since it
> > seems to be a good optimization, if it's not difficult I would recommend to
> make
> > it in separate commit and it can be landed pretty quickly.
> 
> I would rather just land this all together (same below).

OK, however if it gets a bit longer then could you please do it in a separate
commit, additionally IMO this as well as perhaps some other changes are worth
mentioning in the commit message.

https://codereview.adblockplus.org/29680689/diff/29724561/lib/filterClasses.j...
lib/filterClasses.js:179: if (/^ *!/.test(text))
On 2018/03/17 14:35:07, kzar wrote:
> On 2018/03/16 16:49:24, sergei wrote:
> > ... it's not clear that it should be done. It can be a good 
> > optimization but IMO it should be confirmed that there are only spaces 
> > (0x20).
> 
> It is the case since we've already replaced other whitespace above.

Acknowledged.

https://codereview.adblockplus.org/29680689/diff/29724561/lib/filterClasses.j...
lib/filterClasses.js:196: let optionsMatch =
Filter.optionsRegExp.exec(strippedText);
On 2018/03/17 14:35:07, kzar wrote:
> On 2018/03/16 16:49:25, sergei wrote:
> > Since parsing time is very important could you please estimate how many
times
> it
> > reaches this line and goes below for most popular subscriptions? Perhaps it
> can
> > be done as a part of hub#1020.
> 
> None in EasyList, 24 in the uBlock filter list[1].
> 
> [1] -
>
https://raw.githubusercontent.com/uBlockOrigin/uAssets/master/filters/filters...

Acknowledged. BTW, now it also additionally executes
`!strippedText.includes("$")` and if it includes then`
!/\bcsp=/i.test(strippedText)` for every filter. Fortunately `Filter.normalize`
seems called not each time when stored filters are read, so it should not be
crucial but still important to know because it can affect the first time run, in
particular on mobile platforms.

kzar

Patch Set 14 : Go with Manish's do ... while ... suggestion https://codereview.adblockplus.org/29680689/diff/29723593/lib/filterClasses.js File lib/filterClasses.js ...

March 19, 2018, 4:38 p.m. (2018-03-19 16:38:59 UTC) #51

Patch Set 14 : Go with Manish's do ... while ... suggestion

https://codereview.adblockplus.org/29680689/diff/29723593/lib/filterClasses.js
File lib/filterClasses.js (right):

https://codereview.adblockplus.org/29680689/diff/29723593/lib/filterClasses.j...
lib/filterClasses.js:202: let optionsText = text;
On 2018/03/19 15:47:30, sergei wrote:
> I actually agree with Manish here, in both the performance concerns 
> and the readability.

Well I still disagree on the readability aspect, but it seems I'm now
outnumbered there. Done.

> In particular the `split`ing and `substring`s are way less
> efficient than just indexes.

That's half true. `.split("$").length` is slower than the  `indexOf("$", ...)`
approach. But I did not find using `substring` made much of a difference at all.
As of Patch Set 13 (which still used substring) the performance was approx 2%
slower than Manish's suggestion.

> I would like to remind that not fast enough parsing of the filters
> is currently a serious issue for us.

Well IMO if this is a concern we'd be better off:

1.  Making practical compromises with filter normalisation / parsing. For
example just doing `.replace(/ +/, " ").trim()` at normalisation instead of all
this $csp related stuff. See my point of view here[1]. We could also go with
Manish's suggestion of remove support for option values containing "$"s if it
helps.
2. Focusing on more critical parts of the code, e.g. how we do a .split of the
options string in the fromText function.

If you agree about point 1 let us know which parts are especially painful for
performance, perhaps there are other small tweaks with what we allow that could
improve performance further.

[1] - https://codereview.adblockplus.org/29680684/#msg7

https://codereview.adblockplus.org/29680689/diff/29724561/lib/filterClasses.js
File lib/filterClasses.js (right):

https://codereview.adblockplus.org/29680689/diff/29724561/lib/filterClasses.j...
lib/filterClasses.js:196: let optionsMatch =
Filter.optionsRegExp.exec(strippedText);
On 2018/03/19 15:47:31, sergei wrote:
> BTW, now it also additionally execute `!strippedText.includes("$")`
> and if it includes then` !/\bcsp=/i.test(strippedText)` for every 
> filter.

Yes I know.

Manish Jethani

https://codereview.adblockplus.org/29680689/diff/29723593/lib/filterClasses.js File lib/filterClasses.js (right): https://codereview.adblockplus.org/29680689/diff/29723593/lib/filterClasses.js#newcode202 lib/filterClasses.js:202: let optionsText = text; On 2018/03/17 14:35:06, kzar wrote: ...

March 19, 2018, 6:03 p.m. (2018-03-19 18:03:40 UTC) #52

https://codereview.adblockplus.org/29680689/diff/29723593/lib/filterClasses.js
File lib/filterClasses.js (right):

https://codereview.adblockplus.org/29680689/diff/29723593/lib/filterClasses.j...
lib/filterClasses.js:202: let optionsText = text;
On 2018/03/17 14:35:06, kzar wrote:
> On 2018/03/16 12:05:06, Manish Jethani wrote:
> > It's a rarely used code path now, what if there's a new option like $csp
> that's
> > actually used very often?
> 
> Then we have bigger problems, e.g. we ran the options regexp on the string
twice
> for those filters.
> 
> > This assumption about only 10 $csp filters may not hold in the future.
> 
> Wasn't an assumption, I looked at the uBlock filter list which is where this
> filter option is already being used. OK I got it wrong and it's 24 not 10...
but
> sure perhaps that number will change in the future.
> 
> > Honestly, I'm not seeing the downside of choosing code that's both more
> > efficient and easier to understand, but I'll leave it here.
> 
> You said yourself that it's subjective which is easier to understand and FWIW
I
> don't agree with you here. Either way the downside is that time spent talking
> endlessly about something that *does not matter* is wasted, there's an
> opportunity cost associated - we could have spent that time talking about how
to
> optimise a code path that does matter instead.
> 
> > The current version is 60% slower here on Chrome 65 and 76% slower on
Firefox
> 58
> > (both running on macOS Sierra).
> 
> I can't reproduce that, I see it as almost 25% slower.

Acknowledged.

https://codereview.adblockplus.org/29680689/diff/29725558/lib/filterClasses.js
File lib/filterClasses.js (right):

https://codereview.adblockplus.org/29680689/diff/29725558/lib/filterClasses.j...
lib/filterClasses.js:220: for (let i = -2; i != -1; i =
beforeOptions.indexOf("$", i + 1))
On 2018/03/17 14:35:08, kzar wrote:
> Here you go Manish, I still don't see the point personally.
> 
> (There's another possible optimisation, the first optionsText.indexOf("$")
could
> start at beforeOptions.indexOf("$") or beforeOptions.length if that's -1. I
> couldn't see a way to do that in practice that sped things up much however and
> it hurt readability further.)

Acknowledged.

https://codereview.adblockplus.org/29680689/diff/29727600/lib/filterClasses.js
File lib/filterClasses.js (right):

https://codereview.adblockplus.org/29680689/diff/29727600/lib/filterClasses.j...
lib/filterClasses.js:107: "base-uri", "referrer", "report-to", "report-uri",
"base-uri",
base-uri is repeated here.

kzar

Patch Set 15 : Removed duplicate base-uri directive from blacklist https://codereview.adblockplus.org/29680689/diff/29727600/lib/filterClasses.js File lib/filterClasses.js (right): https://codereview.adblockplus.org/29680689/diff/29727600/lib/filterClasses.js#newcode107 ...

March 19, 2018, 6:16 p.m. (2018-03-19 18:16:22 UTC) #53

Sebastian Noack

https://codereview.adblockplus.org/29680689/diff/29727620/lib/filterClasses.js File lib/filterClasses.js (right): https://codereview.adblockplus.org/29680689/diff/29727620/lib/filterClasses.js#newcode108 lib/filterClasses.js:108: ]); I don't get why we create this Set ...

March 21, 2018, 4:23 p.m. (2018-03-21 16:23:14 UTC) #55

kzar

Patch Set 16 : Addressed Sebastian's feedback https://codereview.adblockplus.org/29680689/diff/29727620/lib/filterClasses.js File lib/filterClasses.js (right): https://codereview.adblockplus.org/29680689/diff/29727620/lib/filterClasses.js#newcode108 lib/filterClasses.js:108: ]); On ...

March 21, 2018, 4:58 p.m. (2018-03-21 16:58:12 UTC) #56

Patch Set 16 : Addressed Sebastian's feedback

https://codereview.adblockplus.org/29680689/diff/29727620/lib/filterClasses.js
File lib/filterClasses.js (right):

https://codereview.adblockplus.org/29680689/diff/29727620/lib/filterClasses.j...
lib/filterClasses.js:108: ]);
On 2018/03/21 16:23:13, Sebastian Noack wrote:
> I don't get why we create this Set here, just to turn it into an array, to
turn
> it into a string to insert into regular expression, below.
> 
> IMO, better: Just use an array here, reducing the boilerplate.
> 
> IMO, even better: Just use a regular expression literal below. Note that
ESLint
> won't complain about long lines due to regular expression literals.

I created the Set since I figured being about to do
`cspBlacklist.has(directive)` might be useful sometime, but you're right it's
better to keep things simple. Done.

https://codereview.adblockplus.org/29680689/diff/29727620/lib/filterClasses.j...
lib/filterClasses.js:809: csp.push(value);
On 2018/03/21 16:23:13, Sebastian Noack wrote:
> Does that mean that if multiple $csp options are given in the same 
> filter that they are merged?

Right.

> Any reason for these semantics... This seems inconsistent with the 
> $domain option... you can just separate them by semicolon within the
> same option value

Well I listed the inability of specifying multiple $csp values for a filter and
the inability to have two $csp filters both apply at the same time in the issue
as known limitations. Manish asked me why we have those limitations, and while
the latter was kind of a pain the former seemed easy enough and so I did it. But
your comment here made me reconsider, I've removed this logic again now.

https://codereview.adblockplus.org/29680689/diff/29727620/lib/filterClasses.j...
lib/filterClasses.js:847: csp = csp.join("; ").toLowerCase();
On 2018/03/21 16:23:13, Sebastian Noack wrote:
> If we convert the CSP to lower case, shouldn't this rather happen 
> during normalization?

Yes and no.

While I see your logic of course we currently do not normalise the case of the
filter text at normalisation time at all. So if I was to normalise the case of
the $csp option there I would argue we should otherwise normalise the case of
filters there too.

(Yes, it's true that Filter.normalize does not fully normalise filters, that was
part of my argument[1] for not worrying as much about how we normalise
whitespace.)

[1] - https://codereview.adblockplus.org/29680684/#msg7

https://codereview.adblockplus.org/29680689/diff/29727620/lib/filterClasses.j...
lib/filterClasses.js:847: csp = csp.join("; ").toLowerCase();
On 2018/03/21 16:23:13, Sebastian Noack wrote:
> Is it guaranteed that all parts of a CSP are case-insensitive?

Hmm good point, initially I converted the value to lower case here since we
previously converted it to upper case which broke everything. But since I
adjusted that logic we could instead leave the case alone I suppose. Done.

LGTM

Issue 29680689: [$csp2 adblockpluscore] Issue 6329 - Add the CSP filter type (Closed)

Description

Patch Set 1 #

Patch Set 2 : Rebased, brought back Filter.normalize, other changes #

Patch Set 3 : Check for report-to and report-uri directives more carefully #

Patch Set 4 : Fully normalise whitespace, avoid strict equality #

Patch Set 5 : New approach, added more unit tests #

Patch Set 6 : Use space literal instead of \s where possible #

Patch Set 7 : Strip whitespace in batches where possible #

Patch Set 8 : Addressed some of Manish's feedback #

Patch Set 9 : Addressed some more nits #

Patch Set 10 : Prevent the referrer directive #

Patch Set 11 : Added test case for case insensitive forbidden directives #

Patch Set 12 : Improve comments #

Patch Set 13 : Added more tests, blacklisted directives and added a tiny optimisation #

Patch Set 14 : Go with Manish's do ... while ... suggestion #

Patch Set 15 : Removed duplicate base-uri directive from blacklist #

Patch Set 16 : Addressed Sebastian's feedback #

Messages