Issue 29336787: Issue 3670 - Make rules case sensitive where possible

https://codereview.adblockplus.org/29336787/diff/29336788/lib/contentBlockerLists.js File lib/contentBlockerLists.js (right): https://codereview.adblockplus.org/29336787/diff/29336788/lib/contentBlockerLists.js#newcode175 lib/contentBlockerLists.js:175: function caseSensitive(filter) It probably makes sense to integrate the ...

Feb. 21, 2016, 9:42 p.m. (2016-02-21 21:42:30 UTC) #2

kzar

Feb. 24, 2016, 3:43 p.m. (2016-02-24 15:43:29 UTC) #3

Patch Set 2 : Rebased and addressed feedback

https://codereview.adblockplus.org/29336787/diff/29336788/lib/contentBlockerL...
File lib/contentBlockerLists.js (right):

https://codereview.adblockplus.org/29336787/diff/29336788/lib/contentBlockerL...
lib/contentBlockerLists.js:175: function caseSensitive(filter)
On 2016/02/21 21:42:30, Sebastian Noack wrote:
> It probably makes sense to integrate the logic here into the state machine
> that converts the filter to a regular expression.
> 
> Also note that we have to convert filters that don't explicitly use the
> $match_case option to lowercase. Otherwise, filters like ||EXAMPLE.COM would
> never match, while they previously did match requests like
> "https://example.com".

Done.
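
For illustration (not part of the patch): once a rule is matched case sensitively, a pattern written in uppercase stops matching a lowercase request URL unless the pattern is lowercased first. The sketch below uses ordinary JavaScript regular expressions as a stand-in for the content blocker's matching; `pattern` and `url` are made-up values.

```javascript
// Illustrative only: plain RegExp objects stand in for the content blocker's
// "url-filter" matching, which this sketch does not reproduce.
const pattern = "EXAMPLE\\.COM";        // regexp source as it might come from ||EXAMPLE.COM
const url = "https://example.com/";

// Previous behaviour: case-insensitive matching, so the filter matches.
const caseInsensitive = new RegExp(pattern, "i").test(url);

// Case-sensitive matching with the pattern left as written: no match.
const caseSensitiveAsIs = new RegExp(pattern).test(url);

// Case-sensitive matching after lowercasing the pattern: matches again.
const caseSensitiveLowered = new RegExp(pattern.toLowerCase()).test(url);
```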

https://codereview.adblockplus.org/29336787/diff/29336788/lib/contentBlockerL...
lib/contentBlockerLists.js:180: if (!(filter.text.startsWith("||") || filter.text.startsWith("://")))
On 2016/02/21 21:42:30, Sebastian Noack wrote:
> Filters usually don't start with "://". But filters like http://adserver.com
> are supposed to be marked case sensitive.

Done.

https://codereview.adblockplus.org/29336787/diff/29336788/lib/contentBlockerL...
lib/contentBlockerLists.js:184: let boundary = filter.text.substr(offset).search(/\/\?\*\^/);
On 2016/02/21 21:42:30, Sebastian Noack wrote:
> This regular expression would only match a sequence of "/?*^". However, we are
> looking for any of these characters. It seems you forgot the brackets. But as I
> said in the comment above, this logic should rather be integrated into the
> state machine that we have in place, anyway.

(Yep, forgot the brackets.)
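
For illustration, the difference between the two forms can be checked directly. The input string `rest` is a made-up example; the first regular expression is the one quoted above, the second adds the brackets so that any one of the four characters matches.

```javascript
// Made-up remainder of a filter text, after the leading "||" has been cut off.
const rest = "example.com/ads?id=1";

// Without brackets: matches only the literal four-character sequence "/?*^".
const sequenceOnly = /\/\?\*\^/;

// With brackets: a character class that matches any single one of / ? * ^.
const anyOf = /[\/?*^]/;

const withoutBrackets = rest.search(sequenceOnly);  // no such sequence in rest
const withBrackets = rest.search(anyOf);            // index of the first "/"
```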

https://codereview.adblockplus.org/29336787/diff/29336788/lib/contentBlockerL...
lib/contentBlockerLists.js:385: trigger: {"url-filter": matchDomain,
On 2016/02/21 21:42:30, Sebastian Noack wrote:
> Same for element hiding filters; we have to make sure to convert the host to
> lowercase.

Done.

kzar

Patch Set 3 : Fix edge case for filters ending with "://"

Feb. 24, 2016, 3:53 p.m. (2016-02-24 15:53:34 UTC) #4

Sebastian Noack

https://codereview.adblockplus.org/29336787/diff/29337682/lib/contentBlockerList.js File lib/contentBlockerList.js (right): https://codereview.adblockplus.org/29336787/diff/29337682/lib/contentBlockerList.js#newcode112 lib/contentBlockerList.js:112: if (containsHostname && i < lastIndex) I think caseSensitive ...

Feb. 24, 2016, 8:31 p.m. (2016-02-24 20:31:05 UTC) #5

kzar

Patch Set 4 : Keep rules with non alpha characters after hostname ends case sensitive ...

Feb. 24, 2016, 9:20 p.m. (2016-02-24 21:20:48 UTC) #6

Sebastian Noack

https://codereview.adblockplus.org/29336787/diff/29337694/lib/contentBlockerList.js File lib/contentBlockerList.js (right): https://codereview.adblockplus.org/29336787/diff/29337694/lib/contentBlockerList.js#newcode113 lib/contentBlockerList.js:113: if (!hostnameStarted && i >= 2 && text[i-2] == ...

Feb. 24, 2016, 9:47 p.m. (2016-02-24 21:47:07 UTC) #7

kzar

Feb. 24, 2016, 10:19 p.m. (2016-02-24 22:19:46 UTC) #8

Patch Set 6 : Fixed mistake with logic for * and ^, addressed other feedback

https://codereview.adblockplus.org/29336787/diff/29337694/lib/contentBlockerL...
File lib/contentBlockerList.js (right):

https://codereview.adblockplus.org/29336787/diff/29337694/lib/contentBlockerL...
lib/contentBlockerList.js:113: if (!hostnameStarted && i >= 2 && text[i-2] == ":" && text[i-1] == "/")
On 2016/02/24 21:47:06, Sebastian Noack wrote:
> Note that when you use charAt() as in my initial suggestion, you don't have to
> worry about out-of-bounds reads, which makes the i >= 2 check redundant.

Done.
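
For illustration: `String.prototype.charAt()` returns an empty string for an index outside the string, so comparisons against it are simply false near the start of the text. The helper name `followsSchemeSeparator` is hypothetical and only mirrors the condition quoted above.

```javascript
// charAt() yields "" for out-of-range indices, bracket access yields undefined.
const sample = "//";
const viaCharAt = sample.charAt(-1);
const viaBrackets = sample[-1];

// Hypothetical helper mirroring the quoted condition, without an i >= 2 guard.
function followsSchemeSeparator(text, i)
{
  return text.charAt(i - 2) == ":" && text.charAt(i - 1) == "/";
}
```

With `text = "http://a"`, the helper is true only for `i = 6` (the second slash); for `i = 0` and `i = 1` both lookups fall outside the string and the result is false.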

https://codereview.adblockplus.org/29336787/diff/29337694/lib/contentBlockerL...
lib/contentBlockerList.js:117: break;
On 2016/02/24 21:47:07, Sebastian Noack wrote:
> If I get the logic here straight, there is an inconsistency: If we encounter
> "://" the last slash wouldn't be escaped as we bail out here. However, for all
> other occurrences of slashes we would fall through, and eventually escape them,
> right? But we shouldn't escape slashes in any case.

No, for all other occurrences of slashes we fall through, but then push the
unescaped version as `c != "?"` is true.

I've done it this way because otherwise hostnameStarted will be set to true and
then immediately hostnameFinished will be set to true as well. (I'm not entirely
sure the change to use a fall through here helped, which is why I uploaded that
change as a separate patch set.)

In fact, after addressing your other feedback this logic makes no sense at all.
I've reworked it.

https://codereview.adblockplus.org/29336787/diff/29337694/lib/contentBlockerL...
lib/contentBlockerList.js:119: case "?": case "*": case "^":
On 2016/02/24 21:47:06, Sebastian Noack wrote:
> "*" and "^" are already handled and bail out above. So this code is never
> reached for these characters.

Done.

https://codereview.adblockplus.org/29336787/diff/29337694/lib/contentBlockerL...
lib/contentBlockerList.js:133: if (hostnameFinished && (c >= "a" && c <= "z" || c >= "A" && c <= "Z"))
On 2016/02/24 21:47:06, Sebastian Noack wrote:
> Note that this check can be simplified with a regexp: /[a-z]/i.test(c). But I
> can understand if you don't want to use a regexp in a state machine. I don't
> have a strong opinion either, so I leave it up to you. However, if you stick to
> the current logic, IMO it would be slightly easier to read like that:
> 
>   if (hostnameFinished && (c >= "a" && c <= "z" ||
>                            c >= "A" && c <= "Z"))

Done.
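
For illustration, the two variants of the letter check discussed above can be compared side by side. The function names are hypothetical; the bodies are taken from the comment.

```javascript
// Range comparison, as in the patch.
function isLetterByRange(c)
{
  return c >= "a" && c <= "z" ||
         c >= "A" && c <= "Z";
}

// Regular expression, as suggested in the review.
function isLetterByRegexp(c)
{
  return /[a-z]/i.test(c);
}

// Collect any ASCII character on which the two checks disagree.
const disagreements = [];
for (let code = 0; code < 128; code++)
{
  const c = String.fromCharCode(code);
  if (isLetterByRange(c) != isLetterByRegexp(c))
    disagreements.push(c);
}
```

For single ASCII characters the two checks agree, so the choice is a matter of readability rather than behaviour.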

Sebastian Noack

https://codereview.adblockplus.org/29336787/diff/29337696/lib/contentBlockerList.js File lib/contentBlockerList.js (right): https://codereview.adblockplus.org/29336787/diff/29337696/lib/contentBlockerList.js#newcode111 lib/contentBlockerList.js:111: hostnameStarted = caseSensitive = true; Nit: Sometimes you have ...

Feb. 24, 2016, 10:53 p.m. (2016-02-24 22:53:49 UTC) #9

Sebastian Noack

https://codereview.adblockplus.org/29336787/diff/29337696/lib/contentBlockerList.js File lib/contentBlockerList.js (right): https://codereview.adblockplus.org/29336787/diff/29337696/lib/contentBlockerList.js#newcode161 lib/contentBlockerList.js:161: if (result.caseSensitive) On 2016/02/24 22:53:49, Sebastian Noack wrote: > ...

Feb. 24, 2016, 11:07 p.m. (2016-02-24 23:07:37 UTC) #10

kzar

Feb. 24, 2016, 11:15 p.m. (2016-02-24 23:15:21 UTC) #11

Patch Set 7 : Addressed more feedback

https://codereview.adblockplus.org/29336787/diff/29337696/lib/contentBlockerL...
File lib/contentBlockerList.js (right):

https://codereview.adblockplus.org/29336787/diff/29337696/lib/contentBlockerL...
lib/contentBlockerList.js:111: hostnameStarted = caseSensitive = true;
On 2016/02/24 22:53:49, Sebastian Noack wrote:
> Nit: Sometimes you have the regular expression conversion logic first,
> sometimes you have case sensitivity logic first. Mind doing it consistently?

Done.

https://codereview.adblockplus.org/29336787/diff/29337696/lib/contentBlockerL...
lib/contentBlockerList.js:121: result.push("/");
On 2016/02/24 22:53:49, Sebastian Noack wrote:
> If we move this case just above the default case, we could simply fall
> through, and wouldn't need |result.push("/");break;| here.

Done.
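
For illustration, a rough sketch of the fall-through arrangement described above. `toRegexpSource` is a hypothetical, heavily simplified converter and not the function under review; it only shows that a "/" case placed directly above `default` needs neither its own push nor a `break`.

```javascript
// Hypothetical, simplified converter; the real state machine does more.
function toRegexpSource(text)
{
  let result = [];
  for (let c of text)
  {
    switch (c)
    {
      case "*":
        result.push(".*");
        break;
      case ".": case "+": case "$": case "?":
        result.push("\\", c);
        break;
      case "/":
        // Slash-specific bookkeeping would go here. No push or break is
        // needed because control falls through to the default case.
      default:
        result.push(c);
    }
  }
  return result.join("");
}
```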

https://codereview.adblockplus.org/29336787/diff/29337696/lib/contentBlockerL...
lib/contentBlockerList.js:126: case ".": case "+": case "$": case "{":
On 2016/02/24 22:53:48, Sebastian Noack wrote:
> Nit: It doesn't really matter, but if you wrap after the 5th (instead of the
> 4th) case, we have an equal number of cases per line, and the
> parentheses/braces/etc. are on the same line as their corresponding
> counterparts. ;)

Done.

https://codereview.adblockplus.org/29336787/diff/29337696/lib/contentBlockerL...
lib/contentBlockerList.js:161: if (result.caseSensitive)
On 2016/02/24 22:53:49, Sebastian Noack wrote:
> I think the logic here would be a little more straightforward if you put it
> like that:
> 
>   if (result.caseSensitive)
>     trigger["url-filter"] = trigger["url-filter"].toLowerCase();
> 
>   if (result.caseSensitive || filter.matchCase)
>     trigger["url-filter-is-case-sensitive"] = true;
> 
> Note that this also simplifies the calling code.
> 
> (Moreover, I'd personally use a variable for the regexp rather than
> duplicating the object lookup, but I leave that up to you.)

Done.
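
For illustration, the suggested logic can be exercised in isolation. The wrapper `applyCaseSensitivity` and the sample objects are hypothetical; the two `if` statements are the ones quoted above, and the sketch assumes `result.caseSensitive` means the pattern can safely be lowercased and matched case sensitively.

```javascript
// Hypothetical wrapper around the two statements from the review comment.
function applyCaseSensitivity(trigger, result, filter)
{
  if (result.caseSensitive)
    trigger["url-filter"] = trigger["url-filter"].toLowerCase();

  if (result.caseSensitive || filter.matchCase)
    trigger["url-filter-is-case-sensitive"] = true;

  return trigger;
}

// Made-up inputs covering the three cases.
const lowered = applyCaseSensitivity(
  {"url-filter": "EXAMPLE\\.COM"}, {caseSensitive: true}, {matchCase: false}
);
const explicit = applyCaseSensitivity(
  {"url-filter": "Example/Ads"}, {caseSensitive: false}, {matchCase: true}
);
const untouched = applyCaseSensitivity(
  {"url-filter": "Example/Ads"}, {caseSensitive: false}, {matchCase: false}
);
```

Only the first case rewrites the pattern; a filter with an explicit $match_case option keeps its original casing, and a filter with neither property stays case insensitive.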

Sebastian Noack

https://codereview.adblockplus.org/29336787/diff/29337698/lib/contentBlockerList.js File lib/contentBlockerList.js (right): https://codereview.adblockplus.org/29336787/diff/29337698/lib/contentBlockerList.js#newcode68 lib/contentBlockerList.js:68: // Convert the "regexpSource" part of a filter's text ...

Feb. 25, 2016, 2:28 a.m. (2016-02-25 02:28:36 UTC) #13

kzar

Feb. 25, 2016, 3:22 p.m. (2016-02-25 15:22:29 UTC) #14

Patch Set 8 : Improved documentation of toRegexp function and switched to JSDoc syntax

https://codereview.adblockplus.org/29336787/diff/29337698/lib/contentBlockerL...
File lib/contentBlockerList.js (right):

https://codereview.adblockplus.org/29336787/diff/29337698/lib/contentBlockerL...
lib/contentBlockerList.js:68: // Convert the "regexpSource" part of a filter's text to a regular expression,
On 2016/02/25 02:28:35, Sebastian Noack wrote:
> Sorry for commenting after LGTM. Just one more nit: While I don't consider
> documentation for internal APIs mandatory, if you comment private functions,
> please still use JSDoc syntax.

Done.
