|
|
Created:
June 16, 2017, 5:25 p.m. by Manish Jethani Modified:
July 13, 2017, 9:56 a.m. Reviewers:
kzar CC:
Sebastian Noack Base URL:
https://hg.adblockplus.org/abp2blocklist Visibility:
Public. |
DescriptionIssue 5325 - Add support for separator characters
Patch Set 1 #
Total comments: 4
Patch Set 2 : Add special handling for separator at beginning #
Total comments: 13
Patch Set 3 : Wrap long lines #Patch Set 4 : Add comment about using only lower case for hostname #Patch Set 5 : Rebase #MessagesTotal messages: 12
https://codereview.adblockplus.org/29467595/diff/29467596/lib/abp2blocklist.js File lib/abp2blocklist.js (right): https://codereview.adblockplus.org/29467595/diff/29467596/lib/abp2blocklist.j... lib/abp2blocklist.js:154: regexp.push("([^.%A-Za-z0-9_].*)?$"); Can you put the duplicated part of the regexp in a variable? https://codereview.adblockplus.org/29467595/diff/29467596/lib/abp2blocklist.j... lib/abp2blocklist.js:155: canSafelyMatchAsLowercase = false; Why is that necessary?
Patch Set 2: Add special handling for separator at beginning https://codereview.adblockplus.org/29467595/diff/29467596/lib/abp2blocklist.js File lib/abp2blocklist.js (right): https://codereview.adblockplus.org/29467595/diff/29467596/lib/abp2blocklist.j... lib/abp2blocklist.js:154: regexp.push("([^.%A-Za-z0-9_].*)?$"); On 2017/06/16 21:13:18, Sebastian Noack wrote: > Can you put the duplicated part of the regexp in a variable? Done. https://codereview.adblockplus.org/29467595/diff/29467596/lib/abp2blocklist.j... lib/abp2blocklist.js:155: canSafelyMatchAsLowercase = false; On 2017/06/16 21:13:18, Sebastian Noack wrote: > Why is that necessary? It was converting "A-Z" into "a-z". I thought about this a little more, it's not necessary. Instead, we just remove the "A-Z" if the separator occurs at the end of the hostname. https://codereview.adblockplus.org/29467595/diff/29468560/lib/abp2blocklist.js File lib/abp2blocklist.js (right): https://codereview.adblockplus.org/29467595/diff/29468560/lib/abp2blocklist.j... lib/abp2blocklist.js:152: if (i == 0) Support separators at the beginning of the filter text. This is in fact given as an example in the filters documentation, we need to support this case.
(Moving Sebastian to Cc.) https://codereview.adblockplus.org/29467595/diff/29468560/lib/abp2blocklist.js File lib/abp2blocklist.js (right): https://codereview.adblockplus.org/29467595/diff/29468560/lib/abp2blocklist.j... lib/abp2blocklist.js:151: let separator = "[^-_.%" + (justHostname ? "" : "A-Z") + "a-z0-9]"; Like I mentioned in the issue shouldn't the period be escaped? https://codereview.adblockplus.org/29467595/diff/29468560/lib/abp2blocklist.j... lib/abp2blocklist.js:151: let separator = "[^-_.%" + (justHostname ? "" : "A-Z") + "a-z0-9]"; I don't understand the logic with justHostname here, could you explain? https://codereview.adblockplus.org/29467595/diff/29468560/lib/abp2blocklist.j... lib/abp2blocklist.js:152: if (i == 0) On 2017/06/19 10:39:54, Manish Jethani wrote: > Support separators at the beginning of the filter text. > > This is in fact given as an example in the filters documentation, we need to > support this case. Acknowledged. https://codereview.adblockplus.org/29467595/diff/29468560/test/abp2blocklist.js File test/abp2blocklist.js (right): https://codereview.adblockplus.org/29467595/diff/29468560/test/abp2blocklist.... test/abp2blocklist.js:178: {trigger: {"url-filter": "^https?://([^/]+\\.)?example\\.com/path([^-_.%A-Za-z0-9].*)?$", Nit: Mind fixing these long lines where possible?
https://codereview.adblockplus.org/29467595/diff/29468560/lib/abp2blocklist.js File lib/abp2blocklist.js (right): https://codereview.adblockplus.org/29467595/diff/29468560/lib/abp2blocklist.j... lib/abp2blocklist.js:151: let separator = "[^-_.%" + (justHostname ? "" : "A-Z") + "a-z0-9]"; On 2017/07/07 12:46:18, kzar wrote: > I don't understand the logic with justHostname here, could you explain? For filters like "example.co^" and "example.co^*^hello^, when the code encounters the first "^", justHostname is true. This means we're still in the hostname part, therefore there's no need to include the "A-Z", as hostnames are always lower case. Safari actually Punycode-encodes the hostname part before running the regular expression on it. We could just leave the "A-Z" there, it wouldn't hurt, but it's cleaner this way, especially because if the filter contains only the hostname part then the following code will covert the entire regular expression to lower case thus leaving us with "a-za-z". https://codereview.adblockplus.org/29467595/diff/29468560/lib/abp2blocklist.j... lib/abp2blocklist.js:151: let separator = "[^-_.%" + (justHostname ? "" : "A-Z") + "a-z0-9]"; On 2017/07/07 12:46:18, kzar wrote: > Like I mentioned in the issue shouldn't the period be escaped? I replied there as well. No, the period must not be escaped when it's inside square brackets. Within square brackets it has its literal meaning.
https://codereview.adblockplus.org/29467595/diff/29468560/test/abp2blocklist.js File test/abp2blocklist.js (right): https://codereview.adblockplus.org/29467595/diff/29468560/test/abp2blocklist.... test/abp2blocklist.js:178: {trigger: {"url-filter": "^https?://([^/]+\\.)?example\\.com/path([^-_.%A-Za-z0-9].*)?$", On 2017/07/07 12:46:18, kzar wrote: > Nit: Mind fixing these long lines where possible? Acknowledged. I'll try to fix this.
Patch Set 3: Wrap long lines https://codereview.adblockplus.org/29467595/diff/29468560/test/abp2blocklist.js File test/abp2blocklist.js (right): https://codereview.adblockplus.org/29467595/diff/29468560/test/abp2blocklist.... test/abp2blocklist.js:178: {trigger: {"url-filter": "^https?://([^/]+\\.)?example\\.com/path([^-_.%A-Za-z0-9].*)?$", On 2017/07/08 06:33:57, Manish Jethani wrote: > On 2017/07/07 12:46:18, kzar wrote: > > Nit: Mind fixing these long lines where possible? > > Acknowledged. > > I'll try to fix this. Done.
Otherwise LGTM https://codereview.adblockplus.org/29467595/diff/29468560/lib/abp2blocklist.js File lib/abp2blocklist.js (right): https://codereview.adblockplus.org/29467595/diff/29468560/lib/abp2blocklist.j... lib/abp2blocklist.js:151: let separator = "[^-_.%" + (justHostname ? "" : "A-Z") + "a-z0-9]"; On 2017/07/08 06:29:10, Manish Jethani wrote: > On 2017/07/07 12:46:18, kzar wrote: > > Like I mentioned in the issue shouldn't the period be escaped? > > I replied there as well. No, the period must not be escaped when it's inside > square brackets. Within square brackets it has its literal meaning. Acknowledged. https://codereview.adblockplus.org/29467595/diff/29468560/lib/abp2blocklist.j... lib/abp2blocklist.js:151: let separator = "[^-_.%" + (justHostname ? "" : "A-Z") + "a-z0-9]"; On 2017/07/08 06:29:10, Manish Jethani wrote: > On 2017/07/07 12:46:18, kzar wrote: > > I don't understand the logic with justHostname here, could you explain? > > For filters like "example.co^" and "example.co^*^hello^, when the code > encounters the first "^", justHostname is true. This means we're still in the > hostname part, therefore there's no need to include the "A-Z", as hostnames are > always lower case. Safari actually Punycode-encodes the hostname part before > running the regular expression on it. > > We could just leave the "A-Z" there, it wouldn't hurt, but it's cleaner this > way, especially because if the filter contains only the hostname part then the > following code will covert the entire regular expression to lower case thus > leaving us with "a-za-z". Ah OK, thanks. Would you mind adding a comment to explain that?
Patch Set 4: Add comment about using only lower case for hostname https://codereview.adblockplus.org/29467595/diff/29468560/lib/abp2blocklist.js File lib/abp2blocklist.js (right): https://codereview.adblockplus.org/29467595/diff/29468560/lib/abp2blocklist.j... lib/abp2blocklist.js:151: let separator = "[^-_.%" + (justHostname ? "" : "A-Z") + "a-z0-9]"; On 2017/07/10 12:39:56, kzar wrote: > On 2017/07/08 06:29:10, Manish Jethani wrote: > > On 2017/07/07 12:46:18, kzar wrote: > > > I don't understand the logic with justHostname here, could you explain? > > > > For filters like "example.co^" and "example.co^*^hello^, when the code > > encounters the first "^", justHostname is true. This means we're still in the > > hostname part, therefore there's no need to include the "A-Z", as hostnames > are > > always lower case. Safari actually Punycode-encodes the hostname part before > > running the regular expression on it. > > > > We could just leave the "A-Z" there, it wouldn't hurt, but it's cleaner this > > way, especially because if the filter contains only the hostname part then the > > following code will covert the entire regular expression to lower case thus > > leaving us with "a-za-z". > > Ah OK, thanks. Would you mind adding a comment to explain that? Done. I've also made the code a bit more verbose here so it's easier to understand what "separator" means.
LGTM, go ahead and push when you're ready. https://codereview.adblockplus.org/29467595/diff/29468560/lib/abp2blocklist.js File lib/abp2blocklist.js (right): https://codereview.adblockplus.org/29467595/diff/29468560/lib/abp2blocklist.j... lib/abp2blocklist.js:151: let separator = "[^-_.%" + (justHostname ? "" : "A-Z") + "a-z0-9]"; On 2017/07/11 17:10:33, Manish Jethani wrote: > On 2017/07/10 12:39:56, kzar wrote: > > On 2017/07/08 06:29:10, Manish Jethani wrote: > > > On 2017/07/07 12:46:18, kzar wrote: > > > > I don't understand the logic with justHostname here, could you explain? > > > > > > For filters like "example.co^" and "example.co^*^hello^, when the code > > > encounters the first "^", justHostname is true. This means we're still in > the > > > hostname part, therefore there's no need to include the "A-Z", as hostnames > > are > > > always lower case. Safari actually Punycode-encodes the hostname part before > > > running the regular expression on it. > > > > > > We could just leave the "A-Z" there, it wouldn't hurt, but it's cleaner this > > > way, especially because if the filter contains only the hostname part then > the > > > following code will covert the entire regular expression to lower case thus > > > leaving us with "a-za-z". > > > > Ah OK, thanks. Would you mind adding a comment to explain that? > > Done. > > I've also made the code a bit more verbose here so it's easier to understand > what "separator" means. Thanks, while a little verbose like you mention I think it's a lot easier to understand now.
Patch Set 5: Rebase There were some conflicts in the test file, resolved.
LGTM |