|
|
Created:
Feb. 27, 2016, 2:23 p.m. by kzar Modified:
March 8, 2016, 12:49 p.m. Reviewers:
Sebastian Noack Visibility:
Public. |
DescriptionIssue 3710 - Unify hostname logic
Patch Set 1 #
Total comments: 13
Patch Set 2 : Addressed feedback and considered crazy edge cases #
Total comments: 4
Patch Set 3 : Consider the case of | ending a hostname #
MessagesTotal messages: 8
Patch Set 1 https://codereview.adblockplus.org/29337803/diff/29337804/lib/abp2blocklist.js File lib/abp2blocklist.js (left): https://codereview.adblockplus.org/29337803/diff/29337804/lib/abp2blocklist.j... lib/abp2blocklist.js:134: hostnameStarted = caseSensitive = true; (I've switched this around as I decided that sneaky fall-throughs in this big switch were making things more confusing.)
https://codereview.adblockplus.org/29337803/diff/29337804/lib/abp2blocklist.js File lib/abp2blocklist.js (left): https://codereview.adblockplus.org/29337803/diff/29337804/lib/abp2blocklist.j... lib/abp2blocklist.js:118: result.push("https?://"); Nit: Mind moving that line to just above the break, for consistency? https://codereview.adblockplus.org/29337803/diff/29337804/lib/abp2blocklist.js File lib/abp2blocklist.js (right): https://codereview.adblockplus.org/29337803/diff/29337804/lib/abp2blocklist.j... lib/abp2blocklist.js:95: if (c == "*" || c == "^" || c == "?" || c == "/" || i == lastIndex) If you turn the logic here the other way around, you don't need an else block. if (c != "*" && c != "^" && c != "?" && c != "/" && i < lastIndex) continue; https://codereview.adblockplus.org/29337803/diff/29337804/lib/abp2blocklist.j... lib/abp2blocklist.js:95: if (c == "*" || c == "^" || c == "?" || c == "/" || i == lastIndex) I'm not entirely sure if the case of last index is handled correctly. So if I get the logic right you'd consider just "example.co" part of the hostname in "||example.com". https://codereview.adblockplus.org/29337803/diff/29337804/lib/abp2blocklist.j... lib/abp2blocklist.js:136: result.push("/"); Nit: Mind moving that line to just above the break, for consistency? https://codereview.adblockplus.org/29337803/diff/29337804/lib/abp2blocklist.j... lib/abp2blocklist.js:137: if (!hostnameFinished && Nit: It doesn't matter, but I personally find that it reads slightly better if you have similar code aligned when wrapping: if (!hostnameFinished && text.charAt(i-2) == ":" && text.charAt(i-1) == "/") https://codereview.adblockplus.org/29337803/diff/29337804/lib/abp2blocklist.j... lib/abp2blocklist.js:144: case ".": case "+": case "$": case "{": case "}": case "?": Nit: I think the way this block was originally wrapped was a little nicer: case ".": case "+": case "$": case "?": case "{": case "}": case "(": case ")": case "[": case "]": case "\\": But I guess it's personal preference, so I leave it up you.
Patch Set 2 : Addressed feedback and considered crazy edge cases https://codereview.adblockplus.org/29337803/diff/29337804/lib/abp2blocklist.js File lib/abp2blocklist.js (left): https://codereview.adblockplus.org/29337803/diff/29337804/lib/abp2blocklist.j... lib/abp2blocklist.js:118: result.push("https?://"); On 2016/02/27 20:30:38, Sebastian Noack wrote: > Nit: Mind moving that line to just above the break, for consistency? Done. https://codereview.adblockplus.org/29337803/diff/29337804/lib/abp2blocklist.js File lib/abp2blocklist.js (right): https://codereview.adblockplus.org/29337803/diff/29337804/lib/abp2blocklist.j... lib/abp2blocklist.js:95: if (c == "*" || c == "^" || c == "?" || c == "/" || i == lastIndex) On 2016/02/27 20:30:38, Sebastian Noack wrote: > If you turn the logic here the other way around, you don't need an else block. > > if (c != "*" && c != "^" && c != "?" && c != "/" && i < lastIndex) > continue; Done. https://codereview.adblockplus.org/29337803/diff/29337804/lib/abp2blocklist.j... lib/abp2blocklist.js:95: if (c == "*" || c == "^" || c == "?" || c == "/" || i == lastIndex) On 2016/02/27 20:30:38, Sebastian Noack wrote: > I'm not entirely sure if the case of last index is handled correctly. So if I > get the logic right you'd consider just "example.co" part of the hostname in > "||example.com". Good point but it's even more complicated, what if we're at the last index but also the current character is one that would have ended a hostname?! I've done my best to keep the logic simple, but complexity crept in a bit. https://codereview.adblockplus.org/29337803/diff/29337804/lib/abp2blocklist.j... lib/abp2blocklist.js:136: result.push("/"); On 2016/02/27 20:30:38, Sebastian Noack wrote: > Nit: Mind moving that line to just above the break, for consistency? Done. https://codereview.adblockplus.org/29337803/diff/29337804/lib/abp2blocklist.j... lib/abp2blocklist.js:137: if (!hostnameFinished && On 2016/02/27 20:30:38, Sebastian Noack wrote: > Nit: It doesn't matter, but I personally find that it reads slightly better if > you have similar code aligned when wrapping: > > if (!hostnameFinished && text.charAt(i-2) == ":" > && text.charAt(i-1) == "/") I'd rather leave this one as it is. https://codereview.adblockplus.org/29337803/diff/29337804/lib/abp2blocklist.j... lib/abp2blocklist.js:144: case ".": case "+": case "$": case "{": case "}": case "?": On 2016/02/27 20:30:38, Sebastian Noack wrote: > Nit: I think the way this block was originally wrapped was a little nicer: > > case ".": case "+": case "$": case "?": > case "{": case "}": case "(": case ")": > case "[": case "]": case "\\": > > But I guess it's personal preference, so I leave it up you. Done.
https://codereview.adblockplus.org/29337803/diff/29337816/lib/abp2blocklist.js File lib/abp2blocklist.js (right): https://codereview.adblockplus.org/29337803/diff/29337816/lib/abp2blocklist.j... lib/abp2blocklist.js:95: let endingChar = (c == "*" || c == "^" || c == "?" || c == "/"); I know we didn't handle it before, but what's if we have a | in the last position. It implies the end of the URL, and therefore also end of the hostname.
https://codereview.adblockplus.org/29337803/diff/29337816/lib/abp2blocklist.js File lib/abp2blocklist.js (right): https://codereview.adblockplus.org/29337803/diff/29337816/lib/abp2blocklist.j... lib/abp2blocklist.js:95: let endingChar = (c == "*" || c == "^" || c == "?" || c == "/"); On 2016/02/27 23:06:16, Sebastian Noack wrote: > I know we didn't handle it before, but what's if we have a | in the last > position. It implies the end of the URL, and therefore also end of the hostname. So do we want to always consider "|" to end the hostname, or only when it also happens to be the last character?
https://codereview.adblockplus.org/29337803/diff/29337816/lib/abp2blocklist.js File lib/abp2blocklist.js (right): https://codereview.adblockplus.org/29337803/diff/29337816/lib/abp2blocklist.j... lib/abp2blocklist.js:95: let endingChar = (c == "*" || c == "^" || c == "?" || c == "/"); On 2016/03/07 17:06:48, kzar wrote: > On 2016/02/27 23:06:16, Sebastian Noack wrote: > > I know we didn't handle it before, but what's if we have a | in the last > > position. It implies the end of the URL, and therefore also end of the > hostname. > > So do we want to always consider "|" to end the hostname, or only when it also > happens to be the last character? I guess, for simplicity we can just assume that any "|" (except in the beginning) implies the end of the URL. It cannot occur in a domain, and if it occurs in the path/query it has to be escaped, anyway.
Patch Set 3 : Consider the case of | ending a hostname https://codereview.adblockplus.org/29337803/diff/29337816/lib/abp2blocklist.js File lib/abp2blocklist.js (right): https://codereview.adblockplus.org/29337803/diff/29337816/lib/abp2blocklist.j... lib/abp2blocklist.js:95: let endingChar = (c == "*" || c == "^" || c == "?" || c == "/"); On 2016/03/08 09:31:22, Sebastian Noack wrote: > On 2016/03/07 17:06:48, kzar wrote: > > On 2016/02/27 23:06:16, Sebastian Noack wrote: > > > I know we didn't handle it before, but what's if we have a | in the last > > > position. It implies the end of the URL, and therefore also end of the > > hostname. > > > > So do we want to always consider "|" to end the hostname, or only when it also > > happens to be the last character? > > I guess, for simplicity we can just assume that any "|" (except in the > beginning) implies the end of the URL. It cannot occur in a domain, and if it > occurs in the path/query it has to be escaped, anyway. Yea, sounds good to me. Also we already know that we've passed the first character here as hostnameStart isn't null which makes things even easier.
LGTM |