|
|
Created:
Sept. 18, 2014, 3:10 p.m. by Thomas Greiner Modified:
Sept. 25, 2014, 9:50 a.m. Visibility:
Public. |
DescriptionIssue 1393 - Block element dialog suggests decoded URLs
Patch Set 1 #
Total comments: 6
Patch Set 2 : #
Total comments: 2
Patch Set 3 : #
Total comments: 8
MessagesTotal messages: 19
LGTM
http://codereview.adblockplus.org/4567408790470656/diff/5629499534213120/incl... File include.postload.js (right): http://codereview.adblockplus.org/4567408790470656/diff/5629499534213120/incl... include.postload.js:390: url = encodeURI(url); What if the URL we get is already encoded? This function will turn "%20" into "%2520".
http://codereview.adblockplus.org/4567408790470656/diff/5629499534213120/incl... File include.postload.js (right): http://codereview.adblockplus.org/4567408790470656/diff/5629499534213120/incl... include.postload.js:390: url = encodeURI(url); On 2014/09/18 18:53:02, Wladimir Palant wrote: > What if the URL we get is already encoded? This function will turn "%20" into > "%2520". Thomas, can you please explain / figure out why this is necessary anyway? I just realized that the URL comes form Element.src and therefore should always already be quoted.
The problem with retrieving the value from Element.src is that we're getting a whitespace instead of "%20" in the URL. That means that if someone uses the block element feature on an image whose URL contains a whitespace that the filter won't match because `" " != "%20"`. Therefore not encoding it - but also encoding it twice or more as Wladimir pointed out - results in a filter that doesn't match the specified resource. On 2014/09/19 10:13:31, Sebastian Noack wrote: > http://codereview.adblockplus.org/4567408790470656/diff/5629499534213120/incl... > File include.postload.js (right): > > http://codereview.adblockplus.org/4567408790470656/diff/5629499534213120/incl... > include.postload.js:390: url = encodeURI(url); > On 2014/09/18 18:53:02, Wladimir Palant wrote: > > What if the URL we get is already encoded? This function will turn "%20" into > > "%2520". > > Thomas, can you please explain / figure out why this is necessary anyway? I just > realized that the URL comes form Element.src and therefore should always already > be quoted.
http://codereview.adblockplus.org/4567408790470656/diff/5629499534213120/incl... File include.postload.js (right): http://codereview.adblockplus.org/4567408790470656/diff/5629499534213120/incl... include.postload.js:390: url = encodeURI(url); Now I see what you mean. Element.src does give us the encoded URL but we're using Element.getAttribute("src") in the getElementURL() function instead which gives us the decoded URL. On 2014/09/19 10:13:32, Sebastian Noack wrote: > On 2014/09/18 18:53:02, Wladimir Palant wrote: > > What if the URL we get is already encoded? This function will turn "%20" into > > "%2520". > > Thomas, can you please explain / figure out why this is necessary anyway? I just > realized that the URL comes form Element.src and therefore should always already > be quoted.
http://codereview.adblockplus.org/4567408790470656/diff/5629499534213120/incl... File include.postload.js (right): http://codereview.adblockplus.org/4567408790470656/diff/5629499534213120/incl... include.postload.js:390: url = encodeURI(url); On 2014/09/19 10:47:13, Thomas Greiner wrote: > Now I see what you mean. Element.src does give us the encoded URL but we're > using Element.getAttribute("src") in the getElementURL() function instead which > gives us the decoded URL. > > On 2014/09/19 10:13:32, Sebastian Noack wrote: > > On 2014/09/18 18:53:02, Wladimir Palant wrote: > > > What if the URL we get is already encoded? This function will turn "%20" > into > > > "%2520". > > > > Thomas, can you please explain / figure out why this is necessary anyway? I > just > > realized that the URL comes form Element.src and therefore should always > already > > be quoted. > Yes, .src always returns a quoted URL no matter how the URL was specified in the document. So the correct solution would be just using .src instead .getAttribute("src") here, I guess.
http://codereview.adblockplus.org/4567408790470656/diff/5629499534213120/incl... File include.postload.js (right): http://codereview.adblockplus.org/4567408790470656/diff/5629499534213120/incl... include.postload.js:390: url = encodeURI(url); On 2014/09/19 11:14:50, Sebastian Noack wrote: > So the correct solution would be just using .src instead .getAttribute("src") > here, I guess. The other advantage here is that element.src returns the full URL and we don't have to apply our error-prone normalization to it. So maybe we only need to keep that normalization for objects/applets - no way to get the full URL from the browser there IMHO.
http://codereview.adblockplus.org/4567408790470656/diff/5629499534213120/incl... File include.postload.js (right): http://codereview.adblockplus.org/4567408790470656/diff/5629499534213120/incl... include.postload.js:390: url = encodeURI(url); On 2014/09/21 18:37:42, Wladimir Palant wrote: > On 2014/09/19 11:14:50, Sebastian Noack wrote: > > So the correct solution would be just using .src instead .getAttribute("src") > > here, I guess. > > The other advantage here is that element.src returns the full URL and we don't > have to apply our error-prone normalization to it. So maybe we only need to keep > that normalization for objects/applets - no way to get the full URL from the > browser there IMHO. I didn't find any reason against applying it to objects/applets-only so I moved it accordingly. However, I'm not sure what issue 328 is about. Do you happen to know where I can find it? On 2014/09/19 11:14:50, Sebastian Noack wrote: > On 2014/09/19 10:47:13, Thomas Greiner wrote: > > Now I see what you mean. Element.src does give us the encoded URL but we're > > using Element.getAttribute("src") in the getElementURL() function instead > which > > gives us the decoded URL. > > > > On 2014/09/19 10:13:32, Sebastian Noack wrote: > > > On 2014/09/18 18:53:02, Wladimir Palant wrote: > > > > What if the URL we get is already encoded? This function will turn "%20" > > into > > > > "%2520". > > > > > > Thomas, can you please explain / figure out why this is necessary anyway? I > > just > > > realized that the URL comes form Element.src and therefore should always > > already > > > be quoted. > > > > Yes, .src always returns a quoted URL no matter how the URL was specified in the > document. > > So the correct solution would be just using .src instead .getAttribute("src") > here, I guess. I observed the same behavior with .href and .getAttribute("href") and therefore changed that as well.
http://codereview.adblockplus.org/4567408790470656/diff/5750085036015616/incl... File include.postload.js (right): http://codereview.adblockplus.org/4567408790470656/diff/5750085036015616/incl... include.postload.js:349: // The URL is not always normalized, so do it here For reference: this is referring to https://code.google.com/p/adthwart/issues/detail?id=328 which doesn't really provide any info. The point here is really that we need to produce a proper URL and I'm not sure we even need a comment to explain this.
http://codereview.adblockplus.org/4567408790470656/diff/5750085036015616/incl... File include.postload.js (right): http://codereview.adblockplus.org/4567408790470656/diff/5750085036015616/incl... include.postload.js:349: // The URL is not always normalized, so do it here On 2014/09/22 09:57:03, Wladimir Palant wrote: > For reference: this is referring to > https://code.google.com/p/adthwart/issues/detail?id=328 which doesn't really > provide any info. The point here is really that we need to produce a proper URL > and I'm not sure we even need a comment to explain this. Done.
http://codereview.adblockplus.org/4567408790470656/diff/5634387206995968/incl... File include.postload.js (right): http://codereview.adblockplus.org/4567408790470656/diff/5634387206995968/incl... include.postload.js:349: url = normalizeURL(relativeToAbsoluteUrl(url)); I just realized that Chrome preserves slashes in URLs, while normalizeURL() normalizes them (e.g. foo//bar -> foo/bar). So this patch will introduce an inconsistent behavior, since URLs retrieved via .src and .href aren't passed to normalizeURL() anymore.
http://codereview.adblockplus.org/4567408790470656/diff/5634387206995968/incl... File include.postload.js (right): http://codereview.adblockplus.org/4567408790470656/diff/5634387206995968/incl... include.postload.js:349: url = normalizeURL(relativeToAbsoluteUrl(url)); Also unencoded special characters aren't encoded by our normalizeURL() function. So the original issue will still persist for <object> and <embed> elements. Maybe we should just implement normalizeURL() like following. This reduces complexity, is faster, and makes sure that the URL is normalized the same way the browser does. function normalizeURL(url) { var frame = document.createElement("iframe"); frame.src = url; return frame.src; }
http://codereview.adblockplus.org/4567408790470656/diff/5634387206995968/incl... File include.postload.js (right): http://codereview.adblockplus.org/4567408790470656/diff/5634387206995968/incl... include.postload.js:349: url = normalizeURL(relativeToAbsoluteUrl(url)); On 2014/09/22 13:27:49, Sebastian Noack wrote: > I just realized that Chrome preserves slashes in URLs, while normalizeURL() > normalizes them (e.g. foo//bar -> foo/bar). > > So this patch will introduce an inconsistent behavior, since URLs retrieved via > .src and .href aren't passed to normalizeURL() anymore. The question is whether we need to have a well-formed URL because what we need is the URL the way it is requested by the browser. So if the browser requests "http://example.com//test.jpg" then we shouldn't normalize it to "http://example.com/test.jpg" since the filter wouldn't match. On 2014/09/22 13:37:36, Sebastian Noack wrote: > Also unencoded special characters aren't encoded by our normalizeURL() function. > So the original issue will still persist for <object> and <embed> elements. > > Maybe we should just implement normalizeURL() like following. This reduces > complexity, is faster, and makes sure that the URL is normalized the same way > the browser does. > > function normalizeURL(url) > { > var frame = document.createElement("iframe"); > frame.src = url; > return frame.src; > } True, maybe that's a necessary hack but I'd rather use <a> tags for that since they're generally more light-weight.
http://codereview.adblockplus.org/4567408790470656/diff/5634387206995968/incl... File include.postload.js (right): http://codereview.adblockplus.org/4567408790470656/diff/5634387206995968/incl... include.postload.js:349: url = normalizeURL(relativeToAbsoluteUrl(url)); On 2014/09/22 15:08:28, Thomas Greiner wrote: > On 2014/09/22 13:27:49, Sebastian Noack wrote: > > I just realized that Chrome preserves slashes in URLs, while normalizeURL() > > normalizes them (e.g. foo//bar -> foo/bar). > > > > So this patch will introduce an inconsistent behavior, since URLs retrieved > via > > .src and .href aren't passed to normalizeURL() anymore. > > The question is whether we need to have a well-formed URL because what we need > is the URL the way it is requested by the browser. So if the browser requests > "http://example.com//test.jpg" then we shouldn't normalize it to > "http://example.com/test.jpg" since the filter wouldn't match. I don't say that the we should strip repeated slashes. I just say that we shouldn't introduce inconsistent behavior. And yes, preferable the behavior should be the default behavior of the browser. > On 2014/09/22 13:37:36, Sebastian Noack wrote: > > Also unencoded special characters aren't encoded by our normalizeURL() > function. > > So the original issue will still persist for <object> and <embed> elements. > > > > Maybe we should just implement normalizeURL() like following. This reduces > > complexity, is faster, and makes sure that the URL is normalized the same way > > the browser does. > > > > function normalizeURL(url) > > { > > var frame = document.createElement("iframe"); > > frame.src = url; > > return frame.src; > > } > > True, maybe that's a necessary hack I think this is less a hack than rolling our own implementation that that mimics the way the browser normalizes URLs. > but I'd rather use <a> tags for that since they're generally more light-weight. Probably that doesn't make any difference. Note that frames are neither loaded before the element is inserted into the document.
http://codereview.adblockplus.org/4567408790470656/diff/5634387206995968/incl... File include.postload.js (right): http://codereview.adblockplus.org/4567408790470656/diff/5634387206995968/incl... include.postload.js:349: url = normalizeURL(relativeToAbsoluteUrl(url)); On 2014/09/22 15:25:16, Sebastian Noack wrote: > > but I'd rather use <a> tags for that since they're generally more > light-weight. > > Probably that doesn't make any difference. Note that frames are neither loaded > before the element is inserted into the document. Actually using an <iframe> element is notable faster (for some reason). Also caching the element gives a significant boost: http://jsperf.com/normailzeurl
http://codereview.adblockplus.org/4567408790470656/diff/5634387206995968/incl... File include.postload.js (right): http://codereview.adblockplus.org/4567408790470656/diff/5634387206995968/incl... include.postload.js:349: url = normalizeURL(relativeToAbsoluteUrl(url)); On 2014/09/22 15:50:59, Sebastian Noack wrote: > Actually using an <iframe> element is notable faster (for some reason). Also > caching the element gives a significant boost: http://jsperf.com/normailzeurl That should be because Chrome does some caching of its own for frames, please compare: http://jsperf.com/normalizeurl So I'd be in favor of using the <a> element, if only to avoid the impression that something is being downloaded there. Note that even then we won't get it entirely correct: in case of the <object> and <applet> tags the URL resolution is being performed by the plugin, and it might use a different way to resolve URLs from what the browser does (I've hit that issue with Java before). Plus, there can be different base URLs in a document (see Node.baseURI). Still, this approach is a lot better than the current hack.
http://codereview.adblockplus.org/4567408790470656/diff/5634387206995968/incl... File include.postload.js (right): http://codereview.adblockplus.org/4567408790470656/diff/5634387206995968/incl... include.postload.js:349: url = normalizeURL(relativeToAbsoluteUrl(url)); On 2014/09/22 17:39:21, Wladimir Palant wrote: > On 2014/09/22 15:50:59, Sebastian Noack wrote: > > Actually using an <iframe> element is notable faster (for some reason). Also > > caching the element gives a significant boost: http://jsperf.com/normailzeurl > > That should be because Chrome does some caching of its own for frames, please > compare: http://jsperf.com/normalizeurl > > So I'd be in favor of using the <a> element, if only to avoid the impression > that something is being downloaded there. Very interesting. So I agree with your point for using an <a> element here. Though <iframe> elements still seem to be faster when cached, while caching the <a> element doesn't make much a difference, on Chrome: http://jsperf.com/normalizeurl/2 However, caching the <a> element might still be worth for Safari, also considering that we have to resolve the URL for every request there. > Note that even then we won't get it entirely correct: in case of the <object> > and <applet> tags the URL resolution is being performed by the plugin, and it > might use a different way to resolve URLs from what the browser does (I've hit > that issue with Java before). Plus, there can be different base URLs in a > document (see Node.baseURI). Still, this approach is a lot better than the > current hack. That is true, but there is probably no way to resolve the URL correctly for every possible plugin. And reusing the built-in URL normalization of the browser, instead rolling our own, reduces at least complexity and is also faster.
http://codereview.adblockplus.org/4567408790470656/diff/5634387206995968/incl... File include.postload.js (right): http://codereview.adblockplus.org/4567408790470656/diff/5634387206995968/incl... include.postload.js:349: url = normalizeURL(relativeToAbsoluteUrl(url)); On 2014/09/23 09:40:34, Sebastian Noack wrote: > On 2014/09/22 17:39:21, Wladimir Palant wrote: > > On 2014/09/22 15:50:59, Sebastian Noack wrote: > > > Actually using an <iframe> element is notable faster (for some reason). Also > > > caching the element gives a significant boost: > http://jsperf.com/normailzeurl > > > > That should be because Chrome does some caching of its own for frames, please > > compare: http://jsperf.com/normalizeurl > > > > So I'd be in favor of using the <a> element, if only to avoid the impression > > that something is being downloaded there. > > Very interesting. So I agree with your point for using an <a> element here. > Though <iframe> elements still seem to be faster when cached, while caching the > <a> element doesn't make much a difference, on Chrome: > http://jsperf.com/normalizeurl/2 > > However, caching the <a> element might still be worth for Safari, also > considering that we have to resolve the URL for every request there. > > > Note that even then we won't get it entirely correct: in case of the <object> > > and <applet> tags the URL resolution is being performed by the plugin, and it > > might use a different way to resolve URLs from what the browser does (I've hit > > that issue with Java before). Plus, there can be different base URLs in a > > document (see Node.baseURI). Still, this approach is a lot better than the > > current hack. > > That is true, but there is probably no way to resolve the URL correctly for > every possible plugin. And reusing the built-in URL normalization of the > browser, instead rolling our own, reduces at least complexity and is also > faster. LGTM. I created a separate issue, for using an <a> element, when explicitly resolving URLs: https://issues.adblockplus.org/ticket/1441 |