|
|
Created:
Nov. 24, 2014, 3:22 p.m. by Sebastian Noack Modified:
Dec. 9, 2014, 11:53 a.m. Visibility:
Public. |
DescriptionIssue 1601 - Generate blocking filters for all URLs associated with the selected element
Patch Set 1 : #
Total comments: 13
Patch Set 2 : Also consider "poster" attribute #
Total comments: 2
Patch Set 3 : Addressed comments #Patch Set 4 : Simplified code #
Total comments: 13
Patch Set 5 : Rebased, addressed comments, and restructured code #
Total comments: 5
Patch Set 6 : Fixed wrong comparision operator #MessagesTotal messages: 17
Not finished looking through this one yet (finding it quite hard to grok everything) but here are my comments so far http://codereview.adblockplus.org/5468762555809792/diff/5718998062727168/incl... File include.postload.js (right): http://codereview.adblockplus.org/5468762555809792/diff/5718998062727168/incl... include.postload.js:395: var addSelector = function(selector) Why we don't just do "function addSelector(selector)" instead? http://codereview.adblockplus.org/5468762555809792/diff/5718998062727168/incl... include.postload.js:423: if (isHTTP) I found this line a little bit confusing to start with as you're checking if the URL is HTTP or HTTPS but it looked like you intended to just match HTTP. http://codereview.adblockplus.org/5468762555809792/diff/5718998062727168/incl... include.postload.js:481: urls.splice(i--, 1); Does removing an element from the array you're iterating over cause any problems in Javascript?
http://codereview.adblockplus.org/5468762555809792/diff/5718998062727168/incl... File include.postload.js (right): http://codereview.adblockplus.org/5468762555809792/diff/5718998062727168/incl... include.postload.js:395: var addSelector = function(selector) On 2014/11/25 18:10:11, kzar wrote: > Why we don't just do "function addSelector(selector)" instead? The only reason that comes to my mind would be that the function can not be called before the two variables above have been defined. Note that normal function definitions are automatically moved to the top of the scope. Not necessarily reason enough to leave it that way. However, those lines didn't change, and I rather don't want to put even more unrelated changes into this patch. http://codereview.adblockplus.org/5468762555809792/diff/5718998062727168/incl... include.postload.js:423: if (isHTTP) On 2014/11/25 18:10:11, kzar wrote: > I found this line a little bit confusing to start with as you're checking if the > URL is HTTP or HTTPS but it looked like you intended to just match HTTP. What I actually want, is excluding URLs that can't be handled by request blocking. So yes, both http:// and https:// are fine, however file://, data:, javascript: about:blank other special URLs aren't. I would consider isHTTPorHTTPS as variable name unnecessarily long. And since HTTPS is just an abbreviation for HTTP+TLS/SSL, it's still HTTP, and therefore I don't see any wrong implications here. http://codereview.adblockplus.org/5468762555809792/diff/5718998062727168/incl... include.postload.js:481: urls.splice(i--, 1); On 2014/11/25 18:10:11, kzar wrote: > Does removing an element from the array you're iterating over cause any problems > in Javascript? Not as long as you decrease the iteration counter (that i will have the same value during next iteration) as done here. Not that this is problamatic in any programming language, when iterating by index (as opposed to using iterators).
http://codereview.adblockplus.org/5468762555809792/diff/5718998062727168/incl... File include.postload.js (right): http://codereview.adblockplus.org/5468762555809792/diff/5718998062727168/incl... include.postload.js:481: urls.splice(i--, 1); On 2014/11/25 19:01:15, Sebastian Noack wrote: > On 2014/11/25 18:10:11, kzar wrote: > > Does removing an element from the array you're iterating over cause any > problems > > in Javascript? > > Not as long as you decrease the iteration counter (that i will have the same > value during next iteration) as done here. Not that this is problamatic in any > programming language, when iterating by index (as opposed to using iterators). s/is problamatic/isn't problamatic/
http://codereview.adblockplus.org/5468762555809792/diff/5718998062727168/incl... File include.postload.js (right): http://codereview.adblockplus.org/5468762555809792/diff/5718998062727168/incl... include.postload.js:544: function hasFilters(element) Maybe a name like "isFilterable" would be better than "hasFilters"? http://codereview.adblockplus.org/5468762555809792/diff/5718998062727168/incl... include.postload.js:544: function hasFilters(element) How come you don't put this function above where it's used? Wouldn't that be clearer? http://codereview.adblockplus.org/5468762555809792/diff/5718998062727168/incl... include.postload.js:552: if (/:.+:/.test(element.getAttribute("style"))) Maybe comment this line to explain what you told me? I don't think it's so obvious at first glance.
http://codereview.adblockplus.org/5468762555809792/diff/5718998062727168/incl... File include.postload.js (right): http://codereview.adblockplus.org/5468762555809792/diff/5718998062727168/incl... include.postload.js:544: function hasFilters(element) On 2014/11/26 10:51:22, kzar wrote: > Maybe a name like "isFilterable" would be better than "hasFilters"? As discussed on IRC, it's called "isBlockable" now. http://codereview.adblockplus.org/5468762555809792/diff/5718998062727168/incl... include.postload.js:544: function hasFilters(element) On 2014/11/26 10:51:22, kzar wrote: > How come you don't put this function above where it's used? Wouldn't that be > clearer? Since it relies on "getElementURLs()", I just put it below that function. I moved both function above now. http://codereview.adblockplus.org/5468762555809792/diff/5718998062727168/incl... include.postload.js:552: if (/:.+:/.test(element.getAttribute("style"))) On 2014/11/26 10:51:22, kzar wrote: > Maybe comment this line to explain what you told me? I don't think it's so > obvious at first glance. Well, this logic isn't new. But I guess I can add a comment though.
LGTM apart from the accidentally included bubbling / capturing changes from the other review. I think that palant should review this one though, the code is quite complex and the changes aren't trivial. http://codereview.adblockplus.org/5468762555809792/diff/5697982787747840/incl... File include.postload.js (right): http://codereview.adblockplus.org/5468762555809792/diff/5697982787747840/incl... include.postload.js:312: document.removeEventListener("keydown", clickHide_keyDown, true); I think the capturing / bubbling changes from the other review got included here accidentally?
http://codereview.adblockplus.org/5468762555809792/diff/5697982787747840/incl... File include.postload.js (right): http://codereview.adblockplus.org/5468762555809792/diff/5697982787747840/incl... include.postload.js:312: document.removeEventListener("keydown", clickHide_keyDown, true); On 2014/11/26 11:36:17, kzar wrote: > I think the capturing / bubbling changes from the other review got included here > accidentally? They are not part of this patch. However, they show up when you view the diff between the intial patch and this patch, since I had to rebase in between.
Frankly, I don't think that this effort is justified. To get all the possible URLs you have to duplicate much of the browser logic - and keep up with the changes. The current approach is a very simple approximation which is "good enough." IMHO the only proper solution would be associating webRequest calls with elements properly - that's what we do in Firefox. So what we show will be the calls related to the particular element. But I don't see how this can be done with Chrome right now.
I agree that associating webRequest calls with elements would be a better way, however as you already realized there seems to be no way doing that on Chrome and Safari. But as far as I can tell there are only following ways an element can configure URLs that are considered when loading that element: * By its "src" attribute * In case of an <object> element, either by its "data" attribute, or in case of Flash with a <param name="movie|src"> child element * In case of an <img> element, additionally or instead to the "src" attribute there can be a "srcset" attribute * In case of a <video>, <audio>, <picture> element, additionally or instead the "src" attribute there can be <source> child elements The current code already handles the former two cases, and this patch just adds the latter two, plus some logic to deal with multiple URLs. I think this is justified. Also the issue we address here is not only to handle multiple URLs, but also to generate any (blocking) filters for images using only the "srcset" attribute, as well as <videos> and <audio> elements using <source> elements.
http://codereview.adblockplus.org/5468762555809792/diff/5196828823781376/incl... File include.postload.js (right): http://codereview.adblockplus.org/5468762555809792/diff/5196828823781376/incl... include.postload.js:509: A fair amount of complexity here, only came from generating CSS selectors also for blocking filters in order to highlight the matching elements. However, the current element is highlighted anyway, and the generated selector doesn't necessarily match any other elements. So I simplified that code. Other changes are just due to rebase.
On 2014/11/26 13:19:17, Wladimir Palant wrote: > Frankly, I don't think that this effort is justified. To get all the possible > URLs you have to duplicate much of the browser logic - and keep up with the > changes. The current approach is a very simple approximation which is "good > enough." > > IMHO the only proper solution would be associating webRequest calls with > elements properly - that's what we do in Firefox. So what we show will be the > calls related to the particular element. But I don't see how this can be done > with Chrome right now. To be fair I think in this situation it probably is worth the added complexity. I think there's only an extra 70 lines, and the code added is generally cleaner than the stuff that's been removed. I think it's definitely right to question complexity, and often it's not necessary, but I think allowing users to block <video> elements is pretty important for example.
http://codereview.adblockplus.org/5468762555809792/diff/5196828823781376/incl... File include.postload.js (right): http://codereview.adblockplus.org/5468762555809792/diff/5196828823781376/incl... include.postload.js:187: var url = element.getAttribute("data"); For reference: <object> also has archive and codebase attributes. The handling is inconsistent and depends largely on the plugin used. I guess we'll just keep ignoring this... http://codereview.adblockplus.org/5468762555809792/diff/5196828823781376/incl... include.postload.js:218: continue; It seems that we need to handle <track> elements as well. http://codereview.adblockplus.org/5468762555809792/diff/5196828823781376/incl... include.postload.js:223: urls = urls.concat(parseSrcSet(child)); Rather than generate a new array, I'd prefer: urls.push.apply(urls, parseSrcSet(child)); http://codereview.adblockplus.org/5468762555809792/diff/5196828823781376/incl... include.postload.js:557: var url = urls[i].replace(/^\s+/, "").replace(/(\s+\S+)?\s*$/, ""); Use urls[i].trim()? According to http://kangax.github.io/compat-table/es5/#String.prototype.trim it is supported by at least Chrome 23. http://codereview.adblockplus.org/5468762555809792/diff/5196828823781376/incl... include.postload.js:559: urls[i] = resolveURL(url); What about descriptions? See http://html5hub.com/srcset-attribute-solving-responsive-image-dilemma/ for example - there is an optional description following the URL here.
Moreover, I restructured getURLsFromElement() into multiple functions for better readability. http://codereview.adblockplus.org/5468762555809792/diff/5196828823781376/incl... File include.postload.js (right): http://codereview.adblockplus.org/5468762555809792/diff/5196828823781376/incl... include.postload.js:187: var url = element.getAttribute("data"); On 2014/12/08 11:43:00, Wladimir Palant wrote: > For reference: <object> also has archive and codebase attributes. The handling > is inconsistent and depends largely on the plugin used. I guess we'll just keep > ignoring this... I agree, also note they are obsolete since HTML5, neither have I seen them being used in the past 10 years. So lets keep ignoring them for now. However, I added detection for Silverlight and Windows Media sources. http://codereview.adblockplus.org/5468762555809792/diff/5196828823781376/incl... include.postload.js:218: continue; On 2014/12/08 11:43:00, Wladimir Palant wrote: > It seems that we need to handle <track> elements as well. Done. http://codereview.adblockplus.org/5468762555809792/diff/5196828823781376/incl... include.postload.js:223: urls = urls.concat(parseSrcSet(child)); On 2014/12/08 11:43:00, Wladimir Palant wrote: > Rather than generate a new array, I'd prefer: > > urls.push.apply(urls, parseSrcSet(child)); Right, I forgot that push() can add multiple items to the array. http://codereview.adblockplus.org/5468762555809792/diff/5196828823781376/incl... include.postload.js:557: var url = urls[i].replace(/^\s+/, "").replace(/(\s+\S+)?\s*$/, ""); On 2014/12/08 11:43:00, Wladimir Palant wrote: > Use urls[i].trim()? According to > http://kangax.github.io/compat-table/es5/#String.prototype.trim it is supported > by at least Chrome 23. Done. http://codereview.adblockplus.org/5468762555809792/diff/5196828823781376/incl... include.postload.js:559: urls[i] = resolveURL(url); On 2014/12/08 11:43:00, Wladimir Palant wrote: > What about descriptions? See > http://html5hub.com/srcset-attribute-solving-responsive-image-dilemma/ for > example - there is an optional description following the URL here. Those are stripped by the regex above.
http://codereview.adblockplus.org/5468762555809792/diff/5196828823781376/incl... File include.postload.js (right): http://codereview.adblockplus.org/5468762555809792/diff/5196828823781376/incl... include.postload.js:187: var url = element.getAttribute("data"); On 2014/12/08 16:45:04, Sebastian Noack wrote: > neither have I seen them being > used in the past 10 years. Oh, you didn't have to deal with Java? :) http://codereview.adblockplus.org/5468762555809792/diff/5703128158568448/incl... File include.postload.js (right): http://codereview.adblockplus.org/5468762555809792/diff/5703128158568448/incl... include.postload.js:245: if (child.localName != "source" || child.localName != "track") This should be == rather than !=, right? Currently this condition is always true. http://codereview.adblockplus.org/5468762555809792/diff/5703128158568448/incl... include.postload.js:246: urls.push.apply(urls, getURLsFromAttributes(child)); I think for the <track> element you need to call getURLsFromMediaElement() recursively.
http://codereview.adblockplus.org/5468762555809792/diff/5196828823781376/incl... File include.postload.js (right): http://codereview.adblockplus.org/5468762555809792/diff/5196828823781376/incl... include.postload.js:187: var url = element.getAttribute("data"); On 2014/12/08 20:31:09, Wladimir Palant wrote: > On 2014/12/08 16:45:04, Sebastian Noack wrote: > > neither have I seen them being > > used in the past 10 years. > > Oh, you didn't have to deal with Java? :) I see, that stuff is for Java. I prefer to ignore it though, not only because Java in the browser is dead, but also because resolving the "archive" attribute relative to the "codebase" attribute is non-trivial. Currently we just create an <a> element assign a URL to its "href" attribute, and then retrieve the URL from its "href" attribute, normalized and resolved (relative to the document's URL), in 3 lines. An approach that wouldn't work with arbitrary base URLs. :( http://codereview.adblockplus.org/5468762555809792/diff/5703128158568448/incl... File include.postload.js (right): http://codereview.adblockplus.org/5468762555809792/diff/5703128158568448/incl... include.postload.js:245: if (child.localName != "source" || child.localName != "track") On 2014/12/08 20:31:09, Wladimir Palant wrote: > This should be == rather than !=, right? Currently this condition is always > true. Ouch, you are right. Originally, it was in the form "(.. != .. && .. != ..) continue;". And I forgot to change the comparison operator as well. http://codereview.adblockplus.org/5468762555809792/diff/5703128158568448/incl... include.postload.js:246: urls.push.apply(urls, getURLsFromAttributes(child)); On 2014/12/08 20:31:09, Wladimir Palant wrote: > I think for the <track> element you need to call getURLsFromMediaElement() > recursively. If I didn't misread the standard, a <track> element can't have <track> or <source> child elements. Neither did I find any examples doing that. So why do you think so?
LGTM http://codereview.adblockplus.org/5468762555809792/diff/5703128158568448/incl... File include.postload.js (right): http://codereview.adblockplus.org/5468762555809792/diff/5703128158568448/incl... include.postload.js:246: urls.push.apply(urls, getURLsFromAttributes(child)); On 2014/12/08 21:12:49, Sebastian Noack wrote: > If I didn't misread the standard, a <track> element can't have <track> or > <source> child elements. Neither did I find any examples doing that. So why do > you think so? It seems that I indeed misunderstood the standard, MDN clearly says that <track> is an empty element. |