Issue 29331669: Issue 3237 - Whitelisted AFS ad is not shown on netzwelt.de

Issue 29331669: Issue 3237 - Whitelisted AFS ad is not shown on netzwelt.de (Closed)

Created:
Dec. 1, 2015, 2:15 a.m. by Oleksandr

Modified:
Dec. 16, 2015, 1:10 p.m.

Reviewers:
sergei, Eric

CC:
Felix Dahlke

Visibility:
Public.

Description

Issue 3237 - Whitelisted AFS ad is not shown on netzwelt.de

Patch Set 2 : Move subdocument detection to the top. #

Total comments: 2

Created: Dec. 15, 2015, 2:05 a.m.

Download [raw] [tar.bz2]

	Unified diffs	Side-by-side diffs	Delta from patch set	Stats (+14 lines, -13 lines)			Patch
M	src/plugin/AdblockPlusDomTraverser.cpp	View	1	2 chunks	+2 lines, -2 lines	0 comments	Download
M	src/plugin/PluginDomTraverserBase.h	View	1	6 chunks	+7 lines, -7 lines	0 comments	Download
M	src/plugin/PluginWbPassThrough.cpp	View	1	2 chunks	+5 lines, -4 lines	2 comments	Download

Patchset 2 is a rebase and addressing comments. https://codereview.adblockplus.org/29331669/diff/29331670/src/plugin/PluginWbPassThrough.cpp File src/plugin/PluginWbPassThrough.cpp (right): https://codereview.adblockplus.org/29331669/diff/29331670/src/plugin/PluginWbPassThrough.cpp#newcode145 src/plugin/PluginWbPassThrough.cpp:145: ContentType ...

Dec. 15, 2015, 2:10 a.m. (2015-12-15 02:10:16 UTC) #4

Patchset 2 is a rebase and addressing comments.

https://codereview.adblockplus.org/29331669/diff/29331670/src/plugin/PluginWb...
File src/plugin/PluginWbPassThrough.cpp (right):

https://codereview.adblockplus.org/29331669/diff/29331670/src/plugin/PluginWb...
src/plugin/PluginWbPassThrough.cpp:145: ContentType
WBPassthruSink::GetContentTypeFromMimeType(const std::wstring& mimeType)
On 2015/12/01 20:32:56, Eric wrote:
> Here's the relevant section on the Accept header from the standard
>     http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html
> It specifies an algorithm to determine precedence of the various media types.
> Unfortunately, that precedence is not the order they appear in the media type
> list. Nevertheless, it would seem that the highest precedence media types says
> more about the request type than the lower ones. The present algorithm doesn't
> distinguish precedence. On the other hand, I don't know what IE actually
> generates to know if this is important.

> That all said, the present algorithm looks throughout the whole media type
list,
> ignoring anything like precedence. If IE is generating these lists sanely,
then
> the first media type in the list would be the highest precedence one. If
that's
> the case, we'd want to loop through the media types and check each in turn. 

It is not known really if the order has any effect on the possible type of the
request. It looks like it might, but it is a little far fetched, I think.
There's also very little gain had we implemented the precedence algorithm
anyway.


> I can guess that IE doesn't send Accept-Charset or Accept-Language when it's
> requesting images or objects. (I haven't tried to find out if this is so.) We
> might be able to get better distinctions if we look at these headers in
addition
> to Accept by itself.

It does send Accept-Language for images as well.

https://codereview.adblockplus.org/29331669/diff/29331670/src/plugin/PluginWb...
src/plugin/PluginWbPassThrough.cpp:151: if ((mimeType.find(L"text/") !=
std::wstring::npos) ||
On 2015/12/01 20:32:57, Eric wrote:
> On 2015/12/01 10:53:59, sergei wrote:
> > when and what are there some different "text/???" and "application/???"?
> 
> We don't have anything like a crawler for IE, but it doesn't seem that it
would
> be all that difficult to log the media type list here, crawl manually, and
> gather a collection of the media type lists that IE actually generates. 

I have only been able to log 2 types here:
text/html, application/xhtml+xml, image/jxr, */*
image/png, image/svg+xml, image/jxr, image/*;q=0.8, */*;q=0.5

In IE9 the image/jxr part is missing, but other then that the header looks very
similar. So I agree, we can be precise here.

https://codereview.adblockplus.org/29331669/diff/29331670/src/plugin/PluginWb...
src/plugin/PluginWbPassThrough.cpp:170: if (mimeType.find(L"text/html") !=
std::wstring::npos)
On 2015/12/01 20:32:56, Eric wrote:
> Rather than testing for "text/html" when there's a known image, you could just
> move this test up to the top and test for it first.
> 
> I would also guess that anything including "application/xhtml+xml"
> (XML-serialized HTML5) is also a document. I don't know, though, if IE
generates
> that header in requests.

Seems like it does, based on my tests. Done.

Eric

LGTM. I wouldn't mind a new patch set with the suggestions in my comment. But ...

Dec. 15, 2015, 3:18 a.m. (2015-12-15 03:18:18 UTC) #5

LGTM.

I wouldn't mind a new patch set with the suggestions in my comment. But the
present one does the job, even though it does pass up some easy opportunities to
clean up the code.

https://codereview.adblockplus.org/29331669/diff/29331670/src/plugin/PluginWb...
File src/plugin/PluginWbPassThrough.cpp (right):

https://codereview.adblockplus.org/29331669/diff/29331670/src/plugin/PluginWb...
src/plugin/PluginWbPassThrough.cpp:145: ContentType
WBPassthruSink::GetContentTypeFromMimeType(const std::wstring& mimeType)
On 2015/12/15 02:10:16, Oleksandr wrote:
> There's also very little gain had we implemented the precedence algorithm
> anyway.

Given the data you've gathered, it seems that IE isn't doing anything
sophisticated here. It seems like it might just have some string constants it
uses.

https://codereview.adblockplus.org/29331669/diff/29331670/src/plugin/PluginWb...
src/plugin/PluginWbPassThrough.cpp:151: if ((mimeType.find(L"text/") !=
std::wstring::npos) ||
On 2015/12/15 02:10:16, Oleksandr wrote:
> I have only been able to log 2 types here:
> text/html, application/xhtml+xml, image/jxr, */*
> image/png, image/svg+xml, image/jxr, image/*;q=0.8, */*;q=0.5
> 
> In IE9 the image/jxr part is missing, but other then that the header looks
very
> similar. So I agree, we can be precise here.

These are the strings I'd like to see in a comment.

https://codereview.adblockplus.org/29331669/diff/29332669/src/plugin/PluginWb...
File src/plugin/PluginWbPassThrough.cpp (right):

https://codereview.adblockplus.org/29331669/diff/29332669/src/plugin/PluginWb...
src/plugin/PluginWbPassThrough.cpp:145: ContentType
WBPassthruSink::GetContentTypeFromMimeType(const std::wstring& mimeType)
While not strictly necessary, there are a few things that would improve the
code, as long as we're touching it here. They're listed in the order I care.
- Add a comment here listing the media type strings found in testing. That's
some useful information that doesn't deserve to get buried in a comment in a
review.
- "MimeType" should be renamed "MediaTypeList". The present name is somewhat
confusing unless you track down where it's called from.
- I'd use "InferContentType" rather than "Get...". This is a heuristic, not just
a parse-and-fetch operation like its name seems to indicate.
- This function could be in an anonymous namespace. It doesn't need to be a
static class member.

I'm not going to hold the review up if these aren't present, but I would like to
see them, particularly the first, since it directly related to the defect being
fixed in the present change set.

sergei

Dec. 15, 2015, 3:29 p.m. (2015-12-15 15:29:32 UTC) #6

LGTM

https://codereview.adblockplus.org/29331669/diff/29332669/src/plugin/PluginWb...
File src/plugin/PluginWbPassThrough.cpp (right):

https://codereview.adblockplus.org/29331669/diff/29332669/src/plugin/PluginWb...
src/plugin/PluginWbPassThrough.cpp:145: ContentType
WBPassthruSink::GetContentTypeFromMimeType(const std::wstring& mimeType)
On 2015/12/15 03:18:18, Eric wrote:
> While not strictly necessary, there are a few things that would improve the
> code, as long as we're touching it here. They're listed in the order I care.
> - Add a comment here listing the media type strings found in testing. That's
> some useful information that doesn't deserve to get buried in a comment in a
> review.
> - "MimeType" should be renamed "MediaTypeList". The present name is somewhat
> confusing unless you track down where it's called from.
> - I'd use "InferContentType" rather than "Get...". This is a heuristic, not
just
> a parse-and-fetch operation like its name seems to indicate.
> - This function could be in an anonymous namespace. It doesn't need to be a
> static class member.
> 
> I'm not going to hold the review up if these aren't present, but I would like
to
> see them, particularly the first, since it directly related to the defect
being
> fixed in the present change set.

The same from my side.

Expand All Messages | Collapse All Messages