compiled/Filter.cpp - Issue 29333474: Issue 4125 - [emscripten] Convert filter classes to C++

Side by Side Diff: compiled/Filter.cpp

Issue 29333474: Issue 4125 - [emscripten] Convert filter classes to C++ (Closed)

Patch Set: Got rid of extra output in bindings.js file Created June 9, 2016, 1:35 p.m.

OLD	NEW
(Empty)
	1 #include "Filter.h"

	2 #include "CommentFilter.h"

	3 #include "InvalidFilter.h"

	4 #include "RegExpFilter.h"

	5 #include "BlockingFilter.h"

	6 #include "WhitelistFilter.h"

	7 #include "ElemHideBase.h"

	8 #include "ElemHideFilter.h"

	9 #include "ElemHideException.h"

	10 #include "CSSPropertyFilter.h"

	11 #include "StringMap.h"

	12

	13 namespace

	14 {

	15 StringMap<Filter*> knownFilters(8192);
	sergei 2016/06/16 21:16:28
	What about having another object which holds knownFilters and has MakeFilterFromText which is currently static Filter::FromText?

	sergei 2016/06/16 21:16:31
	Does it add a big overhead to store intrusive_ptr instead of a raw pointer there?

	Wladimir Palant 2016/12/06 10:47:22
	On 2016/06/16 21:16:28, sergei wrote:
	> What about having another object which holds knownFilters and has
	> MakeFilterFromText which is currently static Filter::FromText?
	Sure, we can probably do that - but at the moment I'd rather replicate the existing APIs as much as possible, we'll need significant adjustments to JavaScript code as it is already.

	On 2016/06/16 21:16:31, sergei wrote:
	> Does it add a big overhead to store intrusive_ptr instead of a raw pointer
	> there?
	Overhead is not the point - if we store intrusive_ptr here we will leak the filter. The idea is that we only store the filter here for as long as it is being used. Once no references to that filter are remaining (meaning no references from FilterStorage in particular) and the filter is released it should be removed from knownFilters (see TODO comment in the destructor below).
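
The lifetime rule described in this reply (the map holds a non-owning raw pointer, and the filter removes its own entry when the last reference goes away) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not the patch's code: std::unordered_map stands in for StringMap, the Registry() helper is hypothetical, the reference count starts at one, and this simplified FromText() hands out already addref'ed pointers.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

class Filter;

// Hypothetical registry standing in for knownFilters. It stores raw,
// non-owning pointers, so an entry alone never keeps a filter alive.
std::unordered_map<std::string, Filter*>& Registry()
{
  static std::unordered_map<std::string, Filter*> registry;
  return registry;
}

class Filter
{
public:
  // Assumption: a new filter starts with one reference, owned by the caller.
  explicit Filter(const std::string& text) : mText(text), mRefCount(1)
  {
    Registry()[mText] = this;
  }

  void AddRef() { ++mRefCount; }

  void ReleaseRef()
  {
    assert(mRefCount > 0);
    if (--mRefCount == 0)
      delete this;
  }

private:
  // Runs only when the last reference is released; removes the registry
  // entry so that no dangling pointer stays behind.
  ~Filter() { Registry().erase(mText); }

  std::string mText;
  int mRefCount;
};

// Simplified lookup-or-create: every returned pointer carries a reference
// that the caller has to give back with ReleaseRef().
Filter* FromText(const std::string& text)
{
  auto it = Registry().find(text);
  if (it != Registry().end())
  {
    it->second->AddRef();
    return it->second;
  }
  return new Filter(text);
}
```

Had the map stored an owning smart pointer instead, the count could never drop to zero while the entry exists, which is the leak the reply refers to.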
	16

	17 void NormalizeWhitespace(DependentString& text)

	18 {

	19 String::size_type start = 0;

	20 String::size_type end = text.length();

	21

	22 // Remove leading spaces and special characters like line breaks

	23 for (; start < end; start++)

	24 if (text[start] > ' ')

	25 break;

	26

	27 // Now look for invalid characters inside the string

	28 String::size_type pos;

	29 for (pos = start; pos < end; pos++)

	30 if (text[pos] < ' ')

	31 break;

	32

	33 if (pos < end)

	34 {

	35 // Found invalid characters, copy all the valid characters while skipping

	36 // the invalid ones.

	37 String::size_type delta = 1;

	38 for (pos = pos + 1; pos < end; pos++)

	39 {

	40 if (text[pos] < ' ')

	41 delta++;

	42 else

	43 text[pos - delta] = text[pos];

	44 }

	45 end -= delta;

	46 }

	47

	48 // Remove trailing spaces

	49 for (; end > 0; end--)

	50 if (text[end - 1] != ' ')

	51 break;

	52

	53 // Set new string boundaries

	54 text.reset(text, start, end - start);

	55 }

	56 }
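
As a reading aid for NormalizeWhitespace above, here is a standalone sketch of the same three steps: skip leading spaces and control characters, drop control characters inside the text, trim trailing spaces. It is a hypothetical rewrite for illustration only. It works on std::u16string and returns a copy rather than adjusting a DependentString in place, and it stops the trailing trim at start, a guard chosen for this sketch so that an all-whitespace input cannot push end below start.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the patch's NormalizeWhitespace().
std::u16string NormalizeWhitespaceSketch(std::u16string text)
{
  std::u16string::size_type start = 0;
  std::u16string::size_type end = text.length();

  // Skip leading spaces and control characters such as line breaks.
  while (start < end && text[start] <= u' ')
    start++;

  // Compact in place: every character that is not a control character is
  // copied over the gap left by the ones skipped so far.
  std::u16string::size_type write = start;
  for (std::u16string::size_type read = start; read < end; read++)
    if (text[read] >= u' ')
      text[write++] = text[read];
  end = write;

  // Drop trailing spaces without moving past start.
  while (end > start && text[end - 1] == u' ')
    end--;

  return text.substr(start, end - start);
}
```

Under these assumptions, an input such as u"  \tfoo\nbar  " comes out as u"foobar", and spaces between words are kept.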

	57

	58 Filter::Filter(const String& text)

	59 : mText(text)

	60 {

	61 annotate_address(this, "Filter");

	62 }

	63

	64 Filter::~Filter()

	65 {

	66 // TODO: This should be removing from knownFilters
	Wladimir Palant 2016/12/06 10:47:24
	I addressed this TODO comment so that we can stop leaking filters below.
	67 }

	68

	69 OwnedString Filter::Serialize() const

	70 {

	71 OwnedString result(u"[Filter]\ntext="_str);

	72 result.append(mText);

	73 result.append(u'\n');

	74 return result;

	75 }

	76

	77 Filter* Filter::FromText(DependentString& text)

	78 {

	79 NormalizeWhitespace(text);

	80 if (text.empty())

	81 return nullptr;

	82

	83 // Parsing also normalizes the filter text, so it has to be done before the

	84 // lookup in knownFilters.

	85 union

	86 {

	87 RegExpFilterData regexp;

	88 ElemHideData elemhide;

	89 } data;

	90 DependentString error;

	91

	92 Filter::Type type = CommentFilter::Parse(text);

	93 if (type == Filter::Type::UNKNOWN)

	94 type = ElemHideBase::Parse(text, data.elemhide);

	95 if (type == Filter::Type::UNKNOWN)

	96 type = RegExpFilter::Parse(text, error, data.regexp);

	97

	98 auto knownFilter = knownFilters.find(text);

	99 if (knownFilter)

	100 return knownFilter->second;

	101

	102 FilterPtr filter;

	103 switch (type)

	104 {

	105 case Filter::Type::COMMENT:

	106 filter = new CommentFilter(text);

	107 break;

	108 case Filter::Type::INVALID:

	109 filter = new InvalidFilter(text, error);

	110 break;

	111 case Filter::Type::BLOCKING:

	112 filter = new BlockingFilter(text, data.regexp);

	113 break;

	114 case Filter::Type::WHITELIST:

	115 filter = new WhitelistFilter(text, data.regexp);

	116 break;

	117 case Filter::Type::ELEMHIDE:

	118 filter = new ElemHideFilter(text, data.elemhide);

	119 break;

	120 case Filter::Type::ELEMHIDEEXCEPTION:

	121 filter = new ElemHideException(text, data.elemhide);

	122 break;

	123 case Filter::Type::CSSPROPERTY:

	124 filter = new CSSPropertyFilter(text, data.elemhide);

	125 if (static_cast<CSSPropertyFilter*>(filter.get())->IsGeneric())

	126 filter = new InvalidFilter(text, u"filter_cssproperty_nodomain"_str);

	127 break;

	128 default:

	129 // This should never happen but just in case

	130 return nullptr;

	131 }

	132

	133 // This is a hack: we looked up the entry using text but create it using

	134 // filter->mText. This works because both are equal at this point. However,

	135 // text refers to a temporary buffer which will go away.

	136 enter_context("Adding to known filters");

	137 knownFilter.assign(filter->mText, filter.get());

	138 exit_context();

	139

	140 // TODO: We intentionally leak the filter here - currently it won't be used

	141 // for anything and would be deleted immediately.
	sergei 2016/06/16 21:16:33
	Actually, we should have a convention that when we return a raw pointer the caller is responsible for freeing it, we should call AddRef above for "return knownFilter->second;" and GetKnownFilter should also call AddRef.
	In addition we could have intrusive_ptr::detach() which returns raw pointer and sets internal pointer member to nullptr and intrusive_ptr::attach which takes the ownership without calling of AddRef (of course attach calls ReleaseRef for the previously held value if it's valid).
	BTW, here is one of the case when intrusive_ptr::operator T*() is dangerous.

	Wladimir Palant 2016/12/06 10:47:20
	On 2016/06/16 21:16:33, sergei wrote:
	> Actually, we should have a convention that when we return a raw pointer the
	> caller is responsible for freeing it, we should call AddRef above for "return
	> knownFilter->second;" and GetKnownFilter should also call AddRef.
	Ok, let's implement proper semantics.

	> In addition we could have intrusive_ptr::detach() which returns raw pointer and
	> sets internal pointer member to nullptr and intrusive_ptr::attach which takes
	> the ownership without calling of AddRef (of course attach calls ReleaseRef for
	> the previously held value if it's valid).
	Actually, if our convention is that raw pointers are already addref'ed then we just should never addref them when taking ownership of them. I called the other method intrusive_ptr::release() for consistency with unique_ptr.

	> BTW, here is one of the case when intrusive_ptr::operator T*() is dangerous.
	Yes, we have intrusive_ptr::release() now so let's get rid of it.
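
The convention agreed on in this thread (raw pointers are already addref'ed, taking ownership of one never calls AddRef, and release() hands the reference back out) could look roughly like the sketch below. This is not the patch's intrusive_ptr. The class bodies, the RefCounted base, the Counted test type and MakeCounted() are all hypothetical and written only to show the semantics.

```cpp
#include <cassert>

// Hypothetical ref-counted base. Assumption: objects start with one
// reference, which belongs to whoever holds the raw pointer.
class RefCounted
{
public:
  RefCounted() : mRefCount(1) {}
  void AddRef() { ++mRefCount; }
  void ReleaseRef()
  {
    assert(mRefCount > 0);
    if (--mRefCount == 0)
      delete this;
  }
  int RefCount() const { return mRefCount; }

protected:
  virtual ~RefCounted() {}

private:
  int mRefCount;
};

template<typename T>
class intrusive_ptr
{
public:
  // Adopts an already addref'ed raw pointer, so there is no AddRef() here.
  explicit intrusive_ptr(T* pointer = nullptr) : mPointer(pointer) {}

  // Copying creates a second owner and therefore adds a reference.
  intrusive_ptr(const intrusive_ptr& other) : mPointer(other.mPointer)
  {
    if (mPointer)
      mPointer->AddRef();
  }

  intrusive_ptr& operator=(const intrusive_ptr&) = delete;

  ~intrusive_ptr()
  {
    if (mPointer)
      mPointer->ReleaseRef();
  }

  T* get() const { return mPointer; }
  T* operator->() const { return mPointer; }

  // Like unique_ptr::release(): gives up ownership without touching the
  // count, so the returned raw pointer still carries its reference.
  T* release()
  {
    T* result = mPointer;
    mPointer = nullptr;
    return result;
  }

private:
  T* mPointer;
};

// Test type that tracks how many instances are alive.
struct Counted : RefCounted
{
  static int alive;
  Counted() { ++alive; }
  ~Counted() override { --alive; }
};
int Counted::alive = 0;

// A factory in the style of Filter::FromText(): the object is held by a
// smart pointer while it is built, then returned as an addref'ed raw pointer.
Counted* MakeCounted()
{
  intrusive_ptr<Counted> ptr(new Counted());
  return ptr.release();
}
```

With release() available, a factory no longer needs an explicit AddRef() before returning, and there is no implicit conversion to T* through which a reference could be lost unnoticed.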
	142 filter->AddRef();

	143

	144 return filter;

	145 }

	146

	147 Filter* Filter::GetKnownFilter(const String& text)
	sergei 2016/06/16 21:16:30
	It seems this method is not used, do we really need it?

	Wladimir Palant 2016/12/06 10:47:18
	On 2016/06/16 21:16:30, sergei wrote:
	> It seems this method is not used, do we really need it?
	It isn't used in this form of course - the original code was exposing Filter.knownFilters so you could do `text in Filter.knownFilters` instead. I see however that no code appears to be doing that, everything is rather calling Filter.fromText() blindly. So let's remove this method and see how far we get with it.
	148 {

	149 auto it = knownFilters.find(text);

	150 if (it)

	151 return it->second;

	152 else

	153 return nullptr;

	154 }
