|
|
DescriptionIssue #1234 - Remove 'CString' from PluginFilter.*
Change all string declarations in PluginFilter.h to 'std::wstring'. Most were
'CString'; one was 'CComBSTR'. Change all local string variables in
PluginFilter.cpp to 'CString'.
Remove declaration cruft from CFilterElementHideAttrSelector. Change
declaration to 'struct', since all members were public. Inline the
constructor initializations. Eliminate explicit copy constructor and
destructor, since they were no different from the defaults.
Replace 'CString::TrimRight' and 'CString::TrimLeft' with a reworked set of
'TrimString*' functions. Expand existing unit tests to include test points for
new 'TrimString*' functions. Added unit tests to verify argument-passing
behavior.
Replace 'CString::Tokenize' with loops built around 'wcstok_s()'.
Tighten up the parsing in the constructor for 'CFilterElementHide'. Add a
utility exception class, alleviating duplication of exceptions and logging.
Patch Set 1 : #
Total comments: 58
Patch Set 2 : address comments #
Total comments: 4
Patch Set 3 : rebase #Patch Set 4 : rebase #Patch Set 5 : rebase #Patch Set 6 : rebase #
Total comments: 33
Patch Set 7 : remove TokenSequence #
Total comments: 3
Patch Set 8 : fix whitespace nit #
MessagesTotal messages: 20
A tremendous amount of work is done here! I have not taken a look at TokenSequence.h because as I understand we use `wcstok_s` instead of it. https://codereview.adblockplus.org/29331055/diff/29331528/adblockplus.gyp File adblockplus.gyp (right): https://codereview.adblockplus.org/29331055/diff/29331528/adblockplus.gyp#new... adblockplus.gyp:223: 'libraries': ['-ladvapi32', '-lshell32', '-lole32', '-loleaut32'], Why do we need it? https://codereview.adblockplus.org/29331055/diff/29331528/src/plugin/PluginFi... File src/plugin/PluginFilter.cpp (right): https://codereview.adblockplus.org/29331055/diff/29331528/src/plugin/PluginFi... src/plugin/PluginFilter.cpp:36: std::wstring ToLowerString(const std::wstring& s) It should be in anonymous namespace. https://codereview.adblockplus.org/29331055/diff/29331528/src/plugin/PluginFi... src/plugin/PluginFilter.cpp:44: void ElementHideParseError(const std::wstring& filterText, const std::string& reason) It should be in anonymous namespace. https://codereview.adblockplus.org/29331055/diff/29331528/src/plugin/PluginFi... src/plugin/PluginFilter.cpp:44: void ElementHideParseError(const std::wstring& filterText, const std::string& reason) It's really not clear from the name of the method that it throws an exception. The usage of it even looks like creating of a temporary instance of a class `ElementHideParseError` e.g. ElementHideParseError(filterText, "invalid tag"); Something like Throw_ParseElementHideException would be better. https://codereview.adblockplus.org/29331055/diff/29331528/src/plugin/PluginFi... src/plugin/PluginFilter.cpp:44: void ElementHideParseError(const std::wstring& filterText, const std::string& reason) It seems to know the current position would be also very useful. Could you include `filterSuffix` here as well? https://codereview.adblockplus.org/29331055/diff/29331528/src/plugin/PluginFi... src/plugin/PluginFilter.cpp:79: m_tag = ToLowerString(filterSuffix.substr(0,pos)); space after comma is missed https://codereview.adblockplus.org/29331055/diff/29331528/src/plugin/PluginFi... src/plugin/PluginFilter.cpp:171: ElementHideParseError(filterText, "empty attribute name before " + std::string(1, static_cast<char>(arg[0])) + "'='"); std::string(1, static_cast<char>(arg[0])) it seems better to call `ToUtf8String(arg[0])` https://codereview.adblockplus.org/29331055/diff/29331528/src/plugin/PluginFi... src/plugin/PluginFilter.cpp:255: if (FAILED( pEl->get_id(&idBstr)) || !idBstr || m_tagId != ToWstring(idBstr)) additional space after '(' https://codereview.adblockplus.org/29331055/diff/29331528/src/plugin/PluginFi... src/plugin/PluginFilter.cpp:281: TokenSequence<std::wstring> ts(classNameList, L" "); Did we decide to use `wcstok_s` instead of custom tokenization? https://codereview.adblockplus.org/29331055/diff/29331528/src/plugin/PluginFi... src/plugin/PluginFilter.cpp:313: for (auto attrIt = m_attributeSelectors.begin(); attrIt != m_attributeSelectors.end(); ++ attrIt) additional space after `++` https://codereview.adblockplus.org/29331055/diff/29331528/src/plugin/PluginFi... src/plugin/PluginFilter.cpp:513: id = std::wstring(ToWstring(idBstr)); It should be just `id = ToWstring(idBstr);` https://codereview.adblockplus.org/29331055/diff/29331528/src/plugin/PluginFi... src/plugin/PluginFilter.cpp:528: for (auto idIt = idItEnum.first; idIt != idItEnum.second; idIt ++) Good chance to change `idIt ++` to `++idIt`. https://codereview.adblockplus.org/29331055/diff/29331528/src/plugin/PluginFi... src/plugin/PluginFilter.cpp:542: for (auto idIt = idItEnum.first; idIt != idItEnum.second; idIt ++) Good chance to change `idIt ++` to `++idIt` https://codereview.adblockplus.org/29331055/diff/29331528/src/plugin/PluginFi... src/plugin/PluginFilter.cpp:558: TokenSequence<std::wstring> ts(classNames, L" \t\n\r"); Same here, let's use `wcstok_s` instead of custom tokenization. https://codereview.adblockplus.org/29331055/diff/29331528/src/plugin/PluginFi... src/plugin/PluginFilter.cpp:593: for (auto tagIt = tagItEnum.first; tagIt != tagItEnum.second; ++ tagIt) additional space after `++` https://codereview.adblockplus.org/29331055/diff/29331528/src/plugin/PluginFi... File src/plugin/PluginFilter.h (right): https://codereview.adblockplus.org/29331055/diff/29331528/src/plugin/PluginFi... src/plugin/PluginFilter.h:77: CFilterElementHide(const std::wstring& filterText=L""); spaces around '=' are missed https://codereview.adblockplus.org/29331055/diff/29331528/src/plugin/PluginFi... src/plugin/PluginFilter.h:128: typedef std::multimap<std::pair<std::wstring,std::wstring>, CFilterElementHide> TFilterElementHideTagsNamed; space after comma is missed https://codereview.adblockplus.org/29331055/diff/29331528/test/UtilTest.cpp File test/UtilTest.cpp (right): https://codereview.adblockplus.org/29331055/diff/29331528/test/UtilTest.cpp#n... test/UtilTest.cpp:22: void TrimTestBody(std::wstring input, std::wstring expected) It's not introduces in this change set but the comments are the same, const references, EXPECT_EQ instead of ASSERT_EQ and let's rename it. https://codereview.adblockplus.org/29331055/diff/29331528/test/UtilTest.cpp#n... test/UtilTest.cpp:28: void TrimLeftTestBody(std::wstring input, std::wstring expected) Why not to pass arguments as constant references? https://codereview.adblockplus.org/29331055/diff/29331528/test/UtilTest.cpp#n... test/UtilTest.cpp:28: void TrimLeftTestBody(std::wstring input, std::wstring expected) It would be better to call it like ExpectEqualToLeftTrimmed(expected, input); https://codereview.adblockplus.org/29331055/diff/29331528/test/UtilTest.cpp#n... test/UtilTest.cpp:31: ASSERT_EQ(expected, trimmed); It should not stop the tests, could you please use EXPECT_EQ? https://codereview.adblockplus.org/29331055/diff/29331528/test/UtilTest.cpp#n... test/UtilTest.cpp:34: void TrimRightTestBody(std::wstring input, std::wstring expected) Same here, constant references https://codereview.adblockplus.org/29331055/diff/29331528/test/UtilTest.cpp#n... test/UtilTest.cpp:34: void TrimRightTestBody(std::wstring input, std::wstring expected) ExpectEqualToRightTrimmed(expected, input); https://codereview.adblockplus.org/29331055/diff/29331528/test/UtilTest.cpp#n... test/UtilTest.cpp:37: ASSERT_EQ(expected, trimmed); EXPECT_EQ https://codereview.adblockplus.org/29331055/diff/29331528/test/UtilTest.cpp#n... test/UtilTest.cpp:165: ASSERT_EQ(y, x); EXPECT_EQ here and below https://codereview.adblockplus.org/29331055/diff/29331528/test/UtilTest.cpp#n... test/UtilTest.cpp:166: std::wstring trimmed = TrimString(x); // expect here pass by value What if one calls it as `TrimString<std::wstring&>(x)`? It's actually a bug in the test, we should explicitly specify the argument type as a non-constant reference. https://codereview.adblockplus.org/29331055/diff/29331528/test/UtilTest.cpp#n... test/UtilTest.cpp:171: TEST(TrimTest, TrimBindsOnConstArg) If it's an attempt to test whether there is some bug which modifies the string passed into TrimString then I would say we likely don't need such test. Even more, here is a bug in the test because the parameter of TrimString will a copy of an argument and `TrimString<const std::wstring&>(x)` is not compilable.
Patch set 2 addresses sergei's comments. On 2015/11/30 12:37:55, sergei wrote: > I have not taken a look at TokenSequence.h because as I understand we use > `wcstok_s` instead of it. Please go look at it and the accompanying unit tests. I did not change to 'wcstok_s' in the new patch set, nor am I inclined to. https://codereview.adblockplus.org/29331055/diff/29331528/adblockplus.gyp File adblockplus.gyp (right): https://codereview.adblockplus.org/29331055/diff/29331528/adblockplus.gyp#new... adblockplus.gyp:223: 'libraries': ['-ladvapi32', '-lshell32', '-lole32', '-loleaut32'], On 2015/11/30 12:37:51, sergei wrote: > Why do we need it? SysStringLen, used in ToWstring(BSTR) Apparently this library is getting linked in the plugin from some other library dependency, so we don't need to specify it explicitly there. https://codereview.adblockplus.org/29331055/diff/29331528/src/plugin/PluginFi... File src/plugin/PluginFilter.cpp (right): https://codereview.adblockplus.org/29331055/diff/29331528/src/plugin/PluginFi... src/plugin/PluginFilter.cpp:36: std::wstring ToLowerString(const std::wstring& s) On 2015/11/30 12:37:53, sergei wrote: > It should be in anonymous namespace. Done. https://codereview.adblockplus.org/29331055/diff/29331528/src/plugin/PluginFi... src/plugin/PluginFilter.cpp:44: void ElementHideParseError(const std::wstring& filterText, const std::string& reason) On 2015/11/30 12:37:52, sergei wrote: > It seems to know the current position would be also very useful. Could you > include `filterSuffix` here as well? I thought about doing that. I decided not to since syntax errors in filter lists are fairly uncommon. My judgement call was that we could do that later. I'm not averse to generating a proper syntax error message, but doing so properly should use a utility class to do so and have associated unit tests. We can do that later. https://codereview.adblockplus.org/29331055/diff/29331528/src/plugin/PluginFi... src/plugin/PluginFilter.cpp:44: void ElementHideParseError(const std::wstring& filterText, const std::string& reason) On 2015/11/30 12:37:53, sergei wrote: > It should be in anonymous namespace. Done. https://codereview.adblockplus.org/29331055/diff/29331528/src/plugin/PluginFi... src/plugin/PluginFilter.cpp:44: void ElementHideParseError(const std::wstring& filterText, const std::string& reason) On 2015/11/30 12:37:52, sergei wrote: > It's really not clear from the name of the method that it throws an exception. I changed the function to a subclass of std::runtime_error, so that the throw statement is inline. It's only a few extra lines of extra code, but make the exception behavior obvious. https://codereview.adblockplus.org/29331055/diff/29331528/src/plugin/PluginFi... src/plugin/PluginFilter.cpp:79: m_tag = ToLowerString(filterSuffix.substr(0,pos)); On 2015/11/30 12:37:52, sergei wrote: > space after comma is missed Done. https://codereview.adblockplus.org/29331055/diff/29331528/src/plugin/PluginFi... src/plugin/PluginFilter.cpp:171: ElementHideParseError(filterText, "empty attribute name before " + std::string(1, static_cast<char>(arg[0])) + "'='"); On 2015/11/30 12:37:52, sergei wrote: > std::string(1, static_cast<char>(arg[0])) > it seems better to call `ToUtf8String(arg[0])` ToUtf8String only takes wstring arguments. The only basic_string constructor for characters is the fill constructor that requires a length. The statement would have to be thus. ToUtf8String(std::wstring(1,arg[0])) I find the code in the patch set more readable, because the std::string constructor is on the outside and thus the type is obvious. I should point out that there are only exactly three characters that arg[0] can be, one of '^', '*', or '$', so there's no question of locale and ToUtf8String is overkill. If you'd like a code comment to that effect, I can add one. https://codereview.adblockplus.org/29331055/diff/29331528/src/plugin/PluginFi... src/plugin/PluginFilter.cpp:281: TokenSequence<std::wstring> ts(classNameList, L" "); On 2015/11/30 12:37:52, sergei wrote: > Did we decide to use `wcstok_s` instead of custom tokenization? We're using that function at present in exactly one loop in the plugin. We used it in that case, as I recall, because the proposed code wasn't any better at the time. The class 'TokenSequence', though, is definitely better that using 'wcstok_s'. -- TokenSequence uses C++ strings natively. 'wcstok_s' uses C-style strings. Using TokenSequence requires no conversion back and forth to C-style strings. -- TokenSequence uses iterators in the native C++ pattern. -- TokenSequence only requires a single specification of delimiters. 'wcstok_s' always requires two calls with the same delimiter string (one to initialize the loop, one to increment it), violating DRY. -- TokenSequence loops are shorter than the alternative. https://codereview.adblockplus.org/29331055/diff/29331528/src/plugin/PluginFi... src/plugin/PluginFilter.cpp:313: for (auto attrIt = m_attributeSelectors.begin(); attrIt != m_attributeSelectors.end(); ++ attrIt) On 2015/11/30 12:37:52, sergei wrote: > additional space after `++` Done. https://codereview.adblockplus.org/29331055/diff/29331528/src/plugin/PluginFi... src/plugin/PluginFilter.cpp:513: id = std::wstring(ToWstring(idBstr)); On 2015/11/30 12:37:52, sergei wrote: > It should be just `id = ToWstring(idBstr);` Done. Good catch. That's leftover code from an earlier version before I put in ToWstring. I removed some others like this, but I missed this one. https://codereview.adblockplus.org/29331055/diff/29331528/src/plugin/PluginFi... src/plugin/PluginFilter.cpp:528: for (auto idIt = idItEnum.first; idIt != idItEnum.second; idIt ++) On 2015/11/30 12:37:53, sergei wrote: > Good chance to change `idIt ++` to `++idIt`. Done. https://codereview.adblockplus.org/29331055/diff/29331528/src/plugin/PluginFi... src/plugin/PluginFilter.cpp:542: for (auto idIt = idItEnum.first; idIt != idItEnum.second; idIt ++) On 2015/11/30 12:37:51, sergei wrote: > Good chance to change `idIt ++` to `++idIt` Done. https://codereview.adblockplus.org/29331055/diff/29331528/src/plugin/PluginFi... src/plugin/PluginFilter.cpp:558: TokenSequence<std::wstring> ts(classNames, L" \t\n\r"); On 2015/11/30 12:37:52, sergei wrote: > Same here, let's use `wcstok_s` instead of custom tokenization. See above comment. https://codereview.adblockplus.org/29331055/diff/29331528/src/plugin/PluginFi... src/plugin/PluginFilter.cpp:593: for (auto tagIt = tagItEnum.first; tagIt != tagItEnum.second; ++ tagIt) On 2015/11/30 12:37:53, sergei wrote: > additional space after `++` Done. https://codereview.adblockplus.org/29331055/diff/29331528/src/plugin/PluginFi... File src/plugin/PluginFilter.h (right): https://codereview.adblockplus.org/29331055/diff/29331528/src/plugin/PluginFi... src/plugin/PluginFilter.h:77: CFilterElementHide(const std::wstring& filterText=L""); On 2015/11/30 12:37:53, sergei wrote: > spaces around '=' are missed Done. I wasn't really looking for existing style problems, but I don't mind rolling them into the present change set. Strictly speaking, they're unrelated, but there's no good occasion to fix them unless they tag along with changes to related code. https://codereview.adblockplus.org/29331055/diff/29331528/src/plugin/PluginFi... src/plugin/PluginFilter.h:128: typedef std::multimap<std::pair<std::wstring,std::wstring>, CFilterElementHide> TFilterElementHideTagsNamed; On 2015/11/30 12:37:54, sergei wrote: > space after comma is missed Done. https://codereview.adblockplus.org/29331055/diff/29331528/test/UtilTest.cpp File test/UtilTest.cpp (right): https://codereview.adblockplus.org/29331055/diff/29331528/test/UtilTest.cpp#n... test/UtilTest.cpp:22: void TrimTestBody(std::wstring input, std::wstring expected) On 2015/11/30 12:37:54, sergei wrote: > It's not introduces in this change set but the comments are the same, const > references, EXPECT_EQ instead of ASSERT_EQ and let's rename it. Done. https://codereview.adblockplus.org/29331055/diff/29331528/test/UtilTest.cpp#n... test/UtilTest.cpp:28: void TrimLeftTestBody(std::wstring input, std::wstring expected) On 2015/11/30 12:37:54, sergei wrote: > Why not to pass arguments as constant references? Done. https://codereview.adblockplus.org/29331055/diff/29331528/test/UtilTest.cpp#n... test/UtilTest.cpp:28: void TrimLeftTestBody(std::wstring input, std::wstring expected) On 2015/11/30 12:37:54, sergei wrote: > It would be better to call it like > ExpectEqualToLeftTrimmed(expected, input); I'm not going to bother with the renaming. I don't see any good reason to to make it look like a Google Test primitive. https://codereview.adblockplus.org/29331055/diff/29331528/test/UtilTest.cpp#n... test/UtilTest.cpp:31: ASSERT_EQ(expected, trimmed); On 2015/11/30 12:37:54, sergei wrote: > It should not stop the tests, could you please use EXPECT_EQ? Done. https://codereview.adblockplus.org/29331055/diff/29331528/test/UtilTest.cpp#n... test/UtilTest.cpp:34: void TrimRightTestBody(std::wstring input, std::wstring expected) On 2015/11/30 12:37:54, sergei wrote: > Same here, constant references Done. https://codereview.adblockplus.org/29331055/diff/29331528/test/UtilTest.cpp#n... test/UtilTest.cpp:37: ASSERT_EQ(expected, trimmed); On 2015/11/30 12:37:55, sergei wrote: > EXPECT_EQ Done. https://codereview.adblockplus.org/29331055/diff/29331528/test/UtilTest.cpp#n... test/UtilTest.cpp:165: ASSERT_EQ(y, x); On 2015/11/30 12:37:54, sergei wrote: > EXPECT_EQ here and below This statement is largely documentation. It's here to match the ASSERT at the end. I did change the middle one. https://codereview.adblockplus.org/29331055/diff/29331528/test/UtilTest.cpp#n... test/UtilTest.cpp:166: std::wstring trimmed = TrimString(x); // expect here pass by value On 2015/11/30 12:37:54, sergei wrote: > What if one calls it as `TrimString<std::wstring&>(x)`? If you do that, you get in-place modification. I searched the code; there's no occurrence of that syntax anywhere in our code. If this were a general purpose library, I'd worry about it right now. > It's actually a bug in the test, we should explicitly specify the argument type > as a non-constant reference. It's not a defect in the test. It's arguably a defect in 'TrimString', but if it is, it's not one that has any significant consequences. This test simply ensures that when the argument is a variable that it's being passed by value so that it doesn't break existing code. I changed the implementation significantly enough that I wanted verification on this point. Incidentally, most, but not all, of the uses of TrimString are in-place modifications. It would be good, in any case, to change them to use the explicit '...InPlace...' variants in the new code. I didn't do it in this change set because it's too much out of scope. If we want library-quality code, we'd want all of the following variant arguments - T - T& - const T& - T&& It would take longer to write the unit tests for all of those than to write all the variants. I don't see it being worth the time at present to do that. And if we want to do it, it's a separate change and a new ticket #. https://codereview.adblockplus.org/29331055/diff/29331528/test/UtilTest.cpp#n... test/UtilTest.cpp:171: TEST(TrimTest, TrimBindsOnConstArg) On 2015/11/30 12:37:55, sergei wrote: > If it's an attempt to test whether there is some bug which modifies the string > passed into TrimString then I would say we likely don't need such test. It verifies the modification, yes. The real purpose of the test is to ensure that the present function works with const arguments. That's why the test has the name it does. > Even more, here is a bug in the test because the parameter of TrimString will a > copy of an argument and `TrimString<const std::wstring&>(x)` is not compilable. There's no defect in this test. It verifies exactly what it's supposed to do, that T instantiates as 'std::wstring', that the const argument is used to initialize a temporary argument object, and that the return value is correct. The verification that the const variable 'x' hasn't changed is icing on the cake. `TrimString<const std::wstring&>(x)` doesn't need to be compilable. As I said above, 'TrimString' isn't meant a completely general purpose template function.
https://codereview.adblockplus.org/29331055/diff/29331528/src/plugin/PluginFi... File src/plugin/PluginFilter.cpp (right): https://codereview.adblockplus.org/29331055/diff/29331528/src/plugin/PluginFi... src/plugin/PluginFilter.cpp:171: ElementHideParseError(filterText, "empty attribute name before " + std::string(1, static_cast<char>(arg[0])) + "'='"); On 2015/11/30 15:54:06, Eric wrote: > On 2015/11/30 12:37:52, sergei wrote: > > std::string(1, static_cast<char>(arg[0])) > > it seems better to call `ToUtf8String(arg[0])` > > ToUtf8String only takes wstring arguments. The only basic_string constructor for > characters is the fill constructor that requires a length. The statement would > have to be thus. > > ToUtf8String(std::wstring(1,arg[0])) > > I find the code in the patch set more readable, because the std::string > constructor is on the outside and thus the type is obvious. > > I should point out that there are only exactly three characters that arg[0] can > be, one of '^', '*', or '$', so there's no question of locale and ToUtf8String > is overkill. If you'd like a code comment to that effect, I can add one. Acknowledged. https://codereview.adblockplus.org/29331055/diff/29331528/src/plugin/PluginFi... src/plugin/PluginFilter.cpp:281: TokenSequence<std::wstring> ts(classNameList, L" "); On 2015/11/30 15:54:05, Eric wrote: > On 2015/11/30 12:37:52, sergei wrote: > > Did we decide to use `wcstok_s` instead of custom tokenization? > > We're using that function at present in exactly one loop in the plugin. We used > it in that case, as I recall, because the proposed code wasn't any better at the > time. > > The class 'TokenSequence', though, is definitely better that using 'wcstok_s'. > -- TokenSequence uses C++ strings natively. 'wcstok_s' uses C-style strings. > Using TokenSequence requires no conversion back and forth to C-style strings. > -- TokenSequence uses iterators in the native C++ pattern. > -- TokenSequence only requires a single specification of delimiters. 'wcstok_s' > always requires two calls with the same delimiter string (one to initialize the > loop, one to increment it), violating DRY. > -- TokenSequence loops are shorter than the alternative. Of course it's better than 'wcstok_s', it's difficult to make something equally or more bad. @Oleksandr, previous time you was the originator of using 'wcstok_s', what is your opinion? https://codereview.adblockplus.org/29331055/diff/29331528/test/UtilTest.cpp File test/UtilTest.cpp (right): https://codereview.adblockplus.org/29331055/diff/29331528/test/UtilTest.cpp#n... test/UtilTest.cpp:166: std::wstring trimmed = TrimString(x); // expect here pass by value On 2015/11/30 15:54:08, Eric wrote: > On 2015/11/30 12:37:54, sergei wrote: > > What if one calls it as `TrimString<std::wstring&>(x)`? > > If you do that, you get in-place modification. I searched the code; there's no > occurrence of that syntax anywhere in our code. If this were a general purpose > library, I'd worry about it right now. > > > It's actually a bug in the test, we should explicitly specify the argument > type > > as a non-constant reference. > > It's not a defect in the test. It's arguably a defect in 'TrimString', but if it > is, it's not one that has any significant consequences. > > This test simply ensures that when the argument is a variable that it's being > passed by value so that it doesn't break existing code. I changed the > implementation significantly enough that I wanted verification on this point. > > Incidentally, most, but not all, of the uses of TrimString are in-place > modifications. It would be good, in any case, to change them to use the explicit > '...InPlace...' variants in the new code. I didn't do it in this change set > because it's too much out of scope. > > If we want library-quality code, we'd want all of the following variant > arguments > - T > - T& > - const T& > - T&& > It would take longer to write the unit tests for all of those than to write all > the variants. I don't see it being worth the time at present to do that. And if > we want to do it, it's a separate change and a new ticket #. Acknowledged. It's another issue, let's leave it.
https://codereview.adblockplus.org/29331055/diff/29331528/src/plugin/PluginFi... File src/plugin/PluginFilter.cpp (right): https://codereview.adblockplus.org/29331055/diff/29331528/src/plugin/PluginFi... src/plugin/PluginFilter.cpp:281: TokenSequence<std::wstring> ts(classNameList, L" "); On 2015/11/30 16:36:03, sergei wrote: > Of course it's better than 'wcstok_s', it's difficult to make something equally > or more bad. FWIW, the standard library function it replaced, the infamous 'strtok()', was in fact worse. That thing wasn't even thread-safe.
This was a gargantuan review indeed, thanks :) I would simplify it a bit though. https://codereview.adblockplus.org/29331055/diff/29331528/src/plugin/PluginFi... File src/plugin/PluginFilter.cpp (right): https://codereview.adblockplus.org/29331055/diff/29331528/src/plugin/PluginFi... src/plugin/PluginFilter.cpp:281: TokenSequence<std::wstring> ts(classNameList, L" "); On 2015/11/30 16:36:03, sergei wrote: > On 2015/11/30 15:54:05, Eric wrote: > > On 2015/11/30 12:37:52, sergei wrote: > > > Did we decide to use `wcstok_s` instead of custom tokenization? > > > > We're using that function at present in exactly one loop in the plugin. We > used > > it in that case, as I recall, because the proposed code wasn't any better at > the > > time. > > > > The class 'TokenSequence', though, is definitely better that using 'wcstok_s'. > > -- TokenSequence uses C++ strings natively. 'wcstok_s' uses C-style strings. > > Using TokenSequence requires no conversion back and forth to C-style strings. > > -- TokenSequence uses iterators in the native C++ pattern. > > -- TokenSequence only requires a single specification of delimiters. > 'wcstok_s' > > always requires two calls with the same delimiter string (one to initialize > the > > loop, one to increment it), violating DRY. > > -- TokenSequence loops are shorter than the alternative. > Of course it's better than 'wcstok_s', it's difficult to make something equally > or more bad. > @Oleksandr, previous time you was the originator of using 'wcstok_s', what is > your opinion? TokenSequence looks beautiful but unnecessary, IMHO. I don't see why we need it. Even if just using wcstok_s is questionable (because of native-std string conversions), what's wrong with something like: std::wstringstream classNameListStream(classNameList); for (std::wstring iterClassName; std::getline(classNameListStream, iterClassName, L' ');) { ... } https://codereview.adblockplus.org/29331055/diff/29331624/src/plugin/PluginFi... File src/plugin/PluginFilter.cpp (right): https://codereview.adblockplus.org/29331055/diff/29331624/src/plugin/PluginFi... src/plugin/PluginFilter.cpp:40: return ToWstring(ToCString(s).MakeLower()); Converting to CString and back seems redundant. How about _wcslwr_s instead? https://codereview.adblockplus.org/29331055/diff/29331624/src/shared/Utils.h File src/shared/Utils.h (right): https://codereview.adblockplus.org/29331055/diff/29331624/src/shared/Utils.h#... src/shared/Utils.h:73: * Th method used below for trimming strings is taken from here: Nit: "The" method?
New patch set. Rebase only affect the GYP file. https://codereview.adblockplus.org/29331055/diff/29331528/src/plugin/PluginFi... File src/plugin/PluginFilter.cpp (right): https://codereview.adblockplus.org/29331055/diff/29331528/src/plugin/PluginFi... src/plugin/PluginFilter.cpp:281: TokenSequence<std::wstring> ts(classNameList, L" "); On 2015/12/04 01:03:15, Oleksandr wrote: > TokenSequence looks beautiful but unnecessary, IMHO. I don't see why we need it. No one ever needs beautiful code. It does, however, reduce defect rates, improve maintainability, made code easier to understand and audit. If you don't want these benefits, please use ugly code. I prefer not to write such code. > Even if just using wcstok_s is questionable (because of native-std string > conversions), what's wrong with something like: > std::wstringstream classNameListStream(classNameList); > for (std::wstring iterClassName; std::getline(classNameListStream, > iterClassName, L' ');) > { > ... > } The performance overhead of stream buffering is rather large, particularly for something as simple as delimiter-based parsing. This function is called once per per element (roughly) in the traverser, so performance is at least somewhat a concern. https://codereview.adblockplus.org/29331055/diff/29331624/src/plugin/PluginFi... File src/plugin/PluginFilter.cpp (right): https://codereview.adblockplus.org/29331055/diff/29331624/src/plugin/PluginFi... src/plugin/PluginFilter.cpp:40: return ToWstring(ToCString(s).MakeLower()); On 2015/12/04 01:03:16, Oleksandr wrote: > Converting to CString and back seems redundant. How about _wcslwr_s instead? I put this in the commit message as out-of-scope. The particular issue is this. HTML attribute names can contain Unicode characters. If it were just ASCII, it would be an easy issue. I don't want to delve into all the associated locale issues here. Rather than commingle this change into an already-large review, I decided to just use the same function as the previous change, avoiding a regression on this point. A future change set can change it. If such a change causes problems, it will be isolated and easier to deal with. https://codereview.adblockplus.org/29331055/diff/29331624/src/shared/Utils.h File src/shared/Utils.h (right): https://codereview.adblockplus.org/29331055/diff/29331624/src/shared/Utils.h#... src/shared/Utils.h:73: * Th method used below for trimming strings is taken from here: On 2015/12/04 01:03:16, Oleksandr wrote: > Nit: "The" method? Done.
New patch set is a rebase. Its only difference is to remove the declaration and definition of ToWstring(BSTR), already broken out and committed separately. For some reason the review tool is not showing this difference between patch sets.
Patch set 5 is the same as previous except for removing the definition of 'ToLowerString()', which has now been committed by another change set.
It's been two weeks since any response on this review. The only sticking point seems to be some code that apparently is too high quality to commit. The token sequence class with its iterator are already written, tested, and ready to go. If there is an actual problem with the token sequence, I haven't seen it. The criticism that "we don't need it" is a weak one in my opinion. The proposed alternative uses C-style strings, it's far wordier at the point of use, and makes for a harder-to-read loop. We are writing in C++, not C, and we need not be burdened by the lack of abstraction in C. Also see this: Fear of Adding Classes http://c2.com/cgi/wiki?FearOfAddingClasses
Patch set 6 is a rebase only. (The fourth one in a row, for the record.) This is one of only two remaining reviews outstanding for #1234. The only remaining issue, as before, is whether we want to use tested C++ code in the present review or to revert back to an inferior C solution using a proprietary Microsoft library function.
JIC, review of TokenSequence https://codereview.adblockplus.org/29331055/diff/29333226/src/plugin/TokenSeq... File src/plugin/TokenSequence.h (right): https://codereview.adblockplus.org/29331055/diff/29333226/src/plugin/TokenSeq... src/plugin/TokenSequence.h:26: typedef TokenSequenceConstIterator<T> ConstIterator; I would like to have it as a public member and name it as `const_iterator`. It might does not fully support iterator API, but it's still good to have it STL-like. https://codereview.adblockplus.org/29331055/diff/29333226/src/plugin/TokenSeq... src/plugin/TokenSequence.h:76: const Container& sequence; I'm pretty sure it generates a warning something like "The compiler cannot generate copy ctr and assignment operator." Could you explicitly make this class "uncopyable"? https://codereview.adblockplus.org/29331055/diff/29333226/src/plugin/TokenSeq... src/plugin/TokenSequence.h:77: size_t posFirstCharInToken; Wouldn't it better to name it as `tokenFirstCharPos[ition]` because it's a position of the first character of the token in the container. https://codereview.adblockplus.org/29331055/diff/29333226/src/plugin/TokenSeq... src/plugin/TokenSequence.h:78: size_t posFirstCharNotInToken; And this one is a position of the next delimiter, so something like `nextDelimiterPos`. https://codereview.adblockplus.org/29331055/diff/29333226/src/plugin/TokenSeq... src/plugin/TokenSequence.h:101: : sequence(sequence), posFirstCharInToken(T::npos) I feel myself uncomfortable when not all members without default constructor are initialized. https://codereview.adblockplus.org/29331055/diff/29333226/src/plugin/TokenSeq... src/plugin/TokenSequence.h:108: void Advance(size_t posStart) Wouldn't it be better to rename it to something like `FindNextToken` because `Advance` refers rather to a sequence of calls of `iterator::operator++` in context of iterators. https://codereview.adblockplus.org/29331055/diff/29333226/src/plugin/TokenSeq... src/plugin/TokenSequence.h:110: posFirstCharInToken = sequence.value.find_first_not_of(sequence.delimiters, posStart); That line "eats" all delimiters and skips empty tokens, that's very important moment. It should be reflected at least in the comment for `TokenSequence`, it also could be good to have it commented here. https://codereview.adblockplus.org/29331055/diff/29333226/src/plugin/TokenSeq... src/plugin/TokenSequence.h:140: void operator++() It seems we are not using it, but it can be very convenient to return `*this` from the prefix increment operator. https://codereview.adblockplus.org/29331055/diff/29333226/src/plugin/TokenSeq... src/plugin/TokenSequence.h:144: return; // Incrementing "end()" yields "end()" Should we throw an exception here instead? https://codereview.adblockplus.org/29331055/diff/29333226/src/plugin/TokenSeq... src/plugin/TokenSequence.h:149: posFirstCharInToken = T::npos; I'm not sure that we actually need to make an assignment here, actually `posFirstCharNotInToken` cannot be bigger than `sequence.value.length()`, but I don't object against `>=` in the clause above. https://codereview.adblockplus.org/29331055/diff/29333226/src/plugin/TokenSeq... src/plugin/TokenSequence.h:150: return; Should we rather throw an exception? https://codereview.adblockplus.org/29331055/diff/29333226/test/plugin/TokenSe... File test/plugin/TokenSequenceTest.cpp (right): https://codereview.adblockplus.org/29331055/diff/29333226/test/plugin/TokenSe... test/plugin/TokenSequenceTest.cpp:30: TEST(TokenSequence, Cbegin1) That's really not clear what is being tested by any "CbeginX" or "CendX". https://codereview.adblockplus.org/29331055/diff/29333226/test/plugin/TokenSe... test/plugin/TokenSequenceTest.cpp:194: for (int j = 0; a != s.cend(); ++a, ++j) 1. What is tested here? 2. Why is it so complicated? What is the difference between this test and TEST(TokenSequence, MultipleTokens1)?
I've addressed all of Sergei's comments. No changes to behavior were necessary. There might be room for additional comments, but I'll wait for another round from Sergei to do so. I am still waiting for Oleksandr to reply to my comments on 2015-12-06 defending the presence of this class. He can weigh in also on what further comments would be useful. https://codereview.adblockplus.org/29331055/diff/29333226/src/plugin/TokenSeq... File src/plugin/TokenSequence.h (right): https://codereview.adblockplus.org/29331055/diff/29333226/src/plugin/TokenSeq... src/plugin/TokenSequence.h:26: typedef TokenSequenceConstIterator<T> ConstIterator; When our style guide at https://adblockplus.org/en/coding-style has a specific exemption for this, I'll gladly change it. https://codereview.adblockplus.org/29331055/diff/29333226/src/plugin/TokenSeq... src/plugin/TokenSequence.h:76: const Container& sequence; On 2016/01/25 14:13:54, sergei wrote: > I'm pretty sure it generates a warning something like "The compiler cannot > generate copy ctr and assignment operator." I am completely sure it does not generate a warning, because I've compiled the code. > Could you explicitly make this class "uncopyable"? I would worry about this more if I were writing a proper library class. As it is, I've now got multiple changes that depend on this code and I am resistant to making changes that don't involve defects or other major problems. https://codereview.adblockplus.org/29331055/diff/29333226/src/plugin/TokenSeq... src/plugin/TokenSequence.h:77: size_t posFirstCharInToken; On 2016/01/25 14:13:54, sergei wrote: > Wouldn't it better to name it as `tokenFirstCharPos[ition]` because it's a > position of the first character of the token in the container. It's "position of the first character in the token". It's generally better to put what kind of thing it is first in the name if you're dealing with an inadequate type system. (Positions in strings should be their own type, IMO, but they're not.) https://codereview.adblockplus.org/29331055/diff/29333226/src/plugin/TokenSeq... src/plugin/TokenSequence.h:78: size_t posFirstCharNotInToken; On 2016/01/25 14:13:53, sergei wrote: > And this one is a position of the next delimiter, so something like > `nextDelimiterPos`. Your suggested name is not as precise as the current one. These two variables are the endpoints of a half-closed, half-open interval of positions that define a token, and their names reflect exactly that. https://codereview.adblockplus.org/29331055/diff/29333226/src/plugin/TokenSeq... src/plugin/TokenSequence.h:101: : sequence(sequence), posFirstCharInToken(T::npos) On 2016/01/25 14:13:54, sergei wrote: > I feel myself uncomfortable when not all members without default constructor are > initialized. If you would like a comment about why 'posFirstCharNotInToken' is not initialized, I can add one. It's a deliberate choice. The current code initializes to a valid iterator state; that's all that matters. https://codereview.adblockplus.org/29331055/diff/29333226/src/plugin/TokenSeq... src/plugin/TokenSequence.h:108: void Advance(size_t posStart) On 2016/01/25 14:13:54, sergei wrote: > Wouldn't it be better to rename it to something like `FindNextToken` because > `Advance` refers rather to a sequence of calls of `iterator::operator++` in > context of iterators. It's better to name functions "what they do" than "how they work". In this case, "Advance" is named relative to the constructor and 'operator++', meaning "advance to the next token". https://codereview.adblockplus.org/29331055/diff/29333226/src/plugin/TokenSeq... src/plugin/TokenSequence.h:110: posFirstCharInToken = sequence.value.find_first_not_of(sequence.delimiters, posStart); On 2016/01/25 14:13:53, sergei wrote: > That line "eats" all delimiters and skips empty tokens, that's very important > moment. It should be reflected at least in the comment for `TokenSequence`, it > also could be good to have it commented here. From my point of view, it's completely conventional to collapse multiple whitespace delimiter characters into a single delimiter. Please don't use this class with "comma" as a delimiter. If this were a general purpose class, I'd worry about this more. https://codereview.adblockplus.org/29331055/diff/29333226/src/plugin/TokenSeq... src/plugin/TokenSequence.h:140: void operator++() On 2016/01/25 14:13:53, sergei wrote: > It seems we are not using it, but it can be very convenient to return `*this` > from the prefix increment operator. Right. We're not using it. So no need for a return value. Again: this is not a general purpose library function. If it ever does, that's the time to add such things. https://codereview.adblockplus.org/29331055/diff/29333226/src/plugin/TokenSeq... src/plugin/TokenSequence.h:144: return; // Incrementing "end()" yields "end()" On 2016/01/25 14:13:52, sergei wrote: > Should we throw an exception here instead? In fact, the way that this code is used in 'for' loops, this statement is never actually reached in any code we'll use. The presence of this test is purely defensive. And to answer the question directly, no, because that would not be the documented behavior, which is that incrementing "end()" yields "end()". Thankfully, this is not a library function, so we don't need to debate what interface would be better for general purpose use. https://codereview.adblockplus.org/29331055/diff/29333226/src/plugin/TokenSeq... src/plugin/TokenSequence.h:149: posFirstCharInToken = T::npos; On 2016/01/25 14:13:53, sergei wrote: > I'm not sure that we actually need to make an assignment here, actually > `posFirstCharNotInToken` cannot be bigger than `sequence.value.length()`, but I > don't object against `>=` in the clause above. The very simple answer is that we're incrementing from a non-"end()" iterator to the "end()" iterator, and there's a single canonical value for "end()". That's valuable because it makes 'operator==' much less complicated. If we didn't have this statement, loops on the iterator would never terminate. The class invariant is that if a token is present, both 'posFirstCharNotInToken' and 'posFirstCharInToken' be string positions. See line 161 below for why. The design decision here is that these position tests go in the state-change functions. Access and predicate functions therefore do not perform such tests. Lines 117-120 above are a related test. Those fix up the 'npos' return value into a proper length. Using ">=" is slightly clearer to me than using "==". It's never the case that this test is reached with a positive value greater than then length. https://codereview.adblockplus.org/29331055/diff/29333226/src/plugin/TokenSeq... src/plugin/TokenSequence.h:150: return; On 2016/01/25 14:13:54, sergei wrote: > Should we rather throw an exception? Never. Incrementing to "end()" is not an error.
IMO, TokenSequence has too specific implementation, so I would rather prefer to don't have it, at least in this commit. https://codereview.adblockplus.org/29331055/diff/29333226/src/plugin/TokenSeq... File src/plugin/TokenSequence.h (right): https://codereview.adblockplus.org/29331055/diff/29333226/src/plugin/TokenSeq... src/plugin/TokenSequence.h:26: typedef TokenSequenceConstIterator<T> ConstIterator; On 2016/01/26 15:54:19, Eric wrote: > When our style guide at https://adblockplus.org/en/coding-style has a specific > exemption for this, I'll gladly change it. I thought that it's natural to adhere to common iterator interface. Why do `cbegin` and `cend` start from lower letter and have "c"? I really don't want them to be renamed, but it would be definitely better to have public `const_iterator`. https://codereview.adblockplus.org/29331055/diff/29333226/src/plugin/TokenSeq... src/plugin/TokenSequence.h:76: const Container& sequence; On 2016/01/26 15:54:20, Eric wrote: > On 2016/01/25 14:13:54, sergei wrote: > > I'm pretty sure it generates a warning something like "The compiler cannot > > generate copy ctr and assignment operator." > > I am completely sure it does not generate a warning, because I've compiled the > code. OK, our warning level does not generate a warning here, but it does not mean that we should not write a clean code. > > Could you explicitly make this class "uncopyable"? > > I would worry about this more if I were writing a proper library class. As it > is, I've now got multiple changes that depend on this code and I am resistant to > making changes that don't involve defects or other major problems. I think that these classes are quite generic and deserve a proper implementation. If it's so specific, why is it a template class and why is it not mentioned in the comments that it's not something which can be safely widely used? It's not clear what are these multiple changes that depend on this code. Anyway, it shouldn't be a reason that allows to commit dirty code. https://codereview.adblockplus.org/29331055/diff/29333226/src/plugin/TokenSeq... src/plugin/TokenSequence.h:77: size_t posFirstCharInToken; On 2016/01/26 15:54:20, Eric wrote: > On 2016/01/25 14:13:54, sergei wrote: > > Wouldn't it better to name it as `tokenFirstCharPos[ition]` because it's a > > position of the first character of the token in the container. > > It's "position of the first character in the token". It's generally better to > put what kind of thing it is first in the name if you're dealing with an > inadequate type system. (Positions in strings should be their own type, IMO, but > they're not.) Let's say there is a string "aaa bbb ccc", after tokenization using space as a delimiter we get three tokens: "aaa", "bbb" and "ccc". The "position of the first character in token" is apparently always 1 (or zero if we are using zero-based numbering), however the position of the first character of each token in the container is different, for "aaa" it's 1, for "bbb" it's 5 and for "ccc" it's 9. I don't have a strong opinion regarding position of "pos" but the rest of the name is confusing. https://codereview.adblockplus.org/29331055/diff/29333226/src/plugin/TokenSeq... src/plugin/TokenSequence.h:78: size_t posFirstCharNotInToken; On 2016/01/26 15:54:21, Eric wrote: > On 2016/01/25 14:13:53, sergei wrote: > > And this one is a position of the next delimiter, so something like > > `nextDelimiterPos`. > > Your suggested name is not as precise as the current one. > > These two variables are the endpoints of a half-closed, half-open interval of > positions that define a token, and their names reflect exactly that. That's clear that it's a right-open interval, BTW, you can explain it in the comments. You are right, the first character of not token is a delimiter or nothing (can be unknown character) but in my opinion during the reading of the code it's easier to consider the end of the string as a delimiter than read a little bit complicated variable name. It would be already better to name it `posNotTokenFirstChar`. https://codereview.adblockplus.org/29331055/diff/29333226/src/plugin/TokenSeq... src/plugin/TokenSequence.h:101: : sequence(sequence), posFirstCharInToken(T::npos) On 2016/01/26 15:54:20, Eric wrote: > On 2016/01/25 14:13:54, sergei wrote: > > I feel myself uncomfortable when not all members without default constructor > are > > initialized. > > If you would like a comment about why 'posFirstCharNotInToken' is not > initialized, I can add one. It's a deliberate choice. The current code > initializes to a valid iterator state; that's all that matters. Don't forget about readability and maintainability. When I see that the member field is not initialized the first thing I think is that there is a bug and I need to check the implementation of the whole class. Here comparing `posFirstCharInToken` against `npos` might always tell us whether we may or not use other fields, but in general when I need to modify the class (e.g. add some method), again I need to very cautiously check that it will be in a consistent state and to do it I need dive into details of all algorithms with all their edge-cases. Off-topic: do you cross a street driving car on red light when there is nobody? Basically, it's safe and you achieve the goal, that's all that matters, isn't it? https://codereview.adblockplus.org/29331055/diff/29333226/src/plugin/TokenSeq... src/plugin/TokenSequence.h:108: void Advance(size_t posStart) On 2016/01/26 15:54:19, Eric wrote: > On 2016/01/25 14:13:54, sergei wrote: > > Wouldn't it be better to rename it to something like `FindNextToken` because > > `Advance` refers rather to a sequence of calls of `iterator::operator++` in > > context of iterators. > > It's better to name functions "what they do" than "how they work". In this case, > "Advance" is named relative to the constructor and 'operator++', meaning > "advance to the next token". "advance to the next token" actually means `find next token`, does not it? In general `Advance` is fine for me. https://codereview.adblockplus.org/29331055/diff/29333226/src/plugin/TokenSeq... src/plugin/TokenSequence.h:110: posFirstCharInToken = sequence.value.find_first_not_of(sequence.delimiters, posStart); On 2016/01/26 15:54:19, Eric wrote: > On 2016/01/25 14:13:53, sergei wrote: > > That line "eats" all delimiters and skips empty tokens, that's very important > > moment. It should be reflected at least in the comment for `TokenSequence`, it > > also could be good to have it commented here. > > From my point of view, it's completely conventional to collapse multiple > whitespace delimiter characters into a single delimiter. Please don't use this > class with "comma" as a delimiter. Why not to use it with comma as a delimiter and how will one be able to understand it in a year? BTW, the actual limitation I see here, it won't work when a delimiter character consists from several elements (several bytes for UTF-8 or it's a surrogate pair for UTF-16). I have not commented it before and I don't think we should pay attention to it, because if we ever need such functionality we will pay attention to each method which works with the string. > > If this were a general purpose class, I'd worry about this more. https://codereview.adblockplus.org/29331055/diff/29333226/src/plugin/TokenSeq... src/plugin/TokenSequence.h:144: return; // Incrementing "end()" yields "end()" On 2016/01/26 15:54:20, Eric wrote: > On 2016/01/25 14:13:52, sergei wrote: > > Should we throw an exception here instead? > > In fact, the way that this code is used in 'for' loops, this statement is never > actually reached in any code we'll use. The presence of this test is purely > defensive. It defensive but it protects silently, if we are trying to increase it then there is likely a bug and it would be better to be notified about it. I would personally use assert, but in this project we are extensively using pure exceptions for such cases. > > And to answer the question directly, no, because that would not be the > documented behavior, which is that incrementing "end()" yields "end()". > Thankfully, this is not a library function, so we don't need to debate what > interface would be better for general purpose use. IMO, incrementing of undereferenceable iterator is an error because it's not expected in C++. https://codereview.adblockplus.org/29331055/diff/29333226/src/plugin/TokenSeq... src/plugin/TokenSequence.h:149: posFirstCharInToken = T::npos; On 2016/01/26 15:54:19, Eric wrote: > On 2016/01/25 14:13:53, sergei wrote: > > I'm not sure that we actually need to make an assignment here, actually > > `posFirstCharNotInToken` cannot be bigger than `sequence.value.length()`, but > I > > don't object against `>=` in the clause above. > > The very simple answer is that we're incrementing from a non-"end()" iterator to > the "end()" iterator, and there's a single canonical value for "end()". That's > valuable because it makes 'operator==' much less complicated. Sorry, I have overlooked it, I have wrongly read `posFirstCharInToken` as `posFirstCharNotInToken`. Anyway, now I think that this assignment should be inside `Advance` method. > > If we didn't have this statement, loops on the iterator would never terminate. > > The class invariant is that if a token is present, both 'posFirstCharNotInToken' > and 'posFirstCharInToken' be string positions. See line 161 below for why. The > design decision here is that these position tests go in the state-change > functions. Access and predicate functions therefore do not perform such tests. > > Lines 117-120 above are a related test. Those fix up the 'npos' return value > into a proper length. > > Using ">=" is slightly clearer to me than using "==". It's never the case that > this test is reached with a positive value greater than then length.
> The performance overhead of stream buffering is rather large, particularly for > something as simple as delimiter-based parsing. This function is called once per > per element (roughly) in the traverser, so performance is at least somewhat a > concern. Let's look at the numbers. In my quick and dirty test 1000000 delimiter parsing operations from "foo bar baz" string took 4902ms using TokenSequence. On the other hand it took 4210ms using the std::getline method I suggested above. So this code introduces here ~150 lines of code to do something that would work faster with just 2 lines of code. Like I said, I really don't see why we need TokenSequence. Again, it is a beautiful class, but there is just no need for it. I definitely don't think this is a "fear of adding classes", but rather a desire to optimize for code maintainability.
On 2016/01/27 01:31:34, Oleksandr wrote: > Let's look at the numbers. In my quick and dirty test 1000000 delimiter parsing > operations from "foo bar baz" string took 4902ms using TokenSequence. On the > other hand it took 4210ms using the std::getline method I suggested above. So > this code introduces here ~150 lines of code to do something that would work > faster with just 2 lines of code. If you want to make an apples-to-apples comparison, you need to turn on the optimizer. The library function will have done that already. > I definitely don't think this > is a "fear of adding classes", but rather a desire to optimize for code > maintainability. Whenever you add classes, you are making a tradeoff: (a) cleaner code at the point of use vs. (b) complexity in the class definition. I wrote this class in the first place in order to _improve_ maintainability at the point of use, because that's where the maintenance problem was. The downside is needing to maintain 'TokenSequence'. So to make the tradeoff worthwhile, you need to make the new class good. 'TokenSequence' is self-contained, will never need an interface change (because it's purpose-made to support 'for' loops), easy to maintain (because once it's correct, it needs zero maintenance) (and because it seems to be correct already), and has unit tests (in case it's not already correct). In my view, the tradeoff favors the new class. The upshot: 1) I think you're wrong about maintainability here. 2) I give up. We've got more important things to work on. We are already using 'wsctok_s' for token parsing. See 'WBPassthruSink::GetContentTypeFromURL()', which, as I recall, replaces a previous use of 'CString::Tokenize()' and also which, as I recall, we bickered about similarly back then.
Patch set 7 removes 'TokenSequence' and replaces it with loops based on 'wcstok_s'.
LGTM
Beside a couple of nits LGTM. https://codereview.adblockplus.org/29331055/diff/29335758/src/plugin/PluginFi... File src/plugin/PluginFilter.cpp (right): https://codereview.adblockplus.org/29331055/diff/29335758/src/plugin/PluginFi... src/plugin/PluginFilter.cpp:79: //TODO: Add support for descendant selectors Redundant space is added to the end of the line. https://codereview.adblockplus.org/29331055/diff/29335758/src/shared/Utils.h File src/shared/Utils.h (right): https://codereview.adblockplus.org/29331055/diff/29335758/src/shared/Utils.h#... src/shared/Utils.h:85: text.erase(text.begin(), std::find_if(text.begin(), text.end(), isNotWhitespace<T::value_type>)); Does it compile? I expect "typename" before "T::value_type".
Patch set 8 moves one white space character in a comment. https://codereview.adblockplus.org/29331055/diff/29335758/src/shared/Utils.h File src/shared/Utils.h (right): https://codereview.adblockplus.org/29331055/diff/29335758/src/shared/Utils.h#... src/shared/Utils.h:85: text.erase(text.begin(), std::find_if(text.begin(), text.end(), isNotWhitespace<T::value_type>)); On 2016/05/02 14:33:48, sergei wrote: > Does it compile? I expect "typename" before "T::value_type". Of course it compiles. Do _you_ post code to review that doesn't compile? It has passed unit tests since the very first patch set. How would that have happened if it didn't compile? |