|
|
Patch Set 1 : x #
Total comments: 42
Patch Set 2 : rebase and address comments #
Total comments: 24
Patch Set 3 : address comments #
Total comments: 4
Patch Set 4 : simplify processing of query string #Patch Set 5 : rebase #
Total comments: 4
Patch Set 6 : fix and rebase #
MessagesTotal messages: 15
Since the content of http://codereview.adblockplus.org/5316782940225536/ has overlaps with the present review, I'd expect a rebased/reworked change set at some point in addition to whatever changes happen based on comments. http://codereview.adblockplus.org/5921969115496448/diff/5668600916475904/src/... File src/plugin/PluginWbPassThrough.cpp (right): http://codereview.adblockplus.org/5921969115496448/diff/5668600916475904/src/... src/plugin/PluginWbPassThrough.cpp:93: auto lastDotIt = std::find(value.rbegin(), value.rend(), L'.').base(); It's much clearer just to call 'rfind': auto lastDotPos = value.rfind(L'.'); http://codereview.adblockplus.org/5921969115496448/diff/5668600916475904/src/... src/plugin/PluginWbPassThrough.cpp:175: { As it's already been mentioned in the comments, it's hazardous to try to derive a content type from the query string. https://issues.adblockplus.org/ticket/1115#comment:7 Since this is only a problem on IE 8, we should add a version check here. If the version is >= 9, we should always return here, even for 'contentTypeAny' http://codereview.adblockplus.org/5921969115496448/diff/5668600916475904/src/... File src/plugin/PluginWbPassThrough.h (right): http://codereview.adblockplus.org/5921969115496448/diff/5668600916475904/src/... src/plugin/PluginWbPassThrough.h:41: int GetContentType(const CString& mimeType, const std::wstring& domain, const std::wstring& src); If you're going to make a change that is just indentation, please convert it all to the standard indentation with spaces. But as I mentioned in my review yesterday, these functions don't even need to be declared in this class. http://codereview.adblockplus.org/5921969115496448/diff/5668600916475904/src/... File src/shared/Utils.cpp (right): http://codereview.adblockplus.org/5921969115496448/diff/5668600916475904/src/... src/shared/Utils.cpp:157: } If you increment questionSignPos here, you get a (valid) position for the beginning of the query string. It eliminates the +1 and -1 in the return statement and makes the code a little easier to follow. http://codereview.adblockplus.org/5921969115496448/diff/5668600916475904/src/... src/shared/Utils.cpp:166: void SplitString(const std::wstring& value, wchar_t delimiter, A better name would be ForEachToken(), by analogy with 'std::for_each'. When I see SplitString, I'm expecting something rather returns, say, a vector. I'd recommend changing the name of tokenHandler, which is pretty generic already to something completely generic, like 'fn' or 'op' (the first as in 'std::for_each', the second as in 'std::transform'). Perhaps even better, just make a token iterator class and call it with 'for_each' instead of the (just one single) call this function below. http://codereview.adblockplus.org/5921969115496448/diff/5668600916475904/src/... src/shared/Utils.cpp:169: if (value.empty() || !tokenHandler) As above, 'tokenHandler' is a reference. http://codereview.adblockplus.org/5921969115496448/diff/5668600916475904/src/... src/shared/Utils.cpp:172: } I would write the following loop as a 'for' loop. It makes termination clearer. for( size_t delimiterPos = 0 ; delimiterPos != std::wstring::npos ; ++delimiterPos ) { size_t prevDelimiterPos = delimiterPos; delimiterPos = value.find(delimiter, prevDelimiterPos); //... } http://codereview.adblockplus.org/5921969115496448/diff/5668600916475904/src/... src/shared/Utils.cpp:188: void ProcessQueryStringParameters(const std::wstring& queryString, Similiary, 'ForEachQueryParameter()' would be a better name. http://codereview.adblockplus.org/5921969115496448/diff/5668600916475904/src/... src/shared/Utils.cpp:191: if (queryString.empty() || !parameterHandler) Why bother with the term "!parameterHandler" when you're passing 'parameterHandler' by reference? Under what circumstances would that term ever be true? http://codereview.adblockplus.org/5921969115496448/diff/5668600916475904/src/... src/shared/Utils.cpp:195: auto unparsedParameterHandler = [¶meterHandler](std::wstring::const_iterator begin, std::wstring::const_iterator end)->bool No need for a local variable here; this lambda can be defined directly as a function argument below. http://codereview.adblockplus.org/5921969115496448/diff/5668600916475904/src/... File src/shared/Utils.h (right): http://codereview.adblockplus.org/5921969115496448/diff/5668600916475904/src/... src/shared/Utils.h:59: /// Splits the string using delimiter Might as well use a doxygen-style comment here. /** * Splits a string into tokens separated by a delimiter. * Applies a function to each token. */ http://codereview.adblockplus.org/5921969115496448/diff/5668600916475904/src/... src/shared/Utils.h:63: /// until it returns false. /** * Applies a function to each name=value pair in a query string. * Terminates early if the function returns false. */ http://codereview.adblockplus.org/5921969115496448/diff/5668600916475904/test... File test/UtilGetQueryString.cpp (right): http://codereview.adblockplus.org/5921969115496448/diff/5668600916475904/test... test/UtilGetQueryString.cpp:29: } Need a test for an empty query string L"schema://host/path1/path2?" http://codereview.adblockplus.org/5921969115496448/diff/5668600916475904/test... test/UtilGetQueryString.cpp:34: } There are three tests with empty anchor strings. Need three more with non-empty anchor strings. http://codereview.adblockplus.org/5921969115496448/diff/5668600916475904/test... File test/UtilProcessQueryStringParameters.cpp (right): http://codereview.adblockplus.org/5921969115496448/diff/5668600916475904/test... test/UtilProcessQueryStringParameters.cpp:26: ProcessQueryStringParameters(L"some=qyery&string", std::function<bool(const std::wstring&, const std::wstring&)>()); Use ASSERT_NO_THROW here to clearly indicate the intent of the test. Should probably be "query" instead of "qyery". http://codereview.adblockplus.org/5921969115496448/diff/5668600916475904/test... test/UtilProcessQueryStringParameters.cpp:77: }); You've got some really horrible code duplication going on with this lambda. It would be far better to define a function class whose body called push_back on a vector of pairs. You've also got minor variations in the behavior. You could avoid code duplication there by having the test class provide a hook to its users that allowed changing the return value. bool operator()(const std::wstring& name, const std::wstring& value) { if ( !hook(parameters.size(), name, value) ) { return false; } parameters.push_back(make_pair(name,value)); return true; } For the hook functions: hookTrue(/*...*/) { return true; } // the default hookFalse(/*...*/) { return false; } hookOnlyOnce(/*...*/) { return count == 0; } hookLessThan3(/*...*/ { return count < 3; } http://codereview.adblockplus.org/5921969115496448/diff/5668600916475904/test... test/UtilProcessQueryStringParameters.cpp:98: }); Since you're about to index a vector, you should first check that the vector actually has the requisite elements. ASSERT_GE(2, callCounter) EXPECT_EQ(2, callCounter) The first one terminates the test, preventing the reporting of spurious error from indexing non-existent vector elements. If it were throw on operator[], it would indicate a failure with the test implementation more than with the unit under test. The second one checks that you're have the number of elements expected. http://codereview.adblockplus.org/5921969115496448/diff/5668600916475904/test... test/UtilProcessQueryStringParameters.cpp:216: } You need a test for a query string with a non-empty name and an empty value, both with and without the '=' symbol. L"param=" L"param" You also need a test for a query string with an empty name and non-empty value. L"=valueonly" It's not a valid URL string, but you should ensure that it behaves as you want, either throwing an exception, returning an error, or simply initializing an empty name. At the very least, if it's not supposed to throw an exception, test that it doesn't. http://codereview.adblockplus.org/5921969115496448/diff/5668600916475904/test... File test/UtilSplitString.cpp (right): http://codereview.adblockplus.org/5921969115496448/diff/5668600916475904/test... test/UtilSplitString.cpp:16: */ Many of the same comments from the previous file apply here. I don't feel the need to repeat myself in detail. As previous, many of these test would benefit from a common function object that initialized a vector. You have only one test for early termination of the iteration. Even that one function should be tested with other input strings. The iteration function in that test does not actually use the parameters provided. How about about a test string something like this: L"go&go&go&stop&EEK&EEK"
http://codereview.adblockplus.org/5921969115496448/diff/5668600916475904/src/... File src/plugin/PluginWbPassThrough.cpp (right): http://codereview.adblockplus.org/5921969115496448/diff/5668600916475904/src/... src/plugin/PluginWbPassThrough.cpp:93: auto lastDotIt = std::find(value.rbegin(), value.rend(), L'.').base(); On 2015/02/02 18:41:58, Eric wrote: > It's much clearer just to call 'rfind': > > auto lastDotPos = value.rfind(L'.'); Good point! fixed http://codereview.adblockplus.org/5921969115496448/diff/5668600916475904/src/... src/plugin/PluginWbPassThrough.cpp:175: { On 2015/02/02 18:41:58, Eric wrote: > As it's already been mentioned in the comments, it's hazardous to try to derive > a content type from the query string. > https://issues.adblockplus.org/ticket/1115#comment:7 > > Since this is only a problem on IE 8, we should add a version check here. If the > version is >= 9, we should always return here, even for 'contentTypeAny' Actually, I'm not sure that it's a problem of only IE8, the execution flow gets there in IE11 as well but I don't know concrete web site where it can derive the content type from the query string parameter. However, I also agree that it can be hazardous, so I've added it only for IE8. http://codereview.adblockplus.org/5921969115496448/diff/5668600916475904/src/... File src/shared/Utils.cpp (right): http://codereview.adblockplus.org/5921969115496448/diff/5668600916475904/src/... src/shared/Utils.cpp:157: } On 2015/02/02 18:41:58, Eric wrote: > If you increment questionSignPos here, you get a (valid) position for the > beginning of the query string. It eliminates the +1 and -1 in the return > statement and makes the code a little easier to follow. done http://codereview.adblockplus.org/5921969115496448/diff/5668600916475904/src/... src/shared/Utils.cpp:166: void SplitString(const std::wstring& value, wchar_t delimiter, On 2015/02/02 18:41:58, Eric wrote: > A better name would be ForEachToken(), by analogy with 'std::for_each'. When I > see SplitString, I'm expecting something rather returns, say, a vector. done. > > I'd recommend changing the name of tokenHandler, which is pretty generic already > to something completely generic, like 'fn' or 'op' (the first as in > 'std::for_each', the second as in 'std::transform'). Why? I've changed them in the header but in the code they reduce the readability. > Perhaps even better, just make a token iterator class and call it with > 'for_each' instead of the (just one single) call this function below. I also very like the idea with token iterators but I don't think it's better here. It's harder to make a mistake working with `ForEachToken(src, delimiter, tokenHandler)` than with custom iterators. Let's do it later when we really need it. http://codereview.adblockplus.org/5921969115496448/diff/5668600916475904/src/... src/shared/Utils.cpp:172: } On 2015/02/02 18:41:58, Eric wrote: > I would write the following loop as a 'for' loop. It makes termination clearer. > > for( size_t delimiterPos = 0 ; delimiterPos != std::wstring::npos ; > ++delimiterPos ) > { > size_t prevDelimiterPos = delimiterPos; > delimiterPos = value.find(delimiter, prevDelimiterPos); > //... > } Firstly, the proposed statements in the loop are not correct. Secondly, for-loop consists from the following steps, 'initialization', 'condition', 'increment' and 'statement' (`for(initialization, condition, increment){statement();}`). Taking into account that 'increment' is last three lines of the loop body as well as we can leave the loop not only because the 'condition' is false, I would say it's more logical to use while-loop here. http://codereview.adblockplus.org/5921969115496448/diff/5668600916475904/src/... src/shared/Utils.cpp:188: void ProcessQueryStringParameters(const std::wstring& queryString, On 2015/02/02 18:41:58, Eric wrote: > Similiary, 'ForEachQueryParameter()' would be a better name. done. http://codereview.adblockplus.org/5921969115496448/diff/5668600916475904/src/... src/shared/Utils.cpp:191: if (queryString.empty() || !parameterHandler) On 2015/02/02 18:41:58, Eric wrote: > Why bother with the term "!parameterHandler" when you're passing > 'parameterHandler' by reference? Under what circumstances would that term ever > be true? std::function<...> can be an empty object, so before calling it, we should ensure that it's not empty. Anyway, I've removed it here and in ForEachToken because it's actually covered by the test (see http://codereview.adblockplus.org/5921969115496448/diff/5668600916475904/test...) http://codereview.adblockplus.org/5921969115496448/diff/5668600916475904/src/... src/shared/Utils.cpp:195: auto unparsedParameterHandler = [¶meterHandler](std::wstring::const_iterator begin, std::wstring::const_iterator end)->bool On 2015/02/02 18:41:58, Eric wrote: > No need for a local variable here; this lambda can be defined directly as a > function argument below. done http://codereview.adblockplus.org/5921969115496448/diff/5668600916475904/src/... File src/shared/Utils.h (right): http://codereview.adblockplus.org/5921969115496448/diff/5668600916475904/src/... src/shared/Utils.h:59: /// Splits the string using delimiter On 2015/02/02 18:41:58, Eric wrote: > Might as well use a doxygen-style comment here. > > /** > * Splits a string into tokens separated by a delimiter. > * Applies a function to each token. > */ fixed. BTW, it's not necessary to have additional lines /** and leading */, doxygen should perfectly work with /// as well. http://codereview.adblockplus.org/5921969115496448/diff/5668600916475904/src/... src/shared/Utils.h:63: /// until it returns false. On 2015/02/02 18:41:58, Eric wrote: > /** > * Applies a function to each name=value pair in a query string. > * Terminates early if the function returns false. > */ done http://codereview.adblockplus.org/5921969115496448/diff/5668600916475904/test... File test/UtilGetQueryString.cpp (right): http://codereview.adblockplus.org/5921969115496448/diff/5668600916475904/test... test/UtilGetQueryString.cpp:29: } On 2015/02/02 18:41:58, Eric wrote: > Need a test for an empty query string > > L"schema://host/path1/path2?" added http://codereview.adblockplus.org/5921969115496448/diff/5668600916475904/test... test/UtilGetQueryString.cpp:34: } On 2015/02/02 18:41:58, Eric wrote: > There are three tests with empty anchor strings. Need three more with non-empty > anchor strings. I disagree here. According to such logic we should create the infinite number of tests to test absolutely all cases which we cannot achieve. Now we cover all execution paths of the tested function body, it should be enough. http://codereview.adblockplus.org/5921969115496448/diff/5668600916475904/test... File test/UtilProcessQueryStringParameters.cpp (right): http://codereview.adblockplus.org/5921969115496448/diff/5668600916475904/test... test/UtilProcessQueryStringParameters.cpp:26: ProcessQueryStringParameters(L"some=qyery&string", std::function<bool(const std::wstring&, const std::wstring&)>()); On 2015/02/02 18:41:58, Eric wrote: > Use ASSERT_NO_THROW here to clearly indicate the intent of the test. I've changed it, now it throws and there is ASSERT_THROW http://codereview.adblockplus.org/5921969115496448/diff/5668600916475904/test... test/UtilProcessQueryStringParameters.cpp:216: } On 2015/02/02 18:41:58, Eric wrote: > You need a test for a query string with a non-empty name and an empty value, > both with and without the '=' symbol. > > L"param=" > L"param" > It's already here, ParameterWithoutAssignSign, ParametersWithoutAssignSign, ParameterWithEmptyValue. > You also need a test for a query string with an empty name and non-empty value. > > L"=valueonly" > > It's not a valid URL string, but you should ensure that it behaves as you want, > either throwing an exception, returning an error, or simply initializing an > empty name. At the very least, if it's not supposed to throw an exception, test > that it doesn't. I've added such tests. JIC, I want to clarify, we should not throw any exception on unusual user input.
http://codereview.adblockplus.org/5921969115496448/diff/5668600916475904/src/... File src/shared/Utils.cpp (right): http://codereview.adblockplus.org/5921969115496448/diff/5668600916475904/src/... src/shared/Utils.cpp:172: } On 2015/02/12 14:44:06, sergei wrote: > Firstly, the proposed statements in the loop are not correct. Look more closely. > Taking into account that 'increment' is last three lines of the loop body as > well as we can leave the loop not only because the 'condition' is false, I would > say it's more logical to use while-loop here. Providing a second condition to leave the loop is part of the point. The critique you are making, that there are two ways of terminating the loop, is also present in the existing code. There's termination if a position reaches the end or termination if the function argument returns true. There's no difference there. The advantage of the for-loop form is that you don't have to read the body of the loop to verify that the loop terminates. http://codereview.adblockplus.org/5921969115496448/diff/5668600916475904/src/... src/shared/Utils.cpp:191: if (queryString.empty() || !parameterHandler) On 2015/02/12 14:44:06, sergei wrote: > std::function<...> can be an empty object, so before calling it, we should > ensure that it's not empty. I had forgotten that you can have a non-empty reference and also a null function, so this condition is actually meaningful. Any iteration with a null function is the same as an empty statement. So I'm withdrawing my original objection. The original behavior, to do nothing if the function provide is null, is better behavior. > Anyway, I've removed it here and in ForEachToken because it's actually covered > by the test This works too, though now that I've seen both I prefer the original. If you want to keep the condition out (as with patch set 2), it deserves a document comment in the header file that this function throws if the function argument is null. http://codereview.adblockplus.org/5921969115496448/diff/5668600916475904/src/... File src/shared/Utils.h (right): http://codereview.adblockplus.org/5921969115496448/diff/5668600916475904/src/... src/shared/Utils.h:59: /// Splits the string using delimiter On 2015/02/12 14:44:06, sergei wrote: > BTW, it's not necessary to have additional lines /** and leading */, doxygen > should perfectly work with /// as well. Minimizing whitespace is not always a virtue. I find the /**-style doxygen comment much easier to read. That's what you've done, though, so I'm fine. http://codereview.adblockplus.org/5921969115496448/diff/5668600916475904/test... File test/UtilGetQueryString.cpp (right): http://codereview.adblockplus.org/5921969115496448/diff/5668600916475904/test... test/UtilGetQueryString.cpp:34: } On 2015/02/12 14:44:06, sergei wrote: > I disagree here. According to such logic we should create the infinite number of > tests to test absolutely all cases which we cannot achieve. No, not an infinite number of tests. Simply a few more tests here. > Now we cover all > execution paths of the tested function body, it should be enough. Unit tests should treat their units as black boxes, not white boxes. You can't see inside a black box; you see inside a white box. Arguing that all execution paths are exercised is contrary to this principle. Unit tests should be written so that someone else implementing the same function has the benefit of exhaustive testing. I've written lots of parsers in my life. It is particularly useful for parsers to test exhaustively, for some reason. When testing a parser, it's important to test as many variations in the grammar as possible. In the present case, if it's possible for some element of a parsed string to be empty, there should be unit tests when it's empty. http://codereview.adblockplus.org/5921969115496448/diff/5668600916475904/test... File test/UtilProcessQueryStringParameters.cpp (right): http://codereview.adblockplus.org/5921969115496448/diff/5668600916475904/test... test/UtilProcessQueryStringParameters.cpp:216: } On 2015/02/12 14:44:06, sergei wrote: > JIC, I want to clarify, we should not throw any exception on unusual user input. This comment belongs as documentation as a function header. Sloppy error handling in parsing user input is a major source of security defects. It matters less how you do error handling than that it be done explicitly and precisely. http://codereview.adblockplus.org/5921969115496448/diff/5766466041282560/src/... File src/plugin/PluginWbPassThrough.cpp (right): http://codereview.adblockplus.org/5921969115496448/diff/5766466041282560/src/... src/plugin/PluginWbPassThrough.cpp:185: contentType == ContentType::CONTENT_TYPE_OTHER) I don't think it matters all that much, but it would be a little better to test content type first and check the IE version only if we need to. http://codereview.adblockplus.org/5921969115496448/diff/5766466041282560/src/... File src/shared/Utils.cpp (right): http://codereview.adblockplus.org/5921969115496448/diff/5766466041282560/src/... src/shared/Utils.cpp:175: } This version much clearer and easier to verify correctness, including better local variable names. http://codereview.adblockplus.org/5921969115496448/diff/5766466041282560/test... File test/UtilForEachQueryStringParameterTest.cpp (right): http://codereview.adblockplus.org/5921969115496448/diff/5766466041282560/test... test/UtilForEachQueryStringParameterTest.cpp:24: struct ParameterHandler The code reads much better with this class present. Now that I've seen the new version, I'll just observe that there's lot of duplication in verifying the resulting vectors. For whatever reason, it bothers me less. That's partly because our current compiler doesn't support C++11 uniform initializer lists, so writing a single expression for expected results isn't possible. http://codereview.adblockplus.org/5921969115496448/diff/5766466041282560/test... test/UtilForEachQueryStringParameterTest.cpp:43: struct LimitedParameterHandler : ParameterHandler I think I'd prefer than ParameterHandler take the moral equivalent of shouldContinue() here as a constructor argument. My reason for this is that the code here only applies to a single test case and so the code for that test is split between two, separate, non-contiguous types. I'd be satisfied, though, if you declared this call immediately before the unit test in which it's used. http://codereview.adblockplus.org/5921969115496448/diff/5766466041282560/test... test/UtilForEachQueryStringParameterTest.cpp:52: return shouldContinue() && ParameterHandler::operator()(parameter, value) && shouldContinue(); No need for the first shouldContinue() call here. The only time it could matter in the present case is when 'limit' is zero. http://codereview.adblockplus.org/5921969115496448/diff/5766466041282560/test... test/UtilForEachQueryStringParameterTest.cpp:72: ForEachQueryStringParameter(L"", std::ref(parameterHandler)); Do we need reference wrappers? It seems like would should be able to declare the ForEach... function arguments as rvalue references rather than regular references. If we do, we should likely use 'std::cref' here.
http://codereview.adblockplus.org/5921969115496448/diff/5766466041282560/src/... File src/plugin/PluginWbPassThrough.cpp (right): http://codereview.adblockplus.org/5921969115496448/diff/5766466041282560/src/... src/plugin/PluginWbPassThrough.cpp:187: ForEachQueryStringParameter(GetQueryString(src), Why can't we just use strtok here? The code would be much simpler, IMHO. http://codereview.adblockplus.org/5921969115496448/diff/5766466041282560/src/... File src/shared/Utils.cpp (right): http://codereview.adblockplus.org/5921969115496448/diff/5766466041282560/src/... src/shared/Utils.cpp:156: schemeAndHierarchicalPartEndsAt = url.find(L'#'); So what if there's neither '?' nor '#' in the url? Even more, why not just use CreateUri (https://msdn.microsoft.com/en-us/library/ie/ms775098(v=vs.85).aspx) and then just IUri::GetQuery? Or maybe WinHttpCrackUrl or InternetCrackUrl. For this and GetQueryString below. https://msdn.microsoft.com/en-us/library/windows/desktop/aa384092%28v=vs.85%2... or https://msdn.microsoft.com/en-us/library/windows/desktop/aa384376(v=vs.85).aspx
http://codereview.adblockplus.org/5921969115496448/diff/5668600916475904/src/... File src/shared/Utils.cpp (right): http://codereview.adblockplus.org/5921969115496448/diff/5668600916475904/src/... src/shared/Utils.cpp:172: } On 2015/02/12 17:24:41, Eric wrote: > On 2015/02/12 14:44:06, sergei wrote: > > Firstly, the proposed statements in the loop are not correct. > > Look more closely. It results in an infinite cycle. Look, at the end `delimiterPos = value.find(delimiter, prevDelimiterPos);` sets `delimiterPos` to std::string::npos which is the greatest possible value for an element of type size_t and then 'increment' step of for-loop is executed, (`++delimiterPos`) which results into setting `delimiterPos` value to zero (actually, integer overflow is undefined behaviour), and it starts from the beginning. So, I still cannot say that the proposed approach is easier to understand. > > > Taking into account that 'increment' is last three lines of the loop body as > > well as we can leave the loop not only because the 'condition' is false, I > would > > say it's more logical to use while-loop here. > > Providing a second condition to leave the loop is part of the point. > > The critique you are making, that there are two ways of terminating the loop, is > also present in the existing code. There's termination if a position reaches the > end or termination if the function argument returns true. There's no difference > there. > > The advantage of the for-loop form is that you don't have to read the body of > the loop to verify that the loop terminates. I agree here, I personally don't like to have exit points in the body of any loop as well as several exist points from the function, except the cases when we return the error. I usually create additional boolean variable which indicates whether the loop should stop or not and the function returns the correct value only at one place. But as I got feeling that it's considered as something bad here. http://codereview.adblockplus.org/5921969115496448/diff/5668600916475904/src/... src/shared/Utils.cpp:191: if (queryString.empty() || !parameterHandler) On 2015/02/12 17:24:41, Eric wrote: > On 2015/02/12 14:44:06, sergei wrote: > > std::function<...> can be an empty object, so before calling it, we should > > ensure that it's not empty. > > I had forgotten that you can have a non-empty reference and also a null > function, so this condition is actually meaningful. > > Any iteration with a null function is the same as an empty statement. So I'm > withdrawing my original objection. The original behavior, to do nothing if the > function provide is null, is better behavior. > > > Anyway, I've removed it here and in ForEachToken because it's actually covered > > by the test > > This works too, though now that I've seen both I prefer the original. If you > want to keep the condition out (as with patch set 2), it deserves a document > comment in the header file that this function throws if the function argument is > null. I've added the comments. http://codereview.adblockplus.org/5921969115496448/diff/5668600916475904/test... File test/UtilGetQueryString.cpp (right): http://codereview.adblockplus.org/5921969115496448/diff/5668600916475904/test... test/UtilGetQueryString.cpp:34: } On 2015/02/12 17:24:41, Eric wrote: > On 2015/02/12 14:44:06, sergei wrote: > > I disagree here. According to such logic we should create the infinite number > of > > tests to test absolutely all cases which we cannot achieve. > > No, not an infinite number of tests. Simply a few more tests here. > > > Now we cover all > > execution paths of the tested function body, it should be enough. > > Unit tests should treat their units as black boxes, not white boxes. You can't > see inside a black box; you see inside a white box. Arguing that all execution > paths are exercised is contrary to this principle. > > Unit tests should be written so that someone else implementing the same function > has the benefit of exhaustive testing. I've written lots of parsers in my life. > It is particularly useful for parsers to test exhaustively, for some reason. > When testing a parser, it's important to test as many variations in the grammar > as possible. In the present case, if it's possible for some element of a parsed > string to be empty, there should be unit tests when it's empty. Added. http://codereview.adblockplus.org/5921969115496448/diff/5766466041282560/src/... File src/plugin/PluginWbPassThrough.cpp (right): http://codereview.adblockplus.org/5921969115496448/diff/5766466041282560/src/... src/plugin/PluginWbPassThrough.cpp:185: contentType == ContentType::CONTENT_TYPE_OTHER) On 2015/02/12 17:24:41, Eric wrote: > I don't think it matters all that much, but it would be a little better to test > content type first and check the IE version only if we need to. Changed. http://codereview.adblockplus.org/5921969115496448/diff/5766466041282560/src/... src/plugin/PluginWbPassThrough.cpp:187: ForEachQueryStringParameter(GetQueryString(src), On 2015/02/13 03:32:16, Oleksandr wrote: > Why can't we just use strtok here? The code would be much simpler, IMHO. strtok is unsafe and I would say the using of any function from the family is not simpler, https://msdn.microsoft.com/en-us/library/ftsafwz3.aspx http://codereview.adblockplus.org/5921969115496448/diff/5766466041282560/src/... File src/shared/Utils.cpp (right): http://codereview.adblockplus.org/5921969115496448/diff/5766466041282560/src/... src/shared/Utils.cpp:156: schemeAndHierarchicalPartEndsAt = url.find(L'#'); On 2015/02/13 03:32:16, Oleksandr wrote: > So what if there's neither '?' nor '#' in the url? It means there is neither query nor fragment and we will return the original URL. Actually, there is the test for that case. > Even more, why not just use CreateUri > (https://msdn.microsoft.com/en-us/library/ie/ms775098%28v=vs.85%29.aspx) and then > just IUri::GetQuery? Or maybe WinHttpCrackUrl or InternetCrackUrl. For this and > GetQueryString below. > https://msdn.microsoft.com/en-us/library/windows/desktop/aa384092%2528v=vs.85... > or > https://msdn.microsoft.com/en-us/library/windows/desktop/aa384376%28v=vs.85%2... I thought that it's an unnecessary overhead, especially I'm afraid that to parse the URL they can instantiate a protocol handler (BTW, our implementation of protocol handler) to use IInternetProtocolInfo::ParseUrl. At least I've seen such behaviour but I don't remember whether it was for one of the mentioned function or not. CreateUri looks fine for me, should I investigate it and change? http://codereview.adblockplus.org/5921969115496448/diff/5766466041282560/test... File test/UtilForEachQueryStringParameterTest.cpp (right): http://codereview.adblockplus.org/5921969115496448/diff/5766466041282560/test... test/UtilForEachQueryStringParameterTest.cpp:24: struct ParameterHandler On 2015/02/12 17:24:41, Eric wrote: > The code reads much better with this class present. > > Now that I've seen the new version, I'll just observe that there's lot of > duplication in verifying the resulting vectors. For whatever reason, it bothers > me less. > > That's partly because our current compiler doesn't support C++11 uniform > initializer lists, so writing a single expression for expected results isn't > possible. Yes, it's not possible due to compiler limitation, in addition, having verification statements in the body of test (not in the common function) can be useful because in that case we immediately know the exact location of the failure. http://codereview.adblockplus.org/5921969115496448/diff/5766466041282560/test... test/UtilForEachQueryStringParameterTest.cpp:43: struct LimitedParameterHandler : ParameterHandler On 2015/02/12 17:24:41, Eric wrote: > I think I'd prefer than ParameterHandler take the moral equivalent of > shouldContinue() here as a constructor argument. My reason for this is that the > code here only applies to a single test case and so the code for that test is > split between two, separate, non-contiguous types. > > I'd be satisfied, though, if you declared this call immediately before the unit > test in which it's used. I thought about it when I was creating of it and decided that it would be better to have it as separate class. It's moved before the usage. http://codereview.adblockplus.org/5921969115496448/diff/5766466041282560/test... test/UtilForEachQueryStringParameterTest.cpp:52: return shouldContinue() && ParameterHandler::operator()(parameter, value) && shouldContinue(); On 2015/02/12 17:24:41, Eric wrote: > No need for the first shouldContinue() call here. The only time it could matter > in the present case is when 'limit' is zero. I've added it with another intention but it's fine without it. Removed. http://codereview.adblockplus.org/5921969115496448/diff/5766466041282560/test... test/UtilForEachQueryStringParameterTest.cpp:72: ForEachQueryStringParameter(L"", std::ref(parameterHandler)); On 2015/02/12 17:24:41, Eric wrote: > Do we need reference wrappers? It seems like would should be able to declare the > ForEach... function arguments as rvalue references rather than regular > references. > > If we do, we should likely use 'std::cref' here. The tested function accepts `std::function` but we are passing an object with the `operator()`. When `std::function` is implicitly constructed from it, it copies the passed object, in that case the tests fail because the tested function will populate the temporary object held by `std::function`. To avoid such copying one can pass std::reference_wrapper or use some another technique, like put `ParameterHandler` members into shared object or use lambda which captures it by reference. I guess, `std::reference_wrapper` is easier here. We cannot use std::cref because the object is not constant, we need call ParameterHandler::operator()(...) which has no `const` modifier. An attempt to use `std::cref` results into compilation error.
LGTM. There are some small ways of improving the code in this change set, but nothing that I think must be done beforehand. http://codereview.adblockplus.org/5921969115496448/diff/5766466041282560/src/... File src/plugin/PluginWbPassThrough.cpp (right): http://codereview.adblockplus.org/5921969115496448/diff/5766466041282560/src/... src/plugin/PluginWbPassThrough.cpp:187: ForEachQueryStringParameter(GetQueryString(src), On 2015/02/13 03:32:16, Oleksandr wrote: > Why can't we just use strtok here? The code would be much simpler, IMHO. For the reasons that strtok won't work here, see http://stackoverflow.com/questions/4031075/strtok-function-thread-safety Microsoft doesn't supply any variation of strtok_r, which might work. http://codereview.adblockplus.org/5921969115496448/diff/5766466041282560/test... File test/UtilForEachQueryStringParameterTest.cpp (right): http://codereview.adblockplus.org/5921969115496448/diff/5766466041282560/test... test/UtilForEachQueryStringParameterTest.cpp:24: struct ParameterHandler On 2015/02/13 15:21:56, sergei wrote: > having > verification statements in the body of test (not in the common function) can be > useful because in that case we immediately know the exact location of the > failure. I've written such things, and you can get the location by how you write the common verification function. On the other hand, I've used lots of unit test frameworks but I've not needed to do this in google-test, so I'm not sure what the details and limitations are for the package we're using... http://codereview.adblockplus.org/5921969115496448/diff/5766466041282560/test... test/UtilForEachQueryStringParameterTest.cpp:43: struct LimitedParameterHandler : ParameterHandler On 2015/02/13 15:21:56, sergei wrote: > I thought about it when I was creating of it and decided that it would be better > to have it as separate class. If we had more variation in the tests, subclassing each time would become problematic. With just one variation, it's fine. http://codereview.adblockplus.org/5921969115496448/diff/5766466041282560/test... test/UtilForEachQueryStringParameterTest.cpp:72: ForEachQueryStringParameter(L"", std::ref(parameterHandler)); On 2015/02/13 15:21:56, sergei wrote: > The tested function accepts `std::function` but we are passing an object with > the `operator()`. You may have misunderstood. I was suggesting declaring void ForEachQueryStringParameter(..., const std::function<...>&& parameterHandler) instead of void ForEachQueryStringParameter(..., const std::function<...>&& parameterHandler) > To avoid > such copying one can pass std::reference_wrapper or use some another technique, > like put `ParameterHandler` members into shared object or use lambda which > captures it by reference. My suggestion boils down to avoiding copy by replacing it with a move. > We cannot use std::cref because the object is not constant, we need call > ParameterHandler::operator()(...) which has no `const` modifier. An attempt to > use `std::cref` results into compilation error. I'm misunderstanding std::cref, I guess. I would have called it std::pure_ref if that's its actual behavior. http://codereview.adblockplus.org/5921969115496448/diff/5728116278296576/src/... File src/shared/Utils.cpp (right): http://codereview.adblockplus.org/5921969115496448/diff/5728116278296576/src/... src/shared/Utils.cpp:178: const std::function<bool(std::wstring::const_iterator begin, std::wstring::const_iterator end)>& tokenHandler) You might change "tokenHandler" to "fn" here. Or at least "tokenFn". This is not a strong opinion. http://codereview.adblockplus.org/5921969115496448/diff/5728116278296576/src/... src/shared/Utils.cpp:200: const std::function<bool(const std::wstring& name, const std::wstring& value)>& parameterHandler) See comment above. http://codereview.adblockplus.org/5921969115496448/diff/5728116278296576/test... File test/UtilForEachQueryStringParameterTest.cpp (right): http://codereview.adblockplus.org/5921969115496448/diff/5728116278296576/test... test/UtilForEachQueryStringParameterTest.cpp:160: } parameterHandler(2); If you'd like to have the subclass definition inside the function body for a single use, it would be clearer to eliminate the constructor argument and initialize with the clause "limit(2)". http://codereview.adblockplus.org/5921969115496448/diff/5728116278296576/test... File test/UtilGetQueryStringTest.cpp (right): http://codereview.adblockplus.org/5921969115496448/diff/5728116278296576/test... test/UtilGetQueryStringTest.cpp:38: // 1 1 1 1 1 Ah. This is a good comment. Exhaustive testing makes me very pleased.
http://codereview.adblockplus.org/5921969115496448/diff/5668600916475904/src/... File src/shared/Utils.cpp (right): http://codereview.adblockplus.org/5921969115496448/diff/5668600916475904/src/... src/shared/Utils.cpp:172: } While I think the loop could be improved, it does function correctly and there's no need to belabor the point.
http://codereview.adblockplus.org/5921969115496448/diff/5766466041282560/src/... File src/plugin/PluginWbPassThrough.cpp (right): http://codereview.adblockplus.org/5921969115496448/diff/5766466041282560/src/... src/plugin/PluginWbPassThrough.cpp:187: ForEachQueryStringParameter(GetQueryString(src), strotok_s would actually work good here. It has a context parameter instead of static buffer used in strtok. In fact, strotok in MSVC is thread safe too: https://social.msdn.microsoft.com/Forums/vstudio/en-US/963aac07-da1a-4612-be4... So strotok_s is thread safe and reentrant. With that we won't need all the new methods and test files and current code could look something like [NOT TESTED]: std::wstring queryString = GetQueryString(src); contentType = ContentType::CONTENT_TYPE_OTHER; token = wcstok_s(src, L"&=", &next_token); while (token != nullptr) { contentType = GetContentTypeFromString(token); if (contentType != ContentType::CONTENT_TYPE_OTHER) { return contentType; } token = wcstok_s(NULL, L"&=", &next_token); } This would be much more straightforward and better than the current approach IMO, which takes a bit time to comprehend. http://codereview.adblockplus.org/5921969115496448/diff/5766466041282560/src/... File src/shared/Utils.cpp (right): http://codereview.adblockplus.org/5921969115496448/diff/5766466041282560/src/... src/shared/Utils.cpp:156: schemeAndHierarchicalPartEndsAt = url.find(L'#'); We are using WinHTTPCrackUrl in libadblockplus in DedfaultWebRequestWinInet.cpp, FYI. I don't think that instantiates our APP. But at this stage, I guess there is little point to look into CreateUrl and others since we already have a good enough solution with tests.
http://codereview.adblockplus.org/5921969115496448/diff/5766466041282560/src/... File src/plugin/PluginWbPassThrough.cpp (right): http://codereview.adblockplus.org/5921969115496448/diff/5766466041282560/src/... src/plugin/PluginWbPassThrough.cpp:187: ForEachQueryStringParameter(GetQueryString(src), On 2015/02/16 06:59:28, Oleksandr wrote: > This would be much more straightforward and better than the current approach > IMO, which takes a bit time to comprehend. I agree it looks good, not as I expected.
http://codereview.adblockplus.org/5921969115496448/diff/5766466041282560/src/... File src/plugin/PluginWbPassThrough.cpp (right): http://codereview.adblockplus.org/5921969115496448/diff/5766466041282560/src/... src/plugin/PluginWbPassThrough.cpp:187: ForEachQueryStringParameter(GetQueryString(src), There are two underlying observations here. The first is that there's a library function, albeit a non-standard Microsoft-specific one, that can split tokens. The second is that, _for this particular function_, there's no need to distinguish between name and value in a URL parameter string. Having seen the new code, I greatly prefer the use of a function iterator and an inline lambda. It find it easier to understand that the new commingled version. Separating syntax and semantics is almost always the right thing to do, in no small part because large numbers of security problems come from bad parsing (the syntax part), and because it's much easier to validate parsing code when it's not commingled with other concerns. However, the previous version of the code could have been simpler. The particular algorithm used makes no distinction between name and value. Hence it could have been written with 'ForEachToken' rather than 'ForEachQueryStringParameter'. In that case the body of the lambda would be much smaller, something like this: if (token.empty()) return true; contentType = GetContentTypeFromString(token); return contentType == ContentType::CONTENT_TYPE_OTHER; The other part, that about 'strtok_s', is a viable way of rewriting 'ForEachToken'. I wasn't fond of its previous implementation, although it was adequate. Using the library function would clean it up.
http://codereview.adblockplus.org/5921969115496448/diff/5766466041282560/src/... File src/plugin/PluginWbPassThrough.cpp (right): http://codereview.adblockplus.org/5921969115496448/diff/5766466041282560/src/... src/plugin/PluginWbPassThrough.cpp:187: ForEachQueryStringParameter(GetQueryString(src), On 2015/02/16 18:18:07, Eric wrote: > There are two underlying observations here. The first is that there's a library > function, albeit a non-standard Microsoft-specific one, that can split tokens. > The second is that, _for this particular function_, there's no need to > distinguish between name and value in a URL parameter string. Good, but what does it give us? We use a lot of Windows API, yet one Microsoft-specific should not cause any more trouble. And it looks reasonable to start with the value and then try with the name but it's too premature assumption, so I've removed it without any comments. > > Having seen the new code, I greatly prefer the use of a function iterator and an > inline lambda. It find it easier to understand that the new commingled version. > Separating syntax and semantics is almost always the right thing to do, in no > small part because large numbers of security problems come from bad parsing (the > syntax part), and because it's much easier to validate parsing code when it's > not commingled with other concerns. I completely support it! > > However, the previous version of the code could have been simpler. The > particular algorithm used makes no distinction between name and value. Hence it > could have been written with 'ForEachToken' rather than > 'ForEachQueryStringParameter'. In that case the body of the lambda would be much > smaller, something like this: > > if (token.empty()) return true; > contentType = GetContentTypeFromString(token); > return contentType == ContentType::CONTENT_TYPE_OTHER; > > The other part, that about 'strtok_s', is a viable way of rewriting > 'ForEachToken'. I wasn't fond of its previous implementation, although it was > adequate. Using the library function would clean it up. Actually, I find the previous implementation a bit more complicated than the current because the previous one requires at least to read the description of the function to understand the aim of the return value of the lambda. It's a design flaw, it's better to use enum instead of bool there, but all these things only complicate it, I guess. I would not say that the using of any function from `strtok` family is easier and is good for functions like `ForEachToken`. Pay attention that we pass "immutable" object to `ForEachToken` (const string&) but `strtok` modifies it. In this particular case it's clear and we are modifying the temporary object, but, as I said, I find all such peculiarities only complicating the code. In addition, the previous implementation of `ForEachToken` works with iterators which is safer than working with the raw pointers. So, I would like to keep the current version (with `strtok`) but if you want it's completely fine for me to implement `ForEachToken` which works with a set of delimiters (but not based on `strtok`).
With the 2 nits addressed - LGTM http://codereview.adblockplus.org/5921969115496448/diff/5636470266134528/src/... File src/shared/Utils.h (right): http://codereview.adblockplus.org/5921969115496448/diff/5636470266134528/src/... src/shared/Utils.h:62: * Returns the begging of the URL which includes the scheme and hierarchical beginning http://codereview.adblockplus.org/5921969115496448/diff/5636470266134528/src/... src/shared/Utils.h:63: * part according to http://en.wikipedia.org/wiki/URI_scheme . No space before '.'
http://codereview.adblockplus.org/5921969115496448/diff/5636470266134528/src/... File src/shared/Utils.h (right): http://codereview.adblockplus.org/5921969115496448/diff/5636470266134528/src/... src/shared/Utils.h:62: * Returns the begging of the URL which includes the scheme and hierarchical On 2015/03/03 18:43:17, Oleksandr wrote: > beginning fixed. http://codereview.adblockplus.org/5921969115496448/diff/5636470266134528/src/... src/shared/Utils.h:63: * part according to http://en.wikipedia.org/wiki/URI_scheme . On 2015/03/03 18:43:17, Oleksandr wrote: > No space before '.' fixed.
I didn't think this needed another LGTM. Here it goes. |