|
|
Created:
Oct. 10, 2017, 2:18 p.m. by Wladimir Palant Modified:
Dec. 20, 2017, 9:21 a.m. CC:
Felix Dahlke Base URL:
https://hg.adblockplus.org/adblockpluscore Visibility:
Public. |
DescriptionThis makes non-string key types in our hash tables possible, which is required if we want an int-keyed hash table for Rabin-Karp algorithm.
Patch Set 1 : #
Total comments: 5
Patch Set 2 : Exposed superclass constructors properly #Patch Set 3 : Use constructor inheritance #
Total comments: 29
Patch Set 4 : Addressed review comments #Patch Set 5 : Removed redundant template parameter for Map class as well #Patch Set 6 : Fixed key initialization issue #
Total comments: 9
Patch Set 7 : Introduced key_type_cref #
Total comments: 3
Patch Set 8 : Addressed remaining nits #
MessagesTotal messages: 17
This makes non-string key types in our hash tables possible, which is required if we want an int-keyed hash table for Rabin-Karp algorithm. https://codereview.adblockplus.org/29572731/diff/29572748/compiled/IntMap.h File compiled/IntMap.h (right): https://codereview.adblockplus.org/29572731/diff/29572748/compiled/IntMap.h#n... compiled/IntMap.h:111: }; This class hasn't been tested yet beyond the fact that it compiles. https://codereview.adblockplus.org/29572731/diff/29572748/compiled/Map.h File compiled/Map.h (right): https://codereview.adblockplus.org/29572731/diff/29572748/compiled/Map.h#newc... compiled/Map.h:43: if (mPos != mEnd && mPos->is_invalid()) I chose to implement operations working on keys within the Entry structure instead of supplying separate functors. https://codereview.adblockplus.org/29572731/diff/29572748/compiled/Map.h#newc... compiled/Map.h:233: struct MapReference : public HashContainerReference<Entry> Logically, this class should be defined further above, right after HashContainerReference. I didn't move it to minimize the diff however. https://codereview.adblockplus.org/29572731/diff/29572748/compiled/StringMap.h File compiled/StringMap.h (right): https://codereview.adblockplus.org/29572731/diff/29572748/compiled/StringMap.... compiled/StringMap.h:94: second = value_type(); This is a logic change. Previously, StringMap::erase() would merely invalidate the key but not remove the value. This meant that a filter pointer would keep a reference to a filter even after the entry was removed.
I realized that only the default constructor without parameters was exposed on StringSet, this issue existed before my changes already. I now made sure that HashContainer constructors are properly exposed in its subclasses.
This can be made a bit nicer by using constructor inheritance introduced in C++11.
https://codereview.adblockplus.org/29572731/diff/29572868/compiled/IntMap.h File compiled/IntMap.h (right): https://codereview.adblockplus.org/29572731/diff/29572868/compiled/IntMap.h#n... compiled/IntMap.h:33: static const int KEY_DELETED = -2; Do these two const need to be public?
Merely code style comments, it seems there are not defects, though I have not tried it. https://codereview.adblockplus.org/29572731/diff/29572748/compiled/StringMap.h File compiled/StringMap.h (right): https://codereview.adblockplus.org/29572731/diff/29572748/compiled/StringMap.... compiled/StringMap.h:94: second = value_type(); On 2017/10/10 14:33:59, Wladimir Palant wrote: > This is a logic change. Previously, StringMap::erase() would merely invalidate > the key but not remove the value. This meant that a filter pointer would keep a > reference to a filter even after the entry was removed. The new version looks better. https://codereview.adblockplus.org/29572731/diff/29572868/compiled/IntMap.h File compiled/IntMap.h (right): https://codereview.adblockplus.org/29572731/diff/29572868/compiled/IntMap.h#n... compiled/IntMap.h:28: typedef int key_type; can it be more specific, e.g. uint32_t or int32_t or be a template parameter? It might also affect the name of the final class, even if there is a generic container with integer keys, the final one would be UInt32Map. https://codereview.adblockplus.org/29572731/diff/29572868/compiled/IntMap.h#n... compiled/IntMap.h:31: // The assumption here is that valid keys will always be non-negative I would remove these comment, having a couple of reserved values are better then removing the half of the possible values range. This comment would make sense if the implementation of is_invalid be like return key < 0;. https://codereview.adblockplus.org/29572731/diff/29572868/compiled/IntMap.h#n... compiled/IntMap.h:32: static const int KEY_INVALID = -1; the type should be key_type, not int. https://codereview.adblockplus.org/29572731/diff/29572868/compiled/IntMap.h#n... compiled/IntMap.h:35: int first; it should be key_type https://codereview.adblockplus.org/29572731/diff/29572868/compiled/IntMap.h#n... compiled/IntMap.h:74: struct IntMapEntry : IntSetEntry Maybe it's a premature generalization but I would make a MapEntry generic and below say template<typename Value> using IntMap = Map<Map_internal::MapEntry<Map_internal::IntSetEntry, Value>>; or taking into account the type of integer using UInt32Map = Map<Map_internal::MapEntry<Map_internal::IntSetEntry<uint32_t>, Value>>; https://codereview.adblockplus.org/29572731/diff/29572868/compiled/Map.h File compiled/Map.h (right): https://codereview.adblockplus.org/29572731/diff/29572868/compiled/Map.h#newc... compiled/Map.h:233: struct MapReference : public HashContainerReference<Entry> I would remove Value template parameter because it's already present in the Entry. https://codereview.adblockplus.org/29572731/diff/29572868/compiled/Map.h#newc... compiled/Map.h:236: typedef typename super::entry_type entry_type; entry_type is already defined in the public base class. https://codereview.adblockplus.org/29572731/diff/29572868/compiled/Map.h#newc... compiled/Map.h:278: typedef typename super::key_type key_type; Why is it required to define again size_type, entry_type and key_type which are already defined in the publicly base class? The same for Map class. https://codereview.adblockplus.org/29572731/diff/29572868/compiled/Map.h#newc... compiled/Map.h:292: typedef typename super::const_reference const_reference; const_reference is already defined in the publicly base class. https://codereview.adblockplus.org/29572731/diff/29572868/compiled/Map.h#newc... compiled/Map.h:313: const_reference find(const key_type& key) const It seems this method is not needed because it's available in the base class. https://codereview.adblockplus.org/29572731/diff/29572868/compiled/StringMap.h File compiled/StringMap.h (right): https://codereview.adblockplus.org/29572731/diff/29572868/compiled/StringMap.... compiled/StringMap.h:87: : StringSetEntry(key), second(value) it seems anyway we require key_type above and value_type here to be default constructible, so there can be only one constructor for StringSetEntry, IntSetEntry, StringMapEntry and IntMapEntry, with default parameter values.
https://codereview.adblockplus.org/29572731/diff/29572868/compiled/IntMap.h File compiled/IntMap.h (right): https://codereview.adblockplus.org/29572731/diff/29572868/compiled/IntMap.h#n... compiled/IntMap.h:74: struct IntMapEntry : IntSetEntry On 2017/10/11 10:03:21, sergei wrote: > Maybe it's a premature generalization but I would make a MapEntry generic and > below say > template<typename Value> > using IntMap = Map<Map_internal::MapEntry<Map_internal::IntSetEntry, Value>>; > or taking into account the type of integer > using UInt32Map = > Map<Map_internal::MapEntry<Map_internal::IntSetEntry<uint32_t>, Value>>; IntSetEntry is of course from IntMap_internal.
https://codereview.adblockplus.org/29572731/diff/29572868/compiled/IntMap.h File compiled/IntMap.h (right): https://codereview.adblockplus.org/29572731/diff/29572868/compiled/IntMap.h#n... compiled/IntMap.h:28: typedef int key_type; On 2017/10/11 10:03:21, sergei wrote: > can it be more specific, e.g. uint32_t or int32_t or be a template parameter? It > might also affect the name of the final class, even if there is a generic > container with integer keys, the final one would be UInt32Map. Done. Not going to add any template parameters however - this would complicate the choice of sentinel values, not worth it at the moment. For reference, I chose int32_t here originally because I assume signed operations to map better to JavaScript code. However, this is premature optimization. So I changed it into uint32_t which is more natural here, we can still test the performance and optimize once that code path is actually used. https://codereview.adblockplus.org/29572731/diff/29572868/compiled/IntMap.h#n... compiled/IntMap.h:31: // The assumption here is that valid keys will always be non-negative On 2017/10/11 10:03:21, sergei wrote: > I would remove these comment, having a couple of reserved values are better then > removing the half of the possible values range. This comment would make sense if > the implementation of is_invalid be like return key < 0;. Done. However, we need sentinel values at the edge of the valid range then. https://codereview.adblockplus.org/29572731/diff/29572868/compiled/IntMap.h#n... compiled/IntMap.h:32: static const int KEY_INVALID = -1; On 2017/10/11 10:03:21, sergei wrote: > the type should be key_type, not int. Done. https://codereview.adblockplus.org/29572731/diff/29572868/compiled/IntMap.h#n... compiled/IntMap.h:33: static const int KEY_DELETED = -2; On 2017/10/10 20:37:53, hub wrote: > Do these two const need to be public? Done. https://codereview.adblockplus.org/29572731/diff/29572868/compiled/IntMap.h#n... compiled/IntMap.h:35: int first; On 2017/10/11 10:03:21, sergei wrote: > it should be key_type Done. https://codereview.adblockplus.org/29572731/diff/29572868/compiled/IntMap.h#n... compiled/IntMap.h:74: struct IntMapEntry : IntSetEntry On 2017/10/11 10:03:21, sergei wrote: > Maybe it's a premature generalization but I would make a MapEntry generic and > below say > template<typename Value> > using IntMap = Map<Map_internal::MapEntry<Map_internal::IntSetEntry, Value>>; Yes, probably premature generalization. Trouble is, strings are passed by reference, integers are not. While generalizing and passing integers by reference to constructors isn't the worst thing that can happen, I'd rather not. https://codereview.adblockplus.org/29572731/diff/29572868/compiled/Map.h File compiled/Map.h (right): https://codereview.adblockplus.org/29572731/diff/29572868/compiled/Map.h#newc... compiled/Map.h:233: struct MapReference : public HashContainerReference<Entry> On 2017/10/11 10:03:22, sergei wrote: > I would remove Value template parameter because it's already present in the > Entry. Done. https://codereview.adblockplus.org/29572731/diff/29572868/compiled/Map.h#newc... compiled/Map.h:236: typedef typename super::entry_type entry_type; On 2017/10/11 10:03:21, sergei wrote: > entry_type is already defined in the public base class. Yes. However, using it in this class will produce compile errors if we don't redeclare it. https://codereview.adblockplus.org/29572731/diff/29572868/compiled/Map.h#newc... compiled/Map.h:278: typedef typename super::key_type key_type; On 2017/10/11 10:03:22, sergei wrote: > Why is it required to define again size_type, entry_type and key_type which are > already defined in the publicly base class? > The same for Map class. Done here. For Map class they are actually required, we'll get compile errors otherswise. https://codereview.adblockplus.org/29572731/diff/29572868/compiled/Map.h#newc... compiled/Map.h:292: typedef typename super::const_reference const_reference; On 2017/10/11 10:03:22, sergei wrote: > const_reference is already defined in the publicly base class. It's used in this class, we'll get compile error unless we redeclare it. https://codereview.adblockplus.org/29572731/diff/29572868/compiled/Map.h#newc... compiled/Map.h:313: const_reference find(const key_type& key) const On 2017/10/11 10:03:22, sergei wrote: > It seems this method is not needed because it's available in the base class. We declare a non-const variant of this method below. If we don't redeclare the const variant, it won't be exposed any more. https://codereview.adblockplus.org/29572731/diff/29572868/compiled/StringMap.h File compiled/StringMap.h (right): https://codereview.adblockplus.org/29572731/diff/29572868/compiled/StringMap.... compiled/StringMap.h:87: : StringSetEntry(key), second(value) On 2017/10/11 10:03:22, sergei wrote: > it seems anyway we require key_type above and value_type here to be default > constructible, so there can be only one constructor for StringSetEntry, > IntSetEntry, StringMapEntry and IntMapEntry, with default parameter values. Not really trivial with string types but done.
LGTM
Now that I have actual code using this, I noticed that the keys weren't initialized correctly - default key should always be invalid rather than zero. Fixed this.
LGTM, however I think we should pay attention to the comments in compiled/StringMap.h. https://codereview.adblockplus.org/29572731/diff/29572868/compiled/IntMap.h File compiled/IntMap.h (right): https://codereview.adblockplus.org/29572731/diff/29572868/compiled/IntMap.h#n... compiled/IntMap.h:28: typedef int key_type; On 2017/10/11 18:28:30, Wladimir Palant wrote: > On 2017/10/11 10:03:21, sergei wrote: > > can it be more specific, e.g. uint32_t or int32_t or be a template parameter? > It > > might also affect the name of the final class, even if there is a generic > > container with integer keys, the final one would be UInt32Map. > > Done. Not going to add any template parameters however - this would complicate > the choice of sentinel values, not worth it at the moment. > > For reference, I chose int32_t here originally because I assume signed > operations to map better to JavaScript code. However, this is premature > optimization. So I changed it into uint32_t which is more natural here, we can > still test the performance and optimize once that code path is actually used. Acknowledged. https://codereview.adblockplus.org/29572731/diff/29572868/compiled/IntMap.h#n... compiled/IntMap.h:74: struct IntMapEntry : IntSetEntry On 2017/10/11 18:28:30, Wladimir Palant wrote: > On 2017/10/11 10:03:21, sergei wrote: > > Maybe it's a premature generalization but I would make a MapEntry generic and > > below say > > template<typename Value> > > using IntMap = Map<Map_internal::MapEntry<Map_internal::IntSetEntry, Value>>; > > Yes, probably premature generalization. Trouble is, strings are passed by > reference, integers are not. While generalizing and passing integers by > reference to constructors isn't the worst thing that can happen, I'd rather not. Acknowledged. https://codereview.adblockplus.org/29572731/diff/29572868/compiled/Map.h File compiled/Map.h (right): https://codereview.adblockplus.org/29572731/diff/29572868/compiled/Map.h#newc... compiled/Map.h:236: typedef typename super::entry_type entry_type; On 2017/10/11 18:28:31, Wladimir Palant wrote: > On 2017/10/11 10:03:21, sergei wrote: > > entry_type is already defined in the public base class. > > Yes. However, using it in this class will produce compile errors if we don't > redeclare it. Acknowledged. Interesting whether there is a difference between typedef typename super::entry_type entry_type; and using typename super::entry_type. https://codereview.adblockplus.org/29572731/diff/29572868/compiled/Map.h#newc... compiled/Map.h:313: const_reference find(const key_type& key) const On 2017/10/11 18:28:31, Wladimir Palant wrote: > On 2017/10/11 10:03:22, sergei wrote: > > It seems this method is not needed because it's available in the base class. > > We declare a non-const variant of this method below. If we don't redeclare the > const variant, it won't be exposed any more. What about `using super::find;`, though it should be not in this commit. https://codereview.adblockplus.org/29572731/diff/29581584/compiled/Map.h File compiled/Map.h (right): https://codereview.adblockplus.org/29572731/diff/29581584/compiled/Map.h#newc... compiled/Map.h:278: }; This version is fine, I just think it could be a bit shorter template<typename Entry> using Set = Map_internal::HashContainer<Entry>; https://codereview.adblockplus.org/29572731/diff/29581584/compiled/StringMap.h File compiled/StringMap.h (right): https://codereview.adblockplus.org/29572731/diff/29581584/compiled/StringMap.... compiled/StringMap.h:32: DependentString first; Actually, the type of `first` should be key_type, thus key_type should be DependentString. What do you think about having typedef const String& key_type_cref; and using it everywhere as the argument type? It seems it also solves https://codereview.adblockplus.org/29572731/diff/29572868/compiled/IntMap.h#n.... https://codereview.adblockplus.org/29572731/diff/29581584/compiled/StringMap.... compiled/StringMap.h:37: first.reset(key); Why is this trick required and `:first(key)` cannot be used? I have actually tried it with the current key_type and the compiler hangs up.
LGTM
https://codereview.adblockplus.org/29572731/diff/29581584/compiled/Map.h File compiled/Map.h (right): https://codereview.adblockplus.org/29572731/diff/29581584/compiled/Map.h#newc... compiled/Map.h:263: u"The keys used in map.find() and assign() should be identical"_str); For these two assert, since you call operator""_str() you need to include "String.h". Even if it is only in debug.
https://codereview.adblockplus.org/29572731/diff/29581584/compiled/Map.h File compiled/Map.h (right): https://codereview.adblockplus.org/29572731/diff/29581584/compiled/Map.h#newc... compiled/Map.h:263: u"The keys used in map.find() and assign() should be identical"_str); On 2017/11/21 14:10:39, hub wrote: > For these two assert, since you call operator""_str() you need to include > "String.h". Even if it is only in debug. Done. https://codereview.adblockplus.org/29572731/diff/29581584/compiled/Map.h#newc... compiled/Map.h:278: }; On 2017/10/17 12:58:06, sergei wrote: > This version is fine, I just think it could be a bit shorter > template<typename Entry> > using Set = Map_internal::HashContainer<Entry>; Done. https://codereview.adblockplus.org/29572731/diff/29581584/compiled/StringMap.h File compiled/StringMap.h (right): https://codereview.adblockplus.org/29572731/diff/29581584/compiled/StringMap.... compiled/StringMap.h:32: DependentString first; On 2017/10/17 12:58:06, sergei wrote: > Actually, the type of `first` should be key_type, thus key_type should be > DependentString. What do you think about having > typedef const String& key_type_cref; > and using it everywhere as the argument type? It seems it also solves > https://codereview.adblockplus.org/29572731/diff/29572868/compiled/IntMap.h#n.... Done. https://codereview.adblockplus.org/29572731/diff/29581584/compiled/StringMap.... compiled/StringMap.h:37: first.reset(key); On 2017/10/17 12:58:06, sergei wrote: > Why is this trick required and `:first(key)` cannot be used? > I have actually tried it with the current key_type and the compiler hangs up. That's because DepedentString() constructor will create a valid empty string if constructed from an invalid DependentString() instance. Having DependentString() constructor copy flags isn't entirely trivial however.
LGTM https://codereview.adblockplus.org/29572731/diff/29581584/compiled/StringMap.h File compiled/StringMap.h (right): https://codereview.adblockplus.org/29572731/diff/29581584/compiled/StringMap.... compiled/StringMap.h:37: first.reset(key); On 2017/12/04 13:44:15, Wladimir Palant wrote: > On 2017/10/17 12:58:06, sergei wrote: > > Why is this trick required and `:first(key)` cannot be used? > > I have actually tried it with the current key_type and the compiler hangs up. > > That's because DepedentString() constructor will create a valid empty string if > constructed from an invalid DependentString() instance. Having DependentString() > constructor copy flags isn't entirely trivial however. I have added a ticket for that because it seems a serious issue which has been already observed in several places. https://issues.adblockplus.org/ticket/6138. https://codereview.adblockplus.org/29572731/diff/29629917/compiled/IntMap.h File compiled/IntMap.h (right): https://codereview.adblockplus.org/29572731/diff/29629917/compiled/IntMap.h#n... compiled/IntMap.h:78: Value second; Merely for the consistency the type could be value_type. https://codereview.adblockplus.org/29572731/diff/29629917/compiled/StringMap.h File compiled/StringMap.h (right): https://codereview.adblockplus.org/29572731/diff/29629917/compiled/StringMap.... compiled/StringMap.h:35: StringSetEntry(key_type_cref key = DependentString()) here and below the default value for `key` could be key_type() or even some INVALID_KEY, where INVALID_KEY is static const key_type INVALID_KEY = DependentString();
https://codereview.adblockplus.org/29572731/diff/29629917/compiled/StringMap.h File compiled/StringMap.h (right): https://codereview.adblockplus.org/29572731/diff/29629917/compiled/StringMap.... compiled/StringMap.h:35: StringSetEntry(key_type_cref key = DependentString()) On 2017/12/04 14:26:09, sergei wrote: > here and below the default value for `key` could be key_type() or even some > INVALID_KEY, where INVALID_KEY is > static const key_type INVALID_KEY = DependentString(); It's non-trivial to initialize that static member, so I went with key_type() instead.
LGTM
It's already landed, could you please close it. |