Rietveld Code Review Tool
Help | Bug tracker | Discussion group | Source code

Issue 29794564: [proof-of-concept] Compact trie (Closed)

Created:
May 30, 2018, 11:12 a.m. by Manish Jethani
Modified:
June 4, 2018, 1:38 p.m.
Reviewers:
Visibility:
Public.

Description

Update: See https://codereview.adblockplus.org/29798570/ The compact trie data structure fits large amounts of filter text in less memory. Whereas in the standard trie the strings "foobar", "food", and "foist" would result in 10 nodes ("f", "o", "o", "b", "a", "r", "d", "i", "s", and "t"), in the compact trie it results in only 5 nodes ("fo", "o", "bar", "d", "ist"). Insertion and lookup is expected to be significantly slower (not tested), but this can be improved by changing the list of child nodes from a linked list to a hash table. To add a string to the CompactTrie object: let trie = new CompactTrie(); let node = trie.set(new Buffer(filterText), lineNumber); // UTF-8 To look up the value associated with a string: let lineNumber = true.get(filterText); To retrieve the original string: let filterText = Buffer.from(node.bytes(), "ascii").toString(); How to test it: emcc -s TOTAL_MEMORY=16MB --bind -O3 -o CompactTrie.js CompactTrie.cpp node --expose-gc main.js < easylist.txt

Patch Set 1 #

Total comments: 5
Unified diffs Side-by-side diffs Delta from patch set Stats (+180 lines, -0 lines) Patch
A CompactTrie.cpp View 1 chunk +144 lines, -0 lines 5 comments Download
A main.js View 1 chunk +36 lines, -0 lines 0 comments Download

Messages

Total messages: 3
Manish Jethani
May 30, 2018, 11:12 a.m. (2018-05-30 11:12:35 UTC) #1
Manish Jethani
https://codereview.adblockplus.org/29794564/diff/29794565/CompactTrie.cpp File CompactTrie.cpp (right): https://codereview.adblockplus.org/29794564/diff/29794565/CompactTrie.cpp#newcode55 CompactTrie.cpp:55: size_t n = std::max(child->key.length(), key_.length()); This should be std::min ...
May 30, 2018, 11:49 a.m. (2018-05-30 11:49:27 UTC) #2
Manish Jethani
May 30, 2018, 2:47 p.m. (2018-05-30 14:47:14 UTC) #3
https://codereview.adblockplus.org/29794564/diff/29794565/CompactTrie.cpp
File CompactTrie.cpp (right):

https://codereview.adblockplus.org/29794564/diff/29794565/CompactTrie.cpp#new...
CompactTrie.cpp:56: size_t i = 0;
This could start at 1 instead of 0.

https://codereview.adblockplus.org/29794564/diff/29794565/CompactTrie.cpp#new...
CompactTrie.cpp:61: CompactTrieNode* grandchild = new CompactTrieNode();
Here we're basically splitting a child node into child and grandchild as we
realize that we have run into a substring of the node's key. "foobar" must be
split into "foo" (child) and "bar" (grandchild) when we come across "food"; the
end result is "foo", "bar", and "d". When "foist" is added, "foo" is again split
up into "fo" and "o" so "ist" can be introduced as a child of "fo".

https://codereview.adblockplus.org/29794564/diff/29794565/CompactTrie.cpp#new...
CompactTrie.cpp:109: size_t i = 0;
This could start at 1 instead of 0.

Powered by Google App Engine
This is Rietveld