2 Suffix Trees

A suffix tree is a compact trie built on the |T| + 1 suffixes of T.

Module 1: TRIE

Compact representation of frequent itemsets in lexicographic order. In this sense, a trie can be seen as a compact representation of a set of strings.

Compressed Data Structures

The aim: represent the data structure in as little space as possible, without loss of functionality.

In computer science, a trie, or prefix tree, is an ordered tree data structure that stores a set of strings so that each shared prefix is stored only once. Related structures include the compressed trie and the ternary search tree. The trie representation can be compressed by merging the common branches.
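To make the idea of compressing a trie concrete, here is a minimal Python sketch, not a reference implementation. It builds a standard trie as nested dictionaries and then collapses every chain of single-child nodes into one edge with a multi-character label. The names `build_trie`, `compress`, `contains`, and the end-of-word marker `END` are illustrative assumptions, and the sketch assumes the marker character never occurs in the input strings.

```python
END = "$"  # hypothetical end-of-word marker; assumed absent from all input strings


def build_trie(words):
    """Standard trie as nested dicts: one edge per character."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = {}  # record that a word ends at this node
    return root


def compress(node):
    """Return a copy in which chains of single-child nodes become one edge."""
    out = {}
    for label, child in node.items():
        # Extend the label while the child has exactly one outgoing edge
        # and no word ends at the child.
        while len(child) == 1 and END not in child:
            (next_label, next_child), = child.items()
            label += next_label
            child = next_child
        out[label] = compress(child)
    return out


def contains(trie, word):
    """Membership test that works on the standard and the compressed trie."""
    node, rest = trie, word
    while rest:
        for label, child in node.items():
            if label != END and rest.startswith(label):
                node, rest = child, rest[len(label):]
                break
        else:
            return False  # no edge matches the remaining characters
    return END in node


words = ["bear", "bell", "bid", "bull", "buy", "sell", "stock", "stop"]
compressed = compress(build_trie(words))
print(sorted(compressed["b"]))  # edge labels below "b"
```

In the compressed result every internal node has at least two outgoing edges (counting the end marker), so the number of nodes is proportional to the number of strings rather than to their total length.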
Compact Representation

Compact representation of a compressed trie for an array of strings:
- Nodes store ranges of indices instead of substrings
- Uses O(s) space, where s is the number of strings in the array
- Serves as an auxiliary index structure

[Figure: example array S holding the strings see, bear, sell, stock, bull, buy, bid, hear, bell, stop]

Each node stores a triple (i, l, k) of indices. The compact representation can be used as an index (or pointer) to the data.

Compressed Trie

A compressed trie has internal nodes of degree at least two. It is obtained from the standard trie by compressing chains of "redundant" nodes.

[Figure: standard trie and compressed trie for the strings bear, bell, bid, bull, buy, sell, stock, stop]

Suffix Trie (1)

The suffix trie of a string X is the compressed trie of all the suffixes of X.

[Figure: suffix trie of X = "minimize", character positions 0 to 7]

Suffix Trie (2)

Compact representation of the suffix trie for a string X of size n from an alphabet of size d:
- Uses O(n) space

A suffix tree of T has n leaves, each of which corresponds to a suffix of T.

I'm always happy to review pull requests, though adding a 'compressed' representation to the library is a rather involved change, so it might benefit from a discussion of the design before any code is written.

In applications such as information retrieval and database systems, the main focus when designing compact tries is on properties such as size.

A set of bit vectors (in our case representing the annotations) is encoded as paths from the root to the leaves of the tree, storing prefixes shared by all children of a node only once.

Compact Tries

You may have noticed that in the trie shown previously, some nodes just form a linked list that we could compress. As you can see from the picture above, a trie is a tree-like structure where each node represents a single character of a given string.

Along the way we develop several data-structural techniques of independent interest, including a novel data structure that compactly encodes all LZ77-compressed suffixes of a string in linear space, and a general decomposition of tries that reduces the search time from logarithmic in the size of the trie to logarithmic in the length of the pattern.

The function d() decodes the byte stream and builds a trie using nested dictionaries.
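The compact representation described earlier, where nodes store ranges of indices instead of substrings, can be sketched in Python as follows. This is an illustration under stated assumptions, not the method of any particular library: the triple layout `(i, lo, hi)`, meaning the substring `S[i][lo:hi]`, is an assumed convention chosen for the sketch, and `build_compact` and `find` are hypothetical names.

```python
def build_compact(S):
    """Compressed trie over the strings of S.

    Each edge label is a triple (i, lo, hi) that stands for S[i][lo:hi]
    (an assumed convention), so no substring is copied into the trie.
    """
    def build(ids, depth):
        # All strings indexed by ids agree on their first `depth` characters.
        node = {"word": None, "edges": []}
        groups = {}
        for i in ids:
            if len(S[i]) == depth:
                node["word"] = i  # S[i] ends exactly at this node
            else:
                groups.setdefault(S[i][depth], []).append(i)
        for members in groups.values():
            hi = depth + 1
            # Extend the edge while every member has the same next character.
            while (all(len(S[m]) > hi for m in members)
                   and len({S[m][hi] for m in members}) == 1):
                hi += 1
            label = (members[0], depth, hi)
            node["edges"].append((label, build(members, hi)))
        return node

    return build(range(len(S)), 0)


def find(S, root, word):
    """Return the index of word in S, or None if it is not stored."""
    node, pos = root, 0
    while pos < len(word):
        for (i, lo, hi), child in node["edges"]:
            # Compare against the slice of S that the triple refers to.
            if word.startswith(S[i][lo:hi], pos):
                node, pos = child, hi
                break
        else:
            return None
    return node["word"]


S = ["see", "bear", "sell", "stock", "bull", "buy", "bid", "hear", "bell", "stop"]
root = build_compact(S)
print([label for label, _ in root["edges"]])  # triples on the edges leaving the root
print(find(S, root, "stock"))
```

Because every label is three integers, the trie itself takes space proportional to the number of nodes, while the characters stay in the array `S`. This is why the structure acts as an auxiliary index rather than a second copy of the data.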