Trie Methods for Representing Text

We propose a new trie organization for large text documents requiring secondary storage. Index size is critical in all trie representations of text, and our organization yields smaller indexes than any known method. Its access time matches that of the best known method, and its tries can be constructed in reasonable time. For an index of 100 million entries, our experiments show size factors of less than 3, compared with 3.4 for the best previous method. Our measurements show expected access costs of 0.1 s and construction times of 18 to 55 hours, depending on the text characteristics.
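The abstract does not describe the new organization itself. For readers unfamiliar with trie indexes over text, the following is a minimal in-memory sketch of the conventional, uncompressed structure that compact trie organizations aim to shrink. It is not the authors' method. It assumes that index entries are the word starts of the text and that each entry's suffix is cut off at a hypothetical depth limit `max_depth`. The names `TextTrie`, `word_starts` and `search` are illustrative only.

```python
def word_starts(text):
    """Index points (assumption): offsets where a non-space follows a space or the text start."""
    return [i for i, ch in enumerate(text)
            if not ch.isspace() and (i == 0 or text[i - 1].isspace())]


class _Node:
    __slots__ = ("children", "positions")

    def __init__(self):
        self.children = {}   # next character -> child node
        self.positions = []  # text offsets whose (truncated) suffix ends at this node


class TextTrie:
    """Plain character trie over the suffixes starting at each index point (illustrative)."""

    def __init__(self, text, max_depth=16):
        self.text = text
        self.max_depth = max_depth  # hypothetical cutoff on how much of each suffix is stored
        self.root = _Node()
        for i in word_starts(text):
            node = self.root
            for ch in text[i:i + max_depth]:
                node = node.children.setdefault(ch, _Node())
            node.positions.append(i)

    def search(self, prefix):
        """Return sorted offsets of index points whose suffix begins with `prefix`."""
        node = self.root
        # Only max_depth characters are stored per suffix, so walk at most that far.
        for ch in prefix[:self.max_depth]:
            node = node.children.get(ch)
            if node is None:
                return []
        # Collect every position stored in the subtree below the matched node.
        hits, stack = [], [node]
        while stack:
            n = stack.pop()
            hits.extend(n.positions)
            stack.extend(n.children.values())
        # Verify against the text itself in case the prefix is longer than max_depth.
        return sorted(p for p in hits if self.text.startswith(prefix, p))
```

For example, `TextTrie("the cat sat on the mat").search("the")` finds the two occurrences at offsets 0 and 15. Each index entry here can cost up to `max_depth` nodes, which is why index size dominates the design of trie representations of text.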
