Average sizes of suffix trees and DAWGs
Abstract: Suffix trees, directed acyclic word graphs (DAWGs) and related data structures are useful for text retrieval and analysis. Linear upper and lower bounds on their sizes are known. Constructing these data structures for random strings, one observes that the size does not increase smoothly, but oscillates between these bounds. We use Mellin transforms to obtain size estimates as integrals of meromorphic functions. Poles on the real axis lead to exact formulae for the average sizes, while poles with nonzero imaginary part lead to very good estimates of the oscillations.
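The experiment mentioned in the abstract (building the structures for random strings and watching their size) can be approximated with a small sketch. The code below is an illustration, not the paper's method. It counts the nodes of the compact suffix tree of a string by brute force, using the fact that each internal node corresponds to a distinct longest common prefix of two adjacent suffixes in sorted order. The function names, the `"\0"` terminator, the two-letter alphabet and the uniform random model are all assumptions made for this example. DAWGs are not covered.

```python
import random


def common_prefix(a, b):
    """Return the longest common prefix of strings a and b."""
    k = 0
    while k < len(a) and k < len(b) and a[k] == b[k]:
        k += 1
    return a[:k]


def suffix_tree_size(text, terminator="\0"):
    """Count all nodes (internal nodes plus leaves) of the compact
    suffix tree of text + terminator.

    Assumption: the terminator is a unique end marker that does not
    occur in text, so every suffix ends at its own leaf.
    """
    if terminator in text:
        raise ValueError("terminator must not occur in text")
    s = text + terminator
    # Naive quadratic-space suffix sorting; fine for short strings only.
    suffixes = sorted(s[i:] for i in range(len(s)))
    internal = {""}  # the root is always a node
    for a, b in zip(suffixes, suffixes[1:]):
        # Adjacent sorted suffixes diverge exactly at a branching node.
        internal.add(common_prefix(a, b))
    return len(s) + len(internal)  # one leaf per suffix, plus internal nodes


def average_ratio(n, trials=20, alphabet="ab", seed=0):
    """Average of (node count / n) over random strings of length n.

    Hypothetical helper: symbols are drawn uniformly and independently.
    """
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        text = "".join(rng.choice(alphabet) for _ in range(n))
        total += suffix_tree_size(text)
    return total / (trials * n)
```

Calling `average_ratio(n)` for a range of lengths, for example every `n` from 64 to 1024, gives an empirical size-to-length ratio to compare against the known linear bounds. Whether the oscillation is visible will depend on the number of trials and on the range of `n` sampled.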