Split trees are a new technique for searching sets of keys with highly skewed frequency distributions. A split tree is a binary search tree in which each node contains two key values: a <italic>node</italic> value, which is a most frequent key in that subtree, and a <italic>split</italic> value, which partitions the remaining keys (with respect to their lexical ordering) between the left and right subtrees. A <italic>median</italic> split tree (MST) uses the lexical median of a node's descendants as its split value, which forces the search tree to be perfectly balanced and yields both a space-efficient representation of the tree and high search speed. Unlike frequency-ordered binary search trees, an MST has a successful-search cost that is bounded by <italic>O</italic>(log <italic>n</italic>) and remains very stable, close to minimal values. Further, an MST can be built for a given key ordering and set of frequencies in <italic>O</italic>(<italic>n</italic> log <italic>n</italic>) time, as opposed to <italic>O</italic>(<italic>n</italic><supscrpt>2</supscrpt>) time for an optimum binary search tree. The application of MSTs to dictionary lookup for English is discussed, and the performance obtained is compared with that of other techniques.
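The structure described above can be sketched in a few lines of Python. This is a minimal illustration, not the author's implementation, and it fixes details the abstract leaves open: ties for the node value go to the lexically earliest key, keys that compare lexically less than or equal to the split value go to the left subtree, and "lexical order" means Python's built-in string ordering. The names `MSTNode`, `build_mst`, and `search` are hypothetical, and the word frequencies in the usage example are invented toy values, not counts from any real corpus. Each recursion level scans at most *n* keys in total, and the median split keeps the tree balanced to O(log *n*) levels, so this naive construction runs in O(*n* log *n*) time.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class MSTNode:
    value: str                        # most frequent key in this subtree
    split: Optional[str]              # lexical median of the remaining keys (None at a leaf)
    left: Optional["MSTNode"] = None  # keys <= split (assumed convention)
    right: Optional["MSTNode"] = None # keys > split


def build_mst(items: Sequence[Tuple[str, int]]) -> Optional[MSTNode]:
    """Build a median split tree from (key, frequency) pairs sorted by key."""
    if not items:
        return None
    # Node value: the most frequent key; max() returns the first of equal maxima,
    # so ties go to the lexically earliest key.
    top = max(range(len(items)), key=lambda i: items[i][1])
    rest = list(items[:top]) + list(items[top + 1:])  # still in lexical order
    if not rest:
        return MSTNode(items[top][0], None)
    # Split value: lexical median of the remaining keys, which splits them
    # into halves whose sizes differ by at most one.
    mid = (len(rest) - 1) // 2
    return MSTNode(
        value=items[top][0],
        split=rest[mid][0],
        left=build_mst(rest[: mid + 1]),
        right=build_mst(rest[mid + 1:]),
    )


def search(root: Optional[MSTNode], key: str) -> int:
    """Return the number of nodes visited to find key, or -1 if it is absent."""
    node, visited = root, 0
    while node is not None:
        visited += 1
        if key == node.value:
            return visited
        if node.split is None:  # leaf: no split value to compare against
            break
        node = node.left if key <= node.split else node.right
    return -1


# Toy (key, frequency) pairs in lexical order; frequencies are invented.
words = [("a", 50), ("and", 40), ("in", 20), ("is", 10),
         ("of", 45), ("the", 100), ("to", 30)]
root = build_mst(words)
print(search(root, "the"))  # 1: the most frequent key sits at the root
print(search(root, "to"))   # 3
```

With these toy frequencies the root holds "the" with split value "in", and the frequency-weighted average search cost over all seven keys works out to 2.0 node visits.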