Prefix trees: new efficient data structures for matching strings of different lengths

Prefix matching is an essential part of some applications. A well-known example is layer 3 and layer 4 switching in the TCP/IP protocol suite. We assume a set of strings over an ordered alphabet Σ. The strings may have different lengths, and some may be prefixes of others. We first introduce a simple scheme for comparing and sorting strings of different lengths. The data is then manipulated, and well-known tree structures are tuned, to handle prefix matching queries. A string prefix represents a data space that includes all strings for which it is a prefix. Using this concept, we first propose a binary prefix tree and then extend it to two m-way tree structures: the static m-way prefix tree and the dynamic m-way prefix tree. The m-way trees have better search performance at the expense of some memory space. The static m-way prefix tree is more suitable for an environment with few data transactions and is expected to give better memory usage and a more compact tree structure. The dynamic m-way prefix tree is a superset of the B-tree and is better suited to an environment with a large number of transactions. All of the data structures share a common property: no data string can be at a higher level than its prefix in the data set. The proposed data structures are simple, well defined, easy to implement in hardware or software, and efficient in terms of memory usage and search time.
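
The two ideas above, ordering strings of different lengths and treating a prefix as the data space of every string it covers, can be illustrated with a minimal Python sketch. This is not the comparison scheme or the tree structures proposed in the paper, whose details the abstract does not give. The function names `compare` and `longest_prefix_match` are hypothetical, the ordering rule (a prefix sorts before the strings it covers) is an assumed convention, and the lookup is a plain linear scan rather than a tree search.

```python
from functools import cmp_to_key


def compare(a: str, b: str) -> int:
    """Compare two strings that may differ in length.

    Assumed convention (not taken from the paper): compare the common-length
    part first; if it is equal, the shorter string, which is then a prefix of
    the longer one, sorts first.
    """
    n = min(len(a), len(b))
    if a[:n] != b[:n]:
        return -1 if a[:n] < b[:n] else 1
    # Common part is equal: order by length (prefix before extension).
    return (len(a) > len(b)) - (len(a) < len(b))


def longest_prefix_match(prefixes, query):
    """Return the longest stored prefix that covers `query`, or None.

    A linear scan stands in for the tree search described in the abstract.
    """
    best = None
    for p in prefixes:
        if query.startswith(p) and (best is None or len(p) > len(best)):
            best = p
    return best


# Example data: binary strings such as those used for address prefixes.
prefixes = ["1", "10", "101", "0", "011"]
ordered = sorted(prefixes, key=cmp_to_key(compare))
print(ordered)                                 # ['0', '011', '1', '10', '101']
print(longest_prefix_match(prefixes, "1011"))  # 101
print(longest_prefix_match(prefixes, "0100"))  # 0
```

In the sorted output every prefix appears before the strings it covers, which loosely mirrors the stated property that no data string sits at a higher level than its prefix. A tree-based structure would replace the linear scan with a search that follows this ordering.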
