Prefix-Shuffled Geometric Suffix Tree

Protein structure analysis is one of the most important research issues in the post-genomic era, and faster and more accurate index data structures for protein 3-D structures are highly desired. The geometric suffix tree is an index structure that enables fast and accurate search on protein 3-D structures. With it, we can search 3-D structure databases for all substructures whose RMSD (root mean square deviation) to a given query 3-D structure is no larger than a given bound. In this paper, we propose a new data structure based on the geometric suffix tree whose query performance is much better than that of the original. We call the modified data structure the prefix-shuffled geometric suffix tree (PSGST for short). In our experiments, the PSGST outperforms the geometric suffix tree in most cases. It performs best when the database contains few substructures similar to the query; in such cases, queries are sometimes 100 times faster than with the original geometric suffix tree.
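
For concreteness, the search criterion can be written out as follows. This is a sketch under the common convention that the RMSD between two structures is minimized over rigid-body motions; the symbols P, Q, R, v, n, S, and b are notation assumed here for illustration and do not appear in the abstract. P = (p_1, ..., p_n) is a candidate substructure, Q = (q_1, ..., q_n) is the query, R ranges over rotation matrices, v over translation vectors, and b is the given bound.

```latex
\mathrm{RMSD}(P, Q) \;=\; \min_{R,\,\mathbf{v}}
  \sqrt{\frac{1}{n} \sum_{i=1}^{n}
  \bigl\lVert \mathbf{p}_i - \bigl(R\,\mathbf{q}_i + \mathbf{v}\bigr) \bigr\rVert^{2}}

% Query: report every length-n substructure S[k..k+n-1] in the database with
\mathrm{RMSD}\bigl(S[k..k+n-1],\, Q\bigr) \;\le\; b
```

Under this reading, a query returns every position k in every database structure S whose length-n window satisfies the inequality.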
