Trie-Based Data Structures for Sequence Assembly

We investigate the application of two trie-based data structures, suffix trees and suffix arrays, to the problem of overlap detection in fragment assembly. Both data structures are analyzed theoretically and experimentally with respect to speed and space. By using heuristics, we greatly reduce the number of calls to the time-consuming dynamic programming step, and we have improved the speed of overlap detection by up to a factor of 1,000, with high accuracy, in our DNA sequencing collaboration with Brookhaven National Laboratory. We also study the problem of approximating the maximum space savings in trie structures for unification factoring in logic programming, a problem that is proved to be hard.
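
To illustrate the general idea of filtering before dynamic programming, the following minimal Python sketch uses a suffix array to find exact seed matches between fragments and runs an edit-distance computation only on the resulting candidates. It is not the method evaluated in this work: the seed heuristic, the names `detect_overlaps`, `seed_len`, and `max_error`, the `$` separator, and the naive suffix array construction are all assumptions made for illustration, and reverse-complement and containment overlaps are ignored.

```python
from bisect import bisect_right

SEP = "$"  # assumed separator; must not occur inside any fragment


def build_suffix_array(text):
    """Naive construction by sorting all suffixes (fine for a small sketch)."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, sa, pattern):
    """Return sorted start positions of pattern, by binary search on sa."""
    k = len(pattern)
    lo, hi = 0, len(sa)
    while lo < hi:  # first suffix whose k-prefix is >= pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + k] < pattern:
            lo = mid + 1
        else:
            hi = mid
    first = lo
    hi = len(sa)
    while lo < hi:  # first suffix whose k-prefix is > pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + k] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[first:lo])


def edit_distance(a, b):
    """Classic dynamic programming edit distance (the expensive step)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def detect_overlaps(fragments, seed_len=4, max_error=0.1):
    """Find suffix(i)/prefix(j) overlaps; return (overlaps, number of DP calls)."""
    text = SEP.join(fragments) + SEP
    sa = build_suffix_array(text)
    starts, pos = [], 0  # start offset of each fragment inside text
    for frag in fragments:
        starts.append(pos)
        pos += len(frag) + 1
    overlaps, dp_calls = [], 0
    for j, frag_j in enumerate(fragments):
        seed = frag_j[:seed_len]
        if len(seed) < seed_len:
            continue
        for p in find_occurrences(text, sa, seed):
            i = bisect_right(starts, p) - 1  # fragment containing position p
            if i == j:
                continue
            tail = fragments[i][p - starts[i]:]
            if len(tail) > len(frag_j):
                continue  # containment is skipped in this sketch
            dp_calls += 1  # DP runs only on seeded candidates
            if edit_distance(tail, frag_j[:len(tail)]) <= max_error * len(tail):
                overlaps.append((i, j, len(tail)))
    return overlaps, dp_calls


fragments = ["ACGTACGGTA", "CGGTATTGCA", "TTGCAGGAC"]
overlaps, dp_calls = detect_overlaps(fragments)
print(overlaps, dp_calls)
```

In this toy input, each reported triple `(i, j, length)` means that a suffix of fragment `i` overlaps a prefix of fragment `j`. The dynamic programming routine is invoked only for seeded candidates rather than for every ordered pair of fragments, which is the kind of saving the heuristics aim for; an exact seed at the start of the overlap is a strong assumption that sequencing errors can violate.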
