Paper Information - A Storage-Efficient Suffix Tree Construction Algorithm for Human Genome Sequences

A Storage-Efficient Suffix Tree Construction Algorithm for Human Genome Sequences

The suffix tree is one of the most widely adopted indexes for genome sequence alignment. Although it supports very fast alignment, it has two major shortcomings: a very long construction time and a very large index size. Loh et al. [7] proposed a suffix tree construction algorithm with dramatically improved performance; however, the size of the resulting tree remains a challenging problem. We propose an algorithm that extends the one by Loh et al. to reduce the suffix tree size. In our experiments, our algorithm constructed a suffix tree approximately 60% of the size of the one built by the algorithm of Loh et al., in almost the same construction time.

Woong-Kee Loh | Heejune Ahn

[1] Jignesh M. Patel, et al. Practical methods for constructing suffix trees, 2005, The VLDB Journal.

[2] Siu-Ming Yiu, et al. SOAP2: an improved ultrafast tool for short read alignment, 2009, Bioinform.

[3] Richard Durbin, et al. Fast and accurate short read alignment with Burrows-Wheeler transform, 2009.

[4] Yang-Sae Moon, et al. A Fast Divide-and-Conquer Algorithm for Indexing Human Genome Sequences, 2010, IEICE Trans. Inf. Syst.

[5] Mohammed J. Zaki, et al. Genome-scale disk-based suffix tree indexing, 2007, SIGMOD '07.

[6] Alex Thomo, et al. A new method for indexing genomes using on-disk suffix trees, 2008, CIKM '08.

[7] Malcolm P. Atkinson, et al. Database indexing for large DNA and protein sequence collections, 2002, The VLDB Journal.

[8] Cole Trapnell, et al. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome, 2009, Genome Biology.

[9] Giovanni Manzini, et al. Opportunistic data structures with applications, 2000, Proceedings 41st Annual Symposium on Foundations of Computer Science.