Assessment of the parallelization approach of d2_cluster for high‐performance sequence clustering

The exponential growth of expressed sequence tag (EST) data increases the computational cost of sequence clustering to the point that new algorithms are required to analyze the data faster. We have parallelized d2_cluster on an SGI Origin 2000 multiprocessor and observed a speedup of approximately 100× on 126 processors when processing a dataset of 15,876 ESTs. The parallelized d2_cluster code is available from the SANBI website (http://www.sanbi.ac.za/CODES).

© 2002 Wiley Periodicals, Inc. J Comput Chem 23: 755–757, 2002
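The abstract reports only a headline figure (a speedup S of about 100× on p = 126 processors). A quick way to put that number in context is to convert it into parallel efficiency, E = S / p, and into the Karp-Flatt metric, e = (1/S - 1/p) / (1 - 1/p), which estimates the effective serial fraction of the run under Amdahl's model. The sketch below is a back-of-the-envelope calculation, not part of the authors' work: the function name `parallel_metrics` is hypothetical, and because the reported speedup is approximate, the derived values are only indicative.

```python
def parallel_metrics(speedup: float, processors: int) -> tuple[float, float]:
    """Return (parallel efficiency, Karp-Flatt serial fraction) for a measured speedup.

    Hypothetical helper for illustration only; it assumes the speedup is
    measured relative to a single-processor run of the same code.
    """
    # Parallel efficiency: fraction of ideal linear speedup actually achieved.
    efficiency = speedup / processors
    # Karp-Flatt metric: experimentally determined serial fraction,
    # e = (1/S - 1/p) / (1 - 1/p), interpreted under Amdahl's law.
    serial_fraction = (1 / speedup - 1 / processors) / (1 - 1 / processors)
    return efficiency, serial_fraction


# Approximate figures from the abstract: ~100x speedup on 126 processors.
eff, frac = parallel_metrics(100.0, 126)
print(f"efficiency = {eff:.3f}, serial fraction = {frac:.5f}")
```

With S ≈ 100 and p = 126 this gives an efficiency of roughly 0.79 and an effective serial fraction of roughly 0.2%. The Karp-Flatt view lumps communication and load-imbalance overhead into that single serial term, so it describes the measured run rather than identifying where the overhead comes from.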

John E. Carpenter | Alan Christoffels | Winston Hide | Yael Weinbach