Parallel algorithm research on several important open problems in bioinformatics

High-performance computing has opened the door to using bioinformatics and systems biology to explore complex relationships among data, and has created the opportunity to tackle very large and involved simulations of biological systems. Many supercomputing centers have entered this field because the opportunities for significant impact are vast. The development of new algorithms, especially parallel algorithms and software that mine new biological information and assess relationships among the members of a large biological data set, is becoming very important. This article presents our work on the design and development of parallel algorithms and software to solve several important open problems in bioinformatics, including structure alignment of RNA sequences, gene finding, alternative splicing, and gene expression clustering. To make this parallel software available to a wide audience, grid computing service interfaces to it have been deployed on the China National Grid (CNGrid). Finally, conclusions and some future research directions are presented.
