A distributed bioinformatics computing system for analysis of DNA sequences

This paper presents the design and implementation of a distributed bioinformatics computing system for the analysis of DNA sequences. The system could be used for disease detection, criminal forensic analysis, and protein analysis. Several distributed algorithms are developed for searching for and identifying triplet repeat patterns in a given DNA sequence. The search algorithm computes the number of occurrences of a given pattern in a given gene sequence, and a distributed subsequence identification algorithm detects repeating patterns. Sequential and distributed implementations of these algorithms were executed with different triplet repeat search patterns and with DNA sequences ranging in length from very small to very large. The performance of the distributed algorithms is compared with that of the sequential approach.
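As an illustration of the occurrence-counting search described above, the following is a minimal sketch, not the system's actual implementation. It assumes the sequence is a plain string, that overlapping matches are counted, and that the distributed version partitions the sequence into chunks that overlap by one character less than the pattern length so that matches straddling a chunk boundary are not lost. The function names are hypothetical, and a local thread pool stands in for the distributed nodes.

```python
from concurrent.futures import ThreadPoolExecutor


def count_occurrences(sequence, pattern):
    """Sequential search: count (possibly overlapping) occurrences of pattern."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    count = 0
    start = sequence.find(pattern)
    while start != -1:
        count += 1
        # Advance by one position so overlapping matches are also found.
        start = sequence.find(pattern, start + 1)
    return count


def split_with_overlap(sequence, parts, pattern_len):
    """Split sequence into about `parts` chunks (assumed partitioning scheme).

    Each chunk is extended by pattern_len - 1 characters, so a match that
    starts inside a chunk is fully contained in it, while a match that
    starts in the extension is counted only by the next chunk.
    """
    size = max(1, -(-len(sequence) // parts))  # ceiling division
    overlap = pattern_len - 1
    return [sequence[i:i + size + overlap]
            for i in range(0, len(sequence), size)]


def count_occurrences_chunked(sequence, pattern, workers=4):
    """Chunked search: each worker counts matches in one chunk; counts are summed.

    The thread pool is only a stand-in for distributed nodes (assumption).
    """
    if not pattern:
        raise ValueError("pattern must be non-empty")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    chunks = split_with_overlap(sequence, workers, len(pattern))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(lambda chunk: count_occurrences(chunk, pattern),
                            chunks))
```

For example, both `count_occurrences("CAGCAGCAGTTCAG", "CAG")` and `count_occurrences_chunked("CAGCAGCAGTTCAG", "CAG", workers=3)` return 4. Because every match is attributed to exactly one chunk, the chunked count agrees with the sequential count for any number of workers.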
