A Novel Distance Metric for Aligning Multiple Sequences Using DNA Hybridization Process

This paper presents a new approach to aligning multiple sequences using DNA operations. It proposes a new distance measure, based on DNA hybridization melting temperature, that gives approximate solutions to the multiple sequence alignment (MSA) problem, and it proves that the measure satisfies the properties of a distance function. With this distance metric, a distance measure is constructed that generates a guide tree for the alignment. Providing an accurate solution in little computational time is a challenging task for the MSA problem: developing an MSA algorithm is essentially a trade-off between the accuracy of the solution and the computational time needed to obtain it. To reduce the time complexity, the bio-inspired technique of DNA computing is applied to calculate the distance between sequences. The main application of this MSA approach is to identify sub-sequences for the functional study of whole genome sequences. The paper gives a detailed theoretical study of the approach.
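The distance function properties mentioned above (non-negativity, identity, symmetry, and the triangle inequality) can be spot-checked numerically on a small sample. The sketch below is illustrative only and does not reproduce the paper's measure, which the abstract does not define. The helper `check_metric_axioms`, the stand-in `hamming` distance, and the toy `tm_gap` function are all assumptions introduced here. `wallace_tm` uses the well-known Wallace rule for short oligonucleotides, Tm = 2(A+T) + 4(G+C) in degrees Celsius.

```python
from itertools import product


def hamming(a: str, b: str) -> int:
    """Stand-in distance (assumption): mismatched positions of equal-length strings."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def wallace_tm(seq: str) -> int:
    """Wallace-rule melting temperature estimate for a short oligo, in degrees C."""
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def tm_gap(a: str, b: str) -> int:
    """Toy function (assumption, not the paper's measure): gap between Tm estimates."""
    return abs(wallace_tm(a) - wallace_tm(b))


def check_metric_axioms(dist, items):
    """Test the four distance-function properties exhaustively over `items`."""
    pairs = list(product(items, repeat=2))
    triples = list(product(items, repeat=3))
    return {
        "non_negativity": all(dist(a, b) >= 0 for a, b in pairs),
        "identity": all((dist(a, b) == 0) == (a == b) for a, b in pairs),
        "symmetry": all(dist(a, b) == dist(b, a) for a, b in pairs),
        "triangle": all(dist(a, c) <= dist(a, b) + dist(b, c) for a, b, c in triples),
    }


seqs = ["ACGT", "ACGA", "TCGA", "GGGG"]
hamming_report = check_metric_axioms(hamming, seqs)
tm_gap_report = check_metric_axioms(tm_gap, seqs)
print(hamming_report)
print(tm_gap_report)
```

On this sample the Hamming distance passes all four checks, while the toy `tm_gap` fails the identity property because distinct sequences with the same base composition receive the same Tm estimate. A melting-temperature-based measure therefore needs an explicit proof of these properties. A passing check on a finite sample is only evidence, not a proof.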
