Current Trends and Ongoing Progress in the Computational Alignment of Biological Sequences

Computational techniques for comparing nucleic acid and protein sequences considerably reduce the workload of molecular biologists. Sequence alignment is one of the main research areas in bioinformatics, and comparative genomics and proteomics, which depend on it, have led to important discoveries across the field. Researchers develop and apply a range of heuristics and evolutionary algorithms in pursuit of optimal DNA and protein sequence alignments, and improved computational sequence matching methods fall into several categories. This paper aims to survey most computational approaches to sequence alignment and to analyze the aspects and issues involved in aligning biological sequences optimally. Pairwise alignment, the alignment of two sequences, is the basic building block of all alignment methods. Recent work has focused on novel computational techniques for matching several sequences simultaneously, known as multiple sequence alignment (MSA). The overall goal is to design algorithms that produce optimal, biologically relevant alignments with lower computational complexity and greater efficiency.
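As an illustration of the pairwise building block mentioned above, the following is a minimal sketch of global alignment scoring by dynamic programming in the style of Needleman and Wunsch. It is not taken from the paper: the function name `global_alignment_score` and the scoring values (match +1, mismatch -1, linear gap penalty -2) are assumptions chosen for the example, and practical tools typically use substitution matrices such as PAM or BLOSUM together with affine gap penalties.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Return the best global alignment score of sequences a and b.

    Illustrative scoring scheme (an assumption, not the paper's):
    +1 per match, -1 per mismatch, -2 per gap position (linear gaps).
    """
    # prev[j] holds the best score for aligning the processed prefix of a
    # with the first j characters of b; row 0 is all gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # Aligning the first i characters of a with an empty prefix of b.
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            # Pair a[i-1] with b[j-1] (match or mismatch).
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Pair a[i-1] with a gap, or b[j-1] with a gap.
            up = prev[j] + gap
            left = curr[j - 1] + gap
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(global_alignment_score("ACGT", "AGT"))
```

The table has (len(a) + 1) × (len(b) + 1) cells, so time grows with the product of the two sequence lengths; keeping only two rows limits memory to the length of `b`. Extending exact dynamic programming to many sequences at once multiplies that cost per added sequence, which is one reason MSA methods rely on heuristics.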
