Tools and Methods in the Analysis of Simple Sequences

The comparison and analysis of large-scale nucleotide and protein sequence data have long been a challenging task for molecular biologists. However, the development of new statistical methods and computational programs has enabled the scientific community to analyze and interpret the features, function, structure, and evolution of biological sequence data far more readily. In this context, this chapter presents different sequence alignment approaches, including pairwise alignment and multiple sequence alignment, as well as phylogenetic tree construction. It provides insight into different bioinformatics tools and algorithms, along with some basic examples, and covers the essential topics of sequence analysis so that readers can understand them and apply them in their regular work.
