MBEToolbox: a Matlab toolbox for sequence data analysis in molecular biology and evolution

Background: MATLAB is a high-performance language for technical computing, integrating computation, visualization, and programming in an easy-to-use environment. It has been widely used in many areas, such as mathematics and computation, algorithm development, data acquisition, modeling, simulation, and scientific and engineering graphics. However, few functions are freely available in MATLAB to perform the sequence data analyses specifically required for molecular biology and evolution.

Results: We have developed a MATLAB toolbox, called MBEToolbox, aimed at filling this gap by offering efficient implementations of the most needed functions in molecular biology and evolution. It can be used to manipulate aligned sequences, calculate evolutionary distances, estimate synonymous and nonsynonymous substitution rates, and infer phylogenetic trees. Moreover, it provides an extensible, functional framework for users with more specialized requirements to explore and analyze aligned nucleotide or protein sequences from an evolutionary perspective. The full set of functions in the toolbox is accessible through the command line for seasoned MATLAB users. A graphical user interface, which may be especially useful for non-specialist end users, is also provided.

Conclusion: MBEToolbox is a useful tool that can aid in the exploration, interpretation and visualization of data in molecular biology and evolution. The software is publicly available at http://web.hku.hk/~jamescai/mbetoolbox/ and http://bioinformatics.org/project/?group_id=454.
