markophylo: Markov chain analysis on phylogenetic trees

SUMMARY: Continuous-time Markov chain models with a finite state space are routinely used to analyse discrete character data on phylogenetic trees. Examples of such data include restriction sites, gene family presence/absence, intron presence/absence and gene family size. While models with constrained substitution rate matrices have been used to good effect, the recent literature has increasingly implemented more biologically realistic models that combine, for example, site rate variation, site partitioning, branch-specific rates, non-stationary prior root probabilities and correction for sampling bias. Here, a flexible and fast R package is introduced that infers evolutionary rates of discrete characters on a tree within a probabilistic framework. The package, markophylo, fits maximum-likelihood models using Markov chains on phylogenetic trees. It is efficient, with the workhorse functions written in C++ and a user-friendly interface in R.

AVAILABILITY AND IMPLEMENTATION: markophylo is available as a platform-independent R package from the Comprehensive R Archive Network at https://cran.r-project.org/web/packages/markophylo/. A vignette with numerous examples is also provided with the R package.

CONTACT: udang@mcmaster.ca

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
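To make the modelling idea in the summary concrete, the following is a minimal sketch of the likelihood that a maximum-likelihood fit of this kind would maximise, for a two-state (absence/presence) continuous-time Markov chain on a fixed tree. It is not markophylo's implementation or API: the function names, the nested-list tree encoding and the example tree are all assumptions made for illustration, and the sketch omits site rate variation, partitions, branch-specific rates and the sampling-bias correction.

```python
import math
from itertools import product


def transition_matrix(gain, loss, t):
    """Closed-form P(t) for a two-state chain with rates 0->1 = gain, 1->0 = loss."""
    total = gain + loss
    decay = math.exp(-total * t)
    return [
        [(loss + gain * decay) / total, (gain - gain * decay) / total],
        [(loss - loss * decay) / total, (gain + loss * decay) / total],
    ]


def partial_likelihoods(node, pattern, gain, loss):
    """Pruning recursion: probability of the tip data below `node`,
    conditional on `node` being in state 0 and in state 1.

    Hypothetical tree encoding: a tip is a string (its name); an internal
    node is a list of (child, branch_length) pairs.
    """
    if isinstance(node, str):
        observed = pattern[node]
        return [1.0 if state == observed else 0.0 for state in (0, 1)]
    result = [1.0, 1.0]
    for child, branch_length in node:
        p = transition_matrix(gain, loss, branch_length)
        below = partial_likelihoods(child, pattern, gain, loss)
        for parent_state in (0, 1):
            result[parent_state] *= sum(
                p[parent_state][child_state] * below[child_state]
                for child_state in (0, 1)
            )
    return result


def pattern_likelihood(tree, pattern, gain, loss, root_prob):
    """Likelihood of one character pattern, averaging over the root state."""
    at_root = partial_likelihoods(tree, pattern, gain, loss)
    return root_prob[0] * at_root[0] + root_prob[1] * at_root[1]


def log_likelihood(tree, patterns, gain, loss, root_prob):
    """Log-likelihood of independent characters; an optimiser would
    maximise this over `gain` and `loss`."""
    return sum(
        math.log(pattern_likelihood(tree, pattern, gain, loss, root_prob))
        for pattern in patterns
    )


# Illustrative three-tip tree ((B,C),A) with made-up branch lengths.
example_tree = [("A", 0.3), ([("B", 0.1), ("C", 0.1)], 0.2)]
example_data = [
    {"A": 1, "B": 1, "C": 0},
    {"A": 0, "B": 0, "C": 0},
    {"A": 1, "B": 1, "C": 1},
]
example_ll = log_likelihood(example_tree, example_data, 0.5, 1.0, [0.5, 0.5])
```

Passing `root_prob` separately from the rates is what allows a non-stationary root distribution: the root probabilities need not equal the chain's stationary distribution `(loss, gain) / (gain + loss)`.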
