Estimation of cell lineage trees by maximum-likelihood phylogenetics

CRISPR technology has enabled large-scale cell lineage tracing in complex multicellular organisms by mutating synthetic genomic barcodes during organismal development. However, these sophisticated biological tools currently rely on ad hoc and outmoded computational methods to reconstruct the cell lineage tree from the mutated barcodes. Because these methods are agnostic to the biological mechanism, they cannot take full advantage of the data's structure. We propose a statistical model for the mutation process and develop a procedure to estimate the tree topology, branch lengths, and mutation parameters by iteratively applying penalized maximum likelihood estimation. In contrast to existing techniques, our method estimates the time along each branch rather than the number of mutation events, thus providing a detailed account of tissue-type differentiation. In simulations, we demonstrate that our method is substantially more accurate than existing approaches. Our reconstructed trees also better recapitulate known aspects of zebrafish development and yield similar results across fish replicates.
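To make the idea of penalized maximum likelihood estimation of a branch time concrete, the following is a minimal toy sketch, not the model or penalty proposed in this work. It assumes, purely for illustration, that each of `n` barcode targets mutates independently and irreversibly at a known constant `rate`, so that a target is mutated after time `t` with probability 1 - exp(-rate * t), and it uses a hypothetical quadratic penalty pulling `t` toward a reference length `t_ref`. All function and parameter names are invented for this example.

```python
import math

def log_lik(t, k, n, rate):
    """Log-likelihood of observing k of n targets mutated after time t
    (toy assumption: independent, irreversible mutations at a constant rate)."""
    p_mut = 1.0 - math.exp(-rate * t)
    return k * math.log(p_mut) + (n - k) * (-rate * t)

def penalized_mle(k, n, rate, t_ref, pen, t_max=10.0, steps=20000):
    """Grid-search maximizer of log_lik(t) - pen * (t - t_ref)**2 over (0, t_max].
    The quadratic penalty is an illustrative assumption, not the paper's penalty."""
    best_t, best_val = None, -math.inf
    for i in range(1, steps + 1):
        t = t_max * i / steps
        val = log_lik(t, k, n, rate) - pen * (t - t_ref) ** 2
        if val > best_val:
            best_t, best_val = t, val
    return best_t

if __name__ == "__main__":
    k, n, rate = 3, 10, 0.5
    # Unpenalized estimate has a closed form under the toy model.
    closed_form = -math.log(1.0 - k / n) / rate
    print("closed-form MLE:", closed_form)
    print("pen = 0:", penalized_mle(k, n, rate, t_ref=2.0, pen=0.0))
    print("pen = 5:", penalized_mle(k, n, rate, t_ref=2.0, pen=5.0))
```

With no penalty the grid search approximates the closed-form estimate; as `pen` grows, the estimate is drawn toward `t_ref`. This shows how a penalty can stabilize a time estimate when few mutation events are observed, which is the general role a penalty plays in such a procedure.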
