Genomic data analysis in tree spaces

Billera, Holmes, and Vogtmann recently introduced an elegant approach in phylogenetics that allows a systematic comparison of different evolutionary histories using the metric geometry of tree spaces. Many problem settings involve heavily populated phylogenetic trees, where the large number of leaves hinders visualization and analysis in the relevant evolutionary moduli spaces. To address this issue, we introduce tree dimensionality reduction, a structured approach to reducing large phylogenetic trees to a distribution of smaller trees. We prove a stability theorem ensuring that small perturbations of the large trees are taken to small perturbations of the resulting distributions. We then present four biologically motivated applications to the analysis of genomic data, spanning cancer and infectious disease. The first quantifies how chemotherapy can disrupt the evolution of common leukemias. The second examines a link between geometric information and histologic grade in relapsed gliomas, where longer relapse branches were specific to high-grade glioma. The third concerns the genetic stability of xenograft models of cancer, where heterogeneity at the single-cell level increased with later mouse passages. The last studies genetic diversity in seasonal influenza A virus. We apply tree dimensionality reduction to 24 years of longitudinally collected H3N2 hemagglutinin sequences, generating distributions of smaller trees that each span three to five seasons. We observe a negative correlation between influenza vaccine effectiveness during a season and the variance of the distributions produced from the preceding seasons' sequence data. We also show how tree distributions relate to antigenic clusters and the choice of influenza vaccine. Our formalism exposes links between viral genomic data and clinical observables such as vaccine selection and efficacy.
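The idea of reducing one large tree to a distribution of smaller trees can be pictured with a minimal sketch. It assumes, as an illustration rather than a statement of the exact construction used here, that the reduction works by drawing random subsets of leaves and restricting the tree's leaf-to-leaf path distances to each subset. The names `LEAVES`, `DIST`, `restrict_metric`, and `subsample_trees` are hypothetical, and the toy distances are invented for the example.

```python
import itertools
import random

# Toy tree ((A,B),(C,D)): every pendant edge has length 1 and the
# internal edge has length 2. Path distances are keyed by leaf pair.
LEAVES = ["A", "B", "C", "D"]
DIST = {
    frozenset(("A", "B")): 2.0,
    frozenset(("C", "D")): 2.0,
    frozenset(("A", "C")): 4.0,
    frozenset(("A", "D")): 4.0,
    frozenset(("B", "C")): 4.0,
    frozenset(("B", "D")): 4.0,
}


def restrict_metric(dist, leaves):
    """Keep only the pairwise distances among the chosen leaves.

    Restricting a tree metric to a subset of leaves again gives a
    tree metric, so each restriction describes a smaller tree.
    """
    return {
        (a, b): dist[frozenset((a, b))]
        for a, b in itertools.combinations(leaves, 2)
    }


def subsample_trees(dist, all_leaves, m, n_samples, seed=0):
    """Draw n_samples random m-leaf subsets and their restricted metrics."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):
        subset = sorted(rng.sample(all_leaves, m))
        samples.append((subset, restrict_metric(dist, subset)))
    return samples


if __name__ == "__main__":
    for subset, sub_dist in subsample_trees(DIST, LEAVES, m=3, n_samples=5):
        print(subset, sub_dist)
```

The collection of restricted metrics plays the role of the distribution of smaller trees. A real analysis would start from trees inferred from sequence data and would compare the small trees in tree space, neither of which this sketch attempts.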
