Novel phylogenetic methods are needed for understanding gene function in the era of mega-scale genome sequencing

Abstract: Ongoing large-scale genome sequencing projects herald a data deluge that will almost certainly overwhelm the current analytical capabilities of evolutionary genomics. In contrast to population genomics, evolutionary genomics has no standardized methods for extracting evolutionary and functional (e.g. gene-trait association) signal from genomic data. Here, we examine how current practices of multi-species comparative genomics perform in this respect and point out that many genomic datasets are under-utilized because powerful methodologies are lacking. As a result, many current analyses emphasize gene families for which some functional data are already available, widening the gap between functionally well-characterized genes/organisms and the universe of unknowns. This leaves unknown genes on the ‘dark side’ of genomes, a problem that sequencing more and more genomes will not mitigate unless we develop tools to infer functional hypotheses for unknown genes in a systematic manner. We provide an inventory of recently developed methods capable of predicting gene-gene and gene-trait associations from comparative data, then argue that realizing the full potential of whole-genome datasets requires integrating phylogenetic comparative methods, a rich but underutilized toolbox for looking into the past, into genomics.
