Computational pan-genomics: status, promises and challenges

Abstract

Many disciplines, from human genetics and oncology to plant breeding, microbiology and virology, commonly face the challenge of analyzing rapidly increasing numbers of genomes. In the case of Homo sapiens, the number of sequenced genomes will approach hundreds of thousands in the next few years. Simply scaling up established bioinformatics pipelines will not be sufficient to leverage the full potential of such rich genomic data sets. Instead, novel, qualitatively different computational methods and paradigms are needed. We will witness the rapid extension of computational pan-genomics, a new sub-area of research in computational biology. In this article, we generalize existing definitions and define a pan-genome as any collection of genomic sequences to be analyzed jointly or to be used as a reference. We examine already available approaches to construct and use pan-genomes, discuss the potential benefits of future technologies and methodologies, and review open challenges from the vantage point of the above-mentioned biological disciplines. As a prominent example of a computational paradigm shift, we particularly highlight the transition from representing reference genomes as strings to representing them as graphs. We outline how this and other challenges from different application domains translate into common computational problems, point out relevant bioinformatics techniques and identify open problems in computer science. With this review, we aim to increase awareness that a joint approach to computational pan-genomics can help address many of the problems currently faced in various domains.
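To make the string-versus-graph distinction concrete, here is a minimal, hypothetical sketch in Python. It is not the authors' method or the API of any published tool. It assumes a toy acyclic graph in which nodes carry DNA labels and each genome is stored as a path (an ordered list of node ids); the class name SequenceGraph, its methods and the example sequences are illustrative inventions. A single nucleotide variant becomes a "bubble" of two alternative nodes, and a read carrying the non-reference allele fails to match the linear reference string but does match a path through the graph.

```python
from dataclasses import dataclass, field


@dataclass
class SequenceGraph:
    """Toy pan-genome graph (hypothetical): nodes carry DNA labels, edges join
    them, and each input genome is stored as a path of node ids."""
    nodes: dict[int, str] = field(default_factory=dict)
    edges: set[tuple[int, int]] = field(default_factory=set)
    paths: dict[str, list[int]] = field(default_factory=dict)

    def add_node(self, node_id: int, seq: str) -> None:
        self.nodes[node_id] = seq

    def add_path(self, name: str, node_ids: list[int]) -> None:
        # Record the genome as a walk and add every edge it traverses.
        for a, b in zip(node_ids, node_ids[1:]):
            self.edges.add((a, b))
        self.paths[name] = node_ids

    def spell(self, name: str) -> str:
        # Concatenate node labels along a stored path to recover a linear genome.
        return "".join(self.nodes[n] for n in self.paths[name])

    def walks(self, start: int, end: int) -> list[list[int]]:
        # Enumerate all start-to-end walks; assumes the toy graph is acyclic.
        if start == end:
            return [[end]]
        result = []
        for a, b in sorted(self.edges):
            if a == start:
                for rest in self.walks(b, end):
                    result.append([start] + rest)
        return result

    def contains(self, read: str, start: int, end: int) -> bool:
        # A read is supported if it occurs in the sequence spelled by any walk.
        return any(read in "".join(self.nodes[n] for n in walk)
                   for walk in self.walks(start, end))


# A SNP (T vs. C) between node 1 and node 4 forms a two-branch bubble.
g = SequenceGraph()
for node_id, seq in [(1, "ACG"), (2, "T"), (3, "C"), (4, "TAGC")]:
    g.add_node(node_id, seq)
g.add_path("reference", [1, 2, 4])    # spells ACGTTAGC
g.add_path("sample", [1, 3, 4])       # spells ACGCTAGC

print(g.spell("reference"))           # ACGTTAGC
print("GCTA" in g.spell("reference")) # False: the read carries the C allele
print(g.contains("GCTA", 1, 4))       # True: walk 1-3-4 spells the read
```

Exhaustively enumerating walks grows exponentially with the number of variant sites, so this is for illustration only; practical graph-based tools rely on indexing rather than enumeration.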

Ying Zhang | Eleazar Eskin | Knut Reinert | Francesca Chiaromonte | Paul Medvedev | Benedict Paten | Veli Mäkinen | Victor Guryev | Tobias Marschall | Nadia Pisanti | Rayan Chikhi | Eric-Wubbo Lameijer | Ole Schulz-Trieglaff | Pierre Peterlongo | Carl Shneider | Manja Marz | Fabio Vandin | Can Alkan | Thomas Abeel | Kai Ye | Pieter B. T. Neerincx | Adam M. Novak | Erik Garrison | Louis Dijkstra | Alexander Schönhuth | Bas E Dutilh | Mohammed El-Kebir | Daniel Valenzuela | Gunnar W Klau | Valentina Boeva | Paul Kersey | Corinna Ernst | Matthias Schlesner | Eric Rivals | Siavash Sheikhizadeh | Sandra Smit | Cornelia M Van Duijn | Jasmijn A Baaijens | Sven Rahmann | Jiayin Wang | Benjamin Langmead | Ali Ghaffaari | David Porubsky | Robin Cijvat | Erwin Datema | Marcel Martin | Klaasjan Ouwens | Ben Raphael | Jeroen de Ridder | Lodewyk Wessels | Jan O Korbel | Wigard P Kloosterman | Paul I W De Bakker | Raoul J P Bonnal | Francesca D Ciccarelli | Evan E Eichler | John C Mu | Dick de Ridder | Ashley D Sanders | The ICGC/TCGA Pan-Cancer Analysis of Whole Genomes Consortium
