The Need for a Human Pangenome Reference Sequence.

The reference human genome sequence is inarguably the most important and widely used resource in the fields of human genetics and genomics. It has transformed the conduct of biomedical sciences and brought invaluable benefits to the understanding and improvement of human health. However, the commonly used reference sequence has profound limitations because, across much of its span, it represents the sequence of just one human haplotype. This single, monoploid reference structure presents a critical barrier to representing the broad genomic diversity in the human population. In this review, we discuss the modernization of the reference human genome sequence to a more complete reference of human genomic diversity, known as a human pangenome.

Expected final online publication date for the Annual Review of Genomics and Human Genetics, Volume 22 is August 2021. Please see http://www.annualreviews.org/page/journal/pubdates for revised estimates.
