Haplotype sharing provides insights into fine-scale population history and disease in Finland

Finland provides unique opportunities to investigate population and medical genomics because of its unified national electronic health records, detailed historical and birth records, and serial population bottlenecks. We assemble a comprehensive view of recent population history (≤100 generations), the timespan during which most rare disease-causing alleles arose, by comparing pairwise haplotype sharing among 43,254 Finns with that in geographically and linguistically adjacent countries with different population histories, including 16,060 Swedes, Estonians, Russians, and Hungarians. Sharing is far more extensive among Finns: pairs of unrelated individuals share, on average, at least one tract of ≥5 cM. By coupling haplotype sharing with fine-scale birth records from over 25,000 individuals, we find that although haplotype sharing broadly decays with geographical distance, there are pockets of excess sharing; individuals from northeast Finland share several-fold more of their genome in identity-by-descent (IBD) segments than individuals from the southwest regions containing the major cities of Helsinki and Turku. We estimate changes in recent effective population size over time across regions of Finland and find significant differences between the Early and Late Settlement Regions, as expected; however, our results indicate more continuous gene flow than previously reported as Finns migrated towards the northernmost Lapland region. Lastly, we show that haplotype sharing is locally enriched by an order of magnitude among pairs of individuals sharing rare alleles, especially among pairs sharing rare disease-causing variants. Our work provides a general framework for using haplotype sharing to reconstruct an integrative view of recent population history and to gain insight into the evolutionary origins of rare variants contributing to disease.
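
The pairwise summary quoted above (the average number of ≥5 cM tracts shared per pair of individuals) can be illustrated with a minimal sketch. It assumes IBD segments have already been called by some external tool and are available as (sample_a, sample_b, length_cM) tuples; the function name, the input layout, and the toy values are hypothetical and are not taken from the paper's pipeline.

```python
def mean_long_segments_per_pair(segments, sample_ids, min_cm=5.0):
    """Average number of IBD segments of at least min_cm per unordered sample pair.

    segments: iterable of (sample_a, sample_b, length_cm) tuples (assumed layout).
    sample_ids: all samples in the cohort, including those sharing no segments.
    """
    ids = set(sample_ids)
    n = len(ids)
    n_pairs = n * (n - 1) // 2  # pairs with no sharing still count in the average
    if n_pairs == 0:
        raise ValueError("need at least two samples")
    long_count = sum(
        1
        for a, b, length_cm in segments
        if a in ids and b in ids and a != b and length_cm >= min_cm
    )
    return long_count / n_pairs


# Toy data (made up): three samples, four called segments.
toy_segments = [
    ("s1", "s2", 6.2),
    ("s1", "s2", 3.1),
    ("s1", "s3", 5.0),
    ("s2", "s3", 1.4),
]
toy_samples = ["s1", "s2", "s3"]
avg = mean_long_segments_per_pair(toy_segments, toy_samples)
```

In this toy cohort two segments pass the 5 cM threshold across three possible pairs. A real analysis would first remove close relatives, since the abstract refers to pairs of unrelated individuals.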
