Haplotype sharing provides insights into fine-scale population history and disease in Finland

Finland provides unique opportunities to investigate population and medical genomics because of its adoption of unified national electronic health records, detailed historical and birth records, and serial population bottlenecks. We assemble a comprehensive view of recent population history (≤100 generations), the timespan during which most rare disease-causing alleles arose, by comparing pairwise haplotype sharing among 43,254 Finns with that in geographically and linguistically adjacent populations with different population histories, including 16,060 Swedes, Estonians, Russians, and Hungarians. We find much more extensive sharing in Finns: on average, pairs of unrelated individuals share at least one tract of ≥5 cM. By coupling haplotype sharing with fine-scale birth records from over 25,000 individuals, we find that while haplotype sharing broadly decays with geographical distance, there are pockets of excess haplotype sharing; individuals from northeast Finland share several-fold more of their genome in identity-by-descent (IBD) segments than individuals from southwest regions containing the major cities of Helsinki and Turku. We estimate recent changes in effective population size across regions of Finland and, as expected, find significant differences between the Early and Late Settlement Regions; however, our results indicate more continuous gene flow than previously suggested as Finns migrated towards the northernmost Lapland region. Lastly, we show that haplotype sharing is locally enriched by an order of magnitude among pairs of individuals sharing rare alleles, especially among pairs sharing rare disease-causing variants. Our work provides a general framework for using haplotype sharing to reconstruct an integrative view of recent population history and gain insight into the evolutionary origins of rare variants contributing to disease.
