Genetic identification of a common collagen disease in Puerto Ricans via identity-by-descent mapping in a health system

Achieving confidence in the causality of a disease locus is a complex task that often requires supporting data from both statistical genetics and clinical genomics. Here we describe a combined approach to identify and characterize a genetic disorder that leverages distantly related patients in a health system and population-scale mapping. We use genomic data to uncover components of distant pedigrees, in the absence of recorded pedigree information, in the multi-ethnic BioMe biobank in New York City. By linking to medical records, we discover a locus associated with genetic relatedness that also underlies extreme short stature. We link the gene, COL27A1, with a little-known genetic disease, previously thought to be rare and recessive. We demonstrate that the disease manifests in both heterozygotes and homozygotes, indicating a common collagen disorder impacting up to 2% of individuals of Puerto Rican ancestry, leading to a better understanding of the continuum of complex and Mendelian disease.
