Comprehensive population-based genome sequencing provides insight into hematopoietic regulatory mechanisms

Significance

Human blood cell production is coordinated to ensure balanced levels of all lineages. The basis of this regulation remains poorly understood. Identification of genetic differences in human populations associated with blood cell measurements can shed light on such regulatory mechanisms. Here, we used whole-genome sequencing data to perform a genetic association study in a population-based biobank from Estonia. We identified a number of potential causal variants and underlying mechanisms. For example, we identified a regulatory element that is necessary for basophil production, which acts specifically during this process to regulate expression of the transcription factor CEBPA. We demonstrate how genome sequencing, genetic fine-mapping, and functional data can be integrated to gain important insight into blood cell production.

Genetic variants affecting hematopoiesis can influence commonly measured blood cell traits. To identify factors that affect hematopoiesis, we performed association studies for blood cell traits in the population-based Estonian Biobank using high-coverage whole-genome sequencing (WGS) in 2,284 samples and SNP genotyping in an additional 14,904 samples. Using up to 7,134 samples with available phenotype data, our analyses identified 17 associations across 14 blood cell traits. Integration of WGS-based fine-mapping and complementary epigenomic datasets provided evidence for causal mechanisms at several loci, including at a previously undiscovered basophil count-associated locus near the master hematopoietic transcription factor CEBPA. The fine-mapped variant at this basophil count association near CEBPA overlapped an enhancer active in common myeloid progenitors and influenced its activity. In situ perturbation of this enhancer by CRISPR/Cas9 mutagenesis in hematopoietic stem and progenitor cells demonstrated that it is necessary for and specifically regulates CEBPA expression during basophil differentiation. We additionally identified basophil count-associated variation at another more pleiotropic myeloid enhancer near GATA2, highlighting regulatory mechanisms for ordered expression of master hematopoietic regulators during lineage specification. Our study illustrates how population-based genetic studies can provide key insights into poorly understood cell differentiation processes of considerable physiologic relevance.
