Genotyping and population structure of the China Kadoorie Biobank

China Kadoorie Biobank (CKB) is a population-based prospective cohort of >512,000 adults recruited in 2004-2008 from 10 geographically diverse regions across China. Detailed questionnaire data and physical measurements were collected at baseline, with additional measurements at three resurveys involving approximately 5% of surviving participants. Incident disease events are captured through electronic linkage to death and disease registries and to the national health insurance system. Genome-wide genotyping has been conducted for >100,000 participants using custom-designed Axiom® arrays. Analysis of these data reveals extensive relatedness within the cohort, signatures of recent consanguinity, and principal component patterns that reflect large-scale population movements in recent Chinese history. In addition to numerous studies of candidate drug targets and disease risk factors, CKB has made substantial contributions to many international genetics consortia. Collected biosamples are now being used for high-throughput 'omics assays which, together with planned whole-genome sequencing, will continue to enhance the scientific value of the biobank.
