The Norwegian Mother, Father, and Child cohort study (MoBa) genotyping data resource: MoBaPsychGen pipeline v.1

Background: The Norwegian Mother, Father, and Child Cohort Study (MoBa) is a population-based pregnancy cohort that includes approximately 114,500 children, 95,200 mothers, and 75,200 fathers. Genotyping of MoBa has been conducted through multiple research projects spanning several years, using varying selection criteria, genotyping arrays, and genotyping centres. MoBa contains numerous interrelated families, which necessitated a family-based quality control (QC) pipeline that verifies and accounts for diverse types of relatedness.

Methods: The MoBaPsychGen pipeline, comprising pre-imputation QC, phasing, imputation, and post-imputation QC, was developed based on current best-practice protocols and implemented to account for the complex structure of the MoBa genotype data. The pipeline includes QC at both the single nucleotide polymorphism (SNP) level and the individual level. Phasing and imputation were performed using the publicly available Haplotype Reference Consortium release 1.1 panel as a reference. Information from the Medical Birth Registry of Norway and MoBa questionnaires was used to identify biological sex, year of birth, reported parent-offspring (PO) relationships, and multiple births (only available in the offspring generation).

Results: In total, 207,569 unique individuals (90% of the unique individuals included in the study) and 6,981,748 SNPs passed the MoBaPsychGen pipeline. The relatedness checks performed throughout the pipeline allowed identification of within-generation and across-generation first-degree, second-degree, and third-degree relatives. The individuals passing post-imputation QC comprised 64,471 families ranging in size from singletons to 84 unique individuals (singletons are counted as families because their other family members may not have been genotyped, may not have been imputed, or may not have passed post-imputation QC). The relationships identified include 287 monozygotic twin pairs, 22,884 full siblings, 117,004 PO pairs, 23,299 second-degree relative pairs, and 10,828 third-degree relative pairs.

Discussion: MoBa contains a highly complex relatedness structure, with a variety of family structures including singletons, PO duos, full (mother, father, child) PO trios, nuclear families, blended families, and extended families. The availability of robustly quality-controlled genetic data for such a large cohort with a unique extended family structure will allow many novel research questions to be addressed. Furthermore, the MoBaPsychGen pipeline has potential utility in similar cohorts.
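The relatedness checks summarised above can be illustrated with a minimal sketch. It assumes that pairwise kinship coefficients and IBS0 proportions (the share of SNPs at which a pair shares no allele) have already been estimated, and it uses the commonly cited KING-style kinship cut-offs. The abstract does not state which thresholds the pipeline applies, so the function name `classify_pair` and the IBS0 cut-off `po_ibs0_max` are hypothetical, chosen for illustration only.

```python
# Sketch only: classify a pair of individuals by estimated kinship.
# The kinship cut-offs follow the widely used KING-style inference criteria;
# they are an assumption here, not the documented MoBaPsychGen settings.
DUPLICATE_MZ = 0.354    # above this: monozygotic twins or duplicate samples
FIRST_DEGREE = 0.177    # 0.177-0.354: parent-offspring or full siblings
SECOND_DEGREE = 0.0884  # 0.0884-0.177: e.g. half siblings, grandparents
THIRD_DEGREE = 0.0442   # 0.0442-0.0884: e.g. first cousins


def classify_pair(kinship, ibs0, po_ibs0_max=0.0012):
    """Return a relationship label for one pair.

    kinship     -- estimated kinship coefficient for the pair
    ibs0        -- proportion of SNPs with zero alleles shared
    po_ibs0_max -- hypothetical cut-off: parent-offspring pairs always share
                   at least one allele, so their IBS0 should be close to zero
    """
    if kinship > DUPLICATE_MZ:
        return "MZ twin or duplicate"
    if kinship > FIRST_DEGREE:
        # Both first-degree types have expected kinship 0.25;
        # IBS0 is what separates them.
        return "parent-offspring" if ibs0 < po_ibs0_max else "full sibling"
    if kinship > SECOND_DEGREE:
        return "second-degree"
    if kinship > THIRD_DEGREE:
        return "third-degree"
    return "unrelated"


if __name__ == "__main__":
    # Illustrative values only, not MoBa data.
    for kin, ibs0 in [(0.49, 0.0), (0.25, 0.0), (0.26, 0.02), (0.12, 0.05)]:
        print(kin, ibs0, classify_pair(kin, ibs0))
```

In a family-based pipeline, inferred labels of this kind would typically be compared against registry and questionnaire information (reported PO relationships, year of birth, multiple births) so that discrepancies can be flagged for review.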
