Project MinE: study design and pilot analyses of a large-scale whole-genome sequencing study in amyotrophic lateral sclerosis

The most recent genome-wide association study in amyotrophic lateral sclerosis (ALS) demonstrates a disproportionate contribution from low-frequency variants to genetic susceptibility to disease. We have therefore begun Project MinE, an international collaboration that seeks to analyze whole-genome sequence data of at least 15 000 ALS patients and 7500 controls. Here, we report on the design of Project MinE and on pilot analyses of 1169 successfully sequenced ALS patients and 608 controls drawn from the Netherlands. As has become characteristic of sequencing studies, we find an abundance of rare genetic variation (minor allele frequency < 0.1%), the vast majority of which is absent from public datasets. Principal component analysis reveals local geographical clustering of these variants within the Netherlands. We use the whole-genome sequence data to explore the implications of poor geographical matching of cases and controls in a sequence-based disease study and to investigate how ancestry-matched, externally sequenced controls can induce false positive associations. We have also publicly released genome-wide minor allele counts in cases and controls, as well as results from genic burden tests.
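
To make the rare-variant threshold concrete, the sketch below shows how a minor allele frequency could be derived from case and control minor allele counts of the kind described above. It is illustrative only: the `VariantCounts` record, the function names, the pooling of cases and controls, and the assumption of complete genotyping at autosomal sites are all hypothetical and are not taken from the Project MinE release or pipeline. Only the sample sizes and the 0.1% cutoff come from the text.

```python
from dataclasses import dataclass

# Sample sizes and threshold quoted in the abstract.
N_CASES = 1169
N_CONTROLS = 608
RARE_MAF_THRESHOLD = 0.001  # 0.1%


@dataclass
class VariantCounts:
    """Hypothetical record: minor allele counts for one autosomal variant."""
    variant_id: str
    mac_cases: int
    mac_controls: int


def minor_allele_frequency(mac: int, n_samples: int) -> float:
    """MAF = minor allele count / total alleles (2 per diploid sample).

    Assumes every sample has a genotype call at this site.
    """
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")
    return mac / (2 * n_samples)


def is_rare(v: VariantCounts) -> bool:
    """True when the pooled case+control MAF is below the 0.1% threshold.

    Pooling cases and controls is an assumption made for this sketch.
    """
    pooled_mac = v.mac_cases + v.mac_controls
    pooled_maf = minor_allele_frequency(pooled_mac, N_CASES + N_CONTROLS)
    return pooled_maf < RARE_MAF_THRESHOLD
```

Under these assumptions the pooled sample carries 2 × 1777 = 3554 alleles, so a variant counts as rare only if its minor allele is observed at most three times across all cases and controls.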
