A genome-wide scan statistic framework for whole-genome sequence data analysis

The analysis of whole-genome sequencing studies is challenging due to the large number of noncoding rare variants, our limited understanding of their functional effects, and the lack of natural units for testing. Here we propose a scan statistic framework, WGScan, to simultaneously detect the existence and estimate the locations of association signals at genome-wide scale. WGScan can analytically estimate the significance threshold for a whole-genome scan; utilize summary statistics for a meta-analysis; incorporate functional annotations for enhanced discoveries in noncoding regions; and enable enrichment analyses using genome-wide summary statistics. Based on the analysis of whole genomes of 1,786 phenotypically discordant sibling pairs from the Simons Simplex Collection study for autism spectrum disorders, we derive genome-wide significance thresholds for whole-genome sequencing studies and detect significant enrichments of regions showing associations with autism in promoter regions, functional categories related to autism, and enhancers predicted to regulate expression of autism-associated genes.

Whole-genome sequencing data reveal a large number of variants to be tested for association with phenotypic traits and diseases. Here, the authors develop WGScan, a statistical method for detecting the existence and estimating the locations of association signals at genome-wide scale.
