NETWORK ASSISTED ANALYSIS TO REVEAL THE GENETIC BASIS OF AUTISM.

While studies show that autism is highly heritable, the genetic basis of this disorder remains elusive. Based on the idea that highly correlated genes are functionally interrelated and more likely to affect risk, we develop a novel statistical tool to identify additional potential autism risk genes by combining genetic association scores with gene co-expression in specific brain regions and periods of development. The gene dependence network is estimated using a novel partial neighborhood selection (PNS) algorithm, in which node-specific properties are incorporated into network estimation for improved statistical and computational efficiency. We then adopt a hidden Markov random field (HMRF) model to combine the estimated network and the genetic association scores in a systematic manner. The proposed modeling framework can be naturally extended to incorporate additional structural information concerning the dependence between genes. Using currently available genetic association data from whole exome sequencing studies and brain gene expression levels, the proposed algorithm successfully identified 333 genes that plausibly affect autism risk.
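
To make the HMRF step more concrete, the following is a minimal sketch of how per-gene association scores might be combined with an estimated network. It is not the authors' implementation: the abstract does not specify the model details, so the Ising-type prior, the two-component Gaussian model for Z-scores, the iterated conditional modes (ICM) update, the function names `normal_logpdf` and `hmrf_icm`, and all parameter values (`b`, `c`, `mu`, `sd`) are assumptions chosen for illustration only.

```python
import numpy as np


def normal_logpdf(z, mean, sd):
    """Log density of a normal distribution, evaluated elementwise."""
    return -0.5 * np.log(2 * np.pi * sd**2) - (z - mean) ** 2 / (2 * sd**2)


def hmrf_icm(z, adjacency, b=-2.0, c=1.5, mu=2.5, sd=1.0, n_sweeps=20):
    """Assign 0/1 risk labels to genes with an illustrative HMRF.

    Assumed model (hypothetical, for illustration):
      prior:    P(I) proportional to exp(b * sum_i I_i + c * sum_{i~j} I_i I_j)
      emission: Z_i | I_i = 0 ~ N(0, 1),  Z_i | I_i = 1 ~ N(mu, sd^2)
    `adjacency` is a symmetric 0/1 matrix with a zero diagonal.
    """
    z = np.asarray(z, dtype=float)
    adjacency = np.asarray(adjacency, dtype=float)

    # Log-likelihood ratio of "risk" versus "non-risk" for each gene.
    llr = normal_logpdf(z, mu, sd) - normal_logpdf(z, 0.0, 1.0)

    # Initialise labels by thresholding the scores alone.
    labels = (z > mu / 2).astype(int)

    for _ in range(n_sweeps):
        changed = False
        for i in range(len(z)):
            # Conditional log-odds of I_i = 1 given the neighbours' labels.
            log_odds = b + c * (adjacency[i] @ labels) + llr[i]
            new_label = int(log_odds > 0)
            if new_label != labels[i]:
                labels[i] = new_label
                changed = True
        if not changed:  # stop once a full sweep changes nothing
            break
    return labels


# Toy example: genes 0-1-2 form a chain, gene 3 is isolated.
toy_adjacency = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
])
toy_scores = [3.0, 1.2, 3.0, 1.2]
toy_labels = hmrf_icm(toy_scores, toy_adjacency)
print(toy_labels.tolist())  # [1, 1, 1, 0]
```

In this toy example, genes 1 and 3 have the same moderate score, but only gene 1 is labeled as a risk gene because both of its network neighbors carry strong scores. This is the kind of borrowing of information across correlated genes that the abstract describes.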
