Statistical Applications in Genetics and Molecular Biology

An important challenge in statistical genomics concerns integrating experimental data with exogenous information about gene function. A number of statistical methods are available to address this challenge, but most do not accommodate complexities in the functional record. To infer activity of a functional category (e.g., a gene ontology term), most methods use gene-level data on that category, but do not use other functional properties of the same genes. Not doing so creates undue errors in inference. Recent developments in model-based category analysis aim to overcome this difficulty, but in attempting to do so they are faced with serious computational problems. This paper investigates statistical properties and the structure of posterior computation in one such model for the analysis of functional category data. We examine the graphical structures underlying posterior computation in the original parameterization and in a new parameterization aimed at leveraging elements of the model. We characterize identifiability of the underlying activation states, describe a new prior distribution, and introduce approximations that aim to support numerical methods for posterior inference.

[1]  Brendan J. Frey,et al.  Factor graphs and the sum-product algorithm , 2001, IEEE Trans. Inf. Theory.

[2]  Mario Medvedovic,et al.  LRpath: a logistic regression approach for identifying enriched biological groups in gene expression data , 2009, Bioinform..

[3]  Endika Bengoetxea,et al.  A parallel framework for loopy belief propagation , 2007, GECCO '07.

[4]  Jean YH Yang,et al.  Bioconductor: open software development for computational biology and bioinformatics , 2004, Genome Biology.

[5]  M. Ashburner,et al.  Gene Ontology: tool for the unification of biology , 2000, Nature Genetics.

[6]  C. Lawrence,et al.  Centroid estimation in discrete high-dimensional spaces with applications in biology , 2008, Proceedings of the National Academy of Sciences.

[7]  Peter N. Robinson,et al.  GOing Bayesian: model-based gene set analysis of genome-scale data , 2010, Nucleic acids research.

[8]  Zhen Jiang,et al.  Bioconductor Project Bioconductor Project Working Papers Year Paper Extensions to Gene Set Enrichment , 2013 .

[9]  Joseph Gonzalez,et al.  Residual Splash for Optimally Parallelizing Belief Propagation , 2009, AISTATS.

[10]  Andrew B. Nobel,et al.  Significance analysis of functional categories in gene expression studies: a structured permutation approach , 2005, Bioinform..

[11]  Eric E Schadt,et al.  Cycle Regulation in Islets with Diabetes Susceptibility a Gene Expression Network Model of Type 2 Diabetes Links Cell P

, 2008 .

[12]  R. Tibshirani,et al.  On testing the significance of sets of genes , 2006, math/0610667.

[13]  Hiroyuki Ogata,et al.  KEGG: Kyoto Encyclopedia of Genes and Genomes , 1999, Nucleic Acids Res..

[14]  Martin Vingron,et al.  Improved detection of overrepresentation of Gene-Ontology annotations with parent-child analysis , 2007, Bioinform..

[15]  Peter Bühlmann,et al.  Analyzing gene expression data in terms of gene sets: methodological issues , 2007, Bioinform..

[16]  P. Khatri,et al.  Global functional profiling of gene expression. , 2003, Genomics.

[17]  Francesco C Stingo,et al.  INCORPORATING BIOLOGICAL INFORMATION INTO LINEAR MODELS: A BAYESIAN APPROACH TO THE SELECTION OF PATHWAYS AND GENES. , 2011, The annals of applied statistics.

[18]  T. Speed,et al.  GOstat: find statistically overrepresented Gene Ontologies within a group of genes. , 2004, Bioinformatics.

[19]  Gábor Csárdi,et al.  The igraph software package for complex network research , 2006 .

[20]  I. Simon,et al.  A probabilistic generative model for GO enrichment analysis , 2008, Nucleic acids research.

[21]  D. Nettleton,et al.  A Hidden Markov Model Approach to Testing Multiple Hypotheses on a Tree-Transformed Gene Ontology Graph , 2010 .

[22]  Trevor Hastie,et al.  Averaged gene expressions for regression. , 2007, Biostatistics.

[23]  Søren Højsgaard,et al.  The gRbase Package for Graphical Modelling in R , 2004 .

[24]  M. Newton,et al.  Random-set methods identify distinct aspects of the enrichment signal in gene-set analysis , 2007, 0708.4350.

[25]  Pablo Tamayo,et al.  Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles , 2005, Proceedings of the National Academy of Sciences of the United States of America.

[26]  Martin Vingron,et al.  Ontologizer 2.0 - a multifunctional tool for GO term enrichment analysis and data exploration , 2008, Bioinform..

[27]  Andrew B. Nobel,et al.  A statistical framework for testing functional categories in microarray data , 2008, 0803.3881.