Large scale statistical inference of signaling pathways from RNAi and microarray data

Background: The advent of RNA interference techniques enables the selective and efficient silencing of biologically interesting genes. Combined with DNA microarray technology, this allows researchers to gain insights into signaling pathways by observing the downstream effects of individual knock-downs on gene expression. These secondary effects can be used to computationally reverse engineer features of the upstream signaling pathway.

Results: In this paper we address this challenging problem by extending previous work by Markowetz et al., who proposed a statistical framework for scoring network hypotheses in a Bayesian manner. Our extensions go in three directions. First, we introduce a way to omit the data discretization step needed in the original framework by basing the calculation on p-values instead. Second, we show how prior assumptions on the network structure can be incorporated into the scoring scheme using regularization techniques. Third, and most important, we propose methods to scale up the original approach, which is limited to around 5 genes, to large-scale networks.

Conclusion: We compare these methods on artificial data. Our proposed module network is employed to infer the signaling network between 13 genes in the ER-α pathway in human MCF-7 breast cancer cells. A bootstrapping approach shows that this reconstruction is found with good statistical stability. The code for the module network inference method is available in the latest version of the R package nem, which can be obtained from the Bioconductor homepage.
