Gene coexpression network discovery with controlled statistical and biological significance

Many biological functions are carried out by modules of coexpressed genes, which can be conveniently represented as a coexpression network: genes are the network vertices, and significant pairwise coexpressions are the network edges. Traditional network discovery methods control either statistical significance or biological significance, but not both. We have designed and implemented a two-stage algorithm that controls both the statistical significance, measured by the false discovery rate (FDR), and the biological significance, measured by the minimum acceptable strength (MAS), of the discovered network. Starting from estimates of the pairwise correlation between gene expression profiles, the first stage produces an initial network that controls only FDR; the second stage then performs a network discovery that controls both FDR and MAS. We illustrate the algorithm by discovering coexpression networks for yeast galactose metabolism with controlled FDR and MAS.
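
As a rough illustration of the two-stage idea, the sketch below screens gene pairs first on FDR alone and then on FDR and MAS together. It is not the authors' implementation. The specific choices are assumptions made for the example: Pearson correlation, the Fisher z-transform for p-values and intervals, the Benjamini-Hochberg step-up procedure in stage one, and FDR-adjusted confidence intervals on the selected pairs in stage two. The function name `discover_network` and its parameters are hypothetical.

```python
import numpy as np
from statistics import NormalDist


def discover_network(X, fdr=0.05, mas=0.5):
    """Two-stage edge screening (illustrative sketch, not the paper's code).

    X   : array of shape (genes, samples), one expression profile per row.
    fdr : target false discovery rate.
    mas : minimum acceptable strength, a threshold on |correlation|.
    Returns (stage1_edges, stage2_edges) as lists of (i, j) with i < j.
    """
    n_genes, n_samples = X.shape
    nd = NormalDist()

    # Pairwise Pearson correlations for all unordered gene pairs.
    R = np.corrcoef(X)
    iu, ju = np.triu_indices(n_genes, k=1)
    r = np.clip(R[iu, ju], -0.999999, 0.999999)

    # Assumption: Fisher z-transform, approximately normal with
    # standard error 1 / sqrt(n_samples - 3).
    z = np.arctanh(r)
    se = 1.0 / np.sqrt(n_samples - 3)
    p = np.array([2.0 * (1.0 - nd.cdf(abs(v) / se)) for v in z])
    m = p.size

    # Stage 1: Benjamini-Hochberg step-up test of "correlation = 0".
    order = np.argsort(p)
    below = p[order] <= fdr * np.arange(1, m + 1) / m
    k = int(np.nonzero(below)[0].max()) + 1 if below.any() else 0
    selected = order[:k]
    stage1 = [(int(iu[t]), int(ju[t])) for t in selected]
    if k == 0:
        return [], []

    # Stage 2: FDR-adjusted confidence intervals at level 1 - k*fdr/m
    # for the k selected pairs; keep a pair only if its whole interval
    # lies beyond the MAS threshold.
    z_crit = nd.inv_cdf(1.0 - k * fdr / (2.0 * m))
    lower = np.tanh(z[selected] - z_crit * se)
    upper = np.tanh(z[selected] + z_crit * se)
    keep = (lower > mas) | (upper < -mas)
    stage2 = [edge for edge, ok in zip(stage1, keep) if ok]
    return stage1, stage2
```

Calling `discover_network(X, fdr=0.05, mas=0.5)` on a genes-by-samples matrix returns both edge lists. Because the adjusted intervals are wider than the stage-one test, every stage-two edge is also a stage-one edge, and raising `mas` can only shrink the second network.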
