Effects of functional bias on supervised learning of a gene network model.

Gene networks have proven to be an effective approach for modeling cellular systems, capturing some of the extreme complexity of cells in a formal theoretical framework. This complexity, combined with the still-limited experimental data on genes and their interactions, makes gene networks difficult to reconstruct. One powerful strategy is to analyze functional genomics data by supervised learning of network relationships, using reference examples drawn from current knowledge. However, this reliance on a reference set introduces major pitfalls: a misleading reference set results in suboptimal learning. An effective reference set must meet three requirements: comprehensiveness, reliability, and freedom from bias. Current knowledge about gene function is highly biased toward a few specific biological functions, such as protein synthesis. This functional bias in the reference set, especially when combined with a corresponding functional bias in the data sets, induces biased learning that can in turn lead to false-positive biological discoveries, as we show here for the yeast Saccharomyces cerevisiae. Successful gene network modeling by supervised learning therefore requires careful use of current knowledge and genomics data, and we provide guidance for better use of these data in learning gene networks.
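As a rough illustration of what functional bias in a reference set means, the sketch below measures how much of a set of positive reference gene pairs comes from a single functional category, and builds a reduced reference set with that category removed. It is not the procedure used in this work: the function names (`category_share`, `drop_category`), the gene identifiers, and the category labels are all hypothetical toy values chosen for the example.

```python
from collections import Counter


def category_share(reference_pairs, gene_category):
    """Return the fraction of reference pairs whose two genes share each category.

    Pairs whose genes fall in different categories, or have no annotation,
    are counted under the key "mixed_or_unknown".
    """
    counts = Counter()
    for gene_a, gene_b in reference_pairs:
        cat_a = gene_category.get(gene_a)
        cat_b = gene_category.get(gene_b)
        if cat_a is not None and cat_a == cat_b:
            counts[cat_a] += 1
        else:
            counts["mixed_or_unknown"] += 1
    total = len(reference_pairs)
    if total == 0:
        return {}
    return {category: n / total for category, n in counts.items()}


def drop_category(reference_pairs, gene_category, category):
    """Return the reference pairs in which neither gene belongs to `category`."""
    return [
        (gene_a, gene_b)
        for gene_a, gene_b in reference_pairs
        if gene_category.get(gene_a) != category
        and gene_category.get(gene_b) != category
    ]


# Toy annotation and reference set (hypothetical, for illustration only).
gene_category = {
    "g1": "protein_synthesis",
    "g2": "protein_synthesis",
    "g3": "protein_synthesis",
    "g4": "dna_repair",
    "g5": "dna_repair",
}
reference_pairs = [("g1", "g2"), ("g1", "g3"), ("g2", "g3"), ("g4", "g5")]

shares = category_share(reference_pairs, gene_category)
reduced = drop_category(reference_pairs, gene_category, "protein_synthesis")
print(shares)
print(reduced)
```

In this toy reference set, three of the four positive pairs come from one category. Comparing a learner's performance on the full set with its performance on the reduced set is one simple way to check whether the apparent accuracy is driven by that category alone.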
