Association analysis-based transformations for protein interaction networks: a function prediction case study

Protein interaction networks are one of the most promising types of biological data for the discovery of functional modules and the prediction of individual protein functions. However, it is known that these networks are both incomplete and inaccurate, i.e., they have spurious edges and lackbiologically valid edges. One way to handle this problem is by transforming the original interaction graph into new graphs that remove spurious edges, add biologically valid ones, and assign reliability scores to the edges constituting the final network. We investigate currently existing methods, as well as propose a robust association analysis-based method for this task. This method is based on the concept of h-confidence, which is a measure that can be used to extract groups of objects having high similarity with each other. Experimental evaluation on several protein interaction data sets show that hyperclique-based transformations enhance the performance of standard function prediction algorithms significantly, and thus have merit.

[1]  Aaas News,et al.  Book Reviews , 1893, Buffalo Medical and Surgical Journal.

[2]  Anton J. Enright,et al.  Detection of functional modules from protein interaction networks , 2003, Proteins.

[3]  Aidong Zhang,et al.  A topological measurement for weighted protein interaction network , 2005, 2005 IEEE Computational Systems Bioinformatics Conference (CSB'05).

[4]  Hui Xiong,et al.  Identification of Functional Modules in Protein Complexes via Hyperclique Pattern Discovery , 2004, Pacific Symposium on Biocomputing.

[5]  Gaurav Pandey,et al.  Comparative Study of Various Genomic Data Sets for Protein Function Prediction and Enhancements Using Association Analysis ∗ † , 2007 .

[6]  Golan Yona,et al.  Gene expression Effective similarity measures for expression profiles , 2006 .

[7]  Shoudan Liang,et al.  Redundancies in Large-scale Protein Interaction Networks , 2003 .

[8]  Hui Xiong,et al.  Generalizing the notion of support , 2004, KDD.

[9]  M. Samanta,et al.  Predicting protein functions from redundancies in large-scale protein interaction networks , 2003, Proceedings of the National Academy of Sciences of the United States of America.

[10]  Ting Chen,et al.  Assessment of the reliability of protein-protein interactions and protein function prediction , 2002, Pacific Symposium on Biocomputing.

[11]  Mona Singh,et al.  Whole-proteome prediction of protein function via graph-theoretic analysis of interaction maps , 2005, ISMB.

[12]  B. Schwikowski,et al.  A network of protein–protein interactions in yeast , 2000, Nature Biotechnology.

[13]  Ramakrishnan Srikant,et al.  Fast Algorithms for Mining Association Rules in Large Databases , 1994, VLDB.

[14]  H. Mewes,et al.  The FunCat, a functional annotation scheme for systematic classification of proteins from whole genomes. , 2004, Nucleic acids research.

[15]  R. Ozawa,et al.  A comprehensive two-hybrid analysis to explore the yeast protein interactome , 2001, Proceedings of the National Academy of Sciences of the United States of America.

[16]  Vipin Kumar,et al.  Finding Clusters of Different Sizes, Shapes, and Densities in Noisy, High Dimensional Data , 2003, SDM.

[17]  Hui Xiong,et al.  Enhancing data analysis with noise removal , 2006, IEEE Transactions on Knowledge and Data Engineering.

[18]  M. Tyers,et al.  The GRID: The General Repository for Interaction Datasets , 2003, Genome Biology.

[19]  Vipin Kumar,et al.  Introduction to Data Mining, (First Edition) , 2005 .

[20]  C. Deane,et al.  Protein Interactions , 2002, Molecular & Cellular Proteomics.

[21]  D. Eisenberg,et al.  Protein interaction databases. , 2001, Current opinion in biotechnology.

[22]  Hui Xiong,et al.  Hyperclique pattern discovery , 2006, Data Mining and Knowledge Discovery.

[23]  B. Séraphin,et al.  The tandem affinity purification (TAP) method: a general procedure of protein complex purification. , 2001, Methods.

[24]  Sean R. Collins,et al.  Global landscape of protein complexes in the yeast Saccharomyces cerevisiae , 2006, Nature.

[25]  James R. Knight,et al.  A comprehensive analysis of protein–protein interactions in Saccharomyces cerevisiae , 2000, Nature.

[26]  P. Bork,et al.  Proteome survey reveals modularity of the yeast cell machinery , 2006, Nature.

[27]  Tomasz Imielinski,et al.  Mining association rules between sets of items in large databases , 1993, SIGMOD Conference.

[28]  AgrawalRakesh,et al.  Mining association rules between sets of items in large databases , 1993 .

[29]  Alessandro Vespignani,et al.  Global protein function prediction from protein-protein interaction networks , 2003, Nature Biotechnology.

[30]  D. West Introduction to Graph Theory , 1995 .

[31]  J M Gauthier,et al.  Protein--protein interaction maps: a lead towards cellular functions. , 2001, Trends in genetics : TIG.

[32]  Edward Omiecinski,et al.  Alternative Interest Measures for Mining Associations in Databases , 2003, IEEE Trans. Knowl. Data Eng..

[33]  Alain Guénoche,et al.  Clustering proteins from interaction networks for the prediction of cellular functions , 2004, BMC Bioinformatics.

[34]  Aidong Zhang,et al.  Prediction of Protein Function Using Common-Neighbors in Protein-Protein Interaction Networks , 2006, Sixth IEEE Symposium on BioInformatics and BioEngineering (BIBE'06).

[35]  R. Jarvis,et al.  ClusteringUsing a Similarity Measure Based on SharedNear Neighbors , 1973 .

[36]  Mike Tyers,et al.  The GRID: The General Repository for Interaction Datasets , 2003, Genome Biology.

[37]  Ramakrishnan Srikant,et al.  Fast algorithms for mining association rules , 1998, VLDB 1998.

[38]  Ioannis Xenarios,et al.  DIP: the Database of Interacting Proteins , 2000, Nucleic Acids Res..

[39]  D. Eisenberg,et al.  Computational methods of analysis of protein-protein interactions. , 2003, Current opinion in structural biology.

[40]  Gaurav Pandey,et al.  Computational Approaches for Protein Function Prediction : A Survey , 2006 .

[41]  P. Bork,et al.  Functional organization of the yeast proteome by systematic analysis of protein complexes , 2002, Nature.

[42]  Yishan Jiao,et al.  Faster and more accurate global protein function assignment from protein interaction networks using the MFGO algorithm , 2006, FEBS letters.

[43]  Heikki Mannila,et al.  Fast Discovery of Association Rules , 1996, Advances in Knowledge Discovery and Data Mining.