Identification of core-attachment complexes based on maximal frequent patterns in protein-protein interaction networks

In this paper, we present a method for identifying core-attachment complexes in yeast protein-protein interaction networks (PINs) based on mining maximal frequent patterns. The method consists of two stages. First, it finds all protein-complex cores by mining maximal frequent patterns in the PIN with the FP-growth method. Second, it filters out redundant cores and adds attachment proteins to each remaining core to form protein complexes. We evaluate the method experimentally on three different yeast PINs. The results show that it outperforms other existing methods with regard to localization and Gene Ontology (GO) semantic similarity within the predicted complexes. Furthermore, a comparison against the known CYC2008 reference complexes shows that its predictions achieve a higher rate of matched reference complexes.
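The two-stage idea can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions made only for illustration: each protein's closed neighborhood (the protein plus its interaction partners) is treated as one transaction; a brute-force enumerator stands in for FP-growth and is only practical for toy inputs; redundancy is judged by a Jaccard threshold; and a protein becomes an attachment when it interacts with more than half of a core. The function names and the parameters `min_support`, `min_size`, `max_jaccard`, and `min_fraction` are hypothetical.

```python
from itertools import combinations


def neighborhoods(edges):
    """Map each protein to the set of its interaction partners."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def maximal_frequent_itemsets(transactions, min_support, min_size):
    """Brute-force stand-in for FP-growth (exponential; toy inputs only).

    Returns itemsets contained in at least `min_support` transactions
    that are not a proper subset of another frequent itemset.
    """
    items = sorted(set().union(*transactions))
    frequent = []
    # Scan from the largest size down, so a candidate already covered by
    # a larger frequent itemset can be skipped as non-maximal.
    for k in range(len(items), min_size - 1, -1):
        for cand in combinations(items, k):
            s = frozenset(cand)
            if any(s < f for f in frequent):
                continue
            if sum(1 for t in transactions if s <= t) >= min_support:
                frequent.append(s)
    return frequent


def filter_redundant(cores, max_jaccard=0.5):
    """Assumed rule: drop a core whose Jaccard overlap with a kept core
    reaches `max_jaccard`. Larger cores are considered first."""
    kept = []
    for core in sorted(cores, key=lambda c: (-len(c), sorted(c))):
        if all(len(core & k) / len(core | k) < max_jaccard for k in kept):
            kept.append(core)
    return kept


def add_attachments(core, adj, min_fraction=0.5):
    """Assumed rule: attach outside proteins that interact with more
    than `min_fraction` of the core members."""
    complex_ = set(core)
    for protein, nbrs in adj.items():
        if protein not in core and len(nbrs & core) / len(core) > min_fraction:
            complex_.add(protein)
    return complex_


def predict_complexes(edges, min_support=3, min_size=3):
    adj = neighborhoods(edges)
    # Stage 1: cores as maximal frequent patterns over closed neighborhoods.
    transactions = [nbrs | {p} for p, nbrs in adj.items()]
    cores = maximal_frequent_itemsets(transactions, min_support, min_size)
    # Stage 2: remove redundant cores, then extend each remaining core.
    return [add_attachments(c, adj) for c in filter_redundant(cores)]


# Toy network with made-up protein names.
edges = [("A", "B"), ("A", "C"), ("B", "C"),
         ("A", "D"), ("B", "D"), ("D", "E")]
complexes = predict_complexes(edges)
print(complexes)
```

On this toy network the mining stage yields two overlapping candidate cores, {A, B, C} and {A, B, D}. The assumed filter keeps the first, and D is then added as an attachment because it interacts with two of the three core proteins, while E is left out. A real PIN would need an actual FP-growth implementation in place of the brute-force enumerator.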
