caught by machine-learning method

Abstract: Protein–protein interactions (PPIs) are central to most biological processes. Much effort has gone into developing methods for predicting PPIs and constructing PPI networks. Although these methods have achieved high accuracy, that accuracy has been found to depend strongly on the balance of the datasets. Compared with negative datasets, positive datasets contain hub proteins, which interact with many more proteins. This imbalance between the datasets leads to excellent PPI prediction performance, but when balanced datasets are used, the performance is disappointing. Different biological functions are supported by different local PPI networks. Does this mean that each local PPI network has its own features? In this paper, we capture features of local networks in three species under the condition that the positive and negative datasets are balanced. The features of a local PPI network fade as the network is extended. The association rules method is used to analyze the features of local PPI networks. All datasets used in this study are derived from publicly available databases.

Keywords: protein-protein interaction; interactional feature; machine-learning method; balanced random sampling; association rules
