The task of predicting which proteins in a protein-protein interaction (PPI) network are involved in certain diseases, such as cancer, has received significant attention in the literature [1, 4]. Multiple approaches have been proposed, some based on graph algorithms and some on standard machine learning methods. Machine learning approaches such as Milenković et al. [5], Furney et al. [1], Li et al. [4], Furney et al. [2] and Kar et al. [3] typically use a feature-based representation of proteins as input, and their success depends strongly on the relevance of the selected features. Earlier work has shown that the Gene Ontology (GO) annotations of a protein are highly relevant. For instance, Li et al. [4] found that predictive performance depends only slightly on the chosen machine learning method but strongly on the chosen features, and among the many features considered, GO annotations turned out to be particularly important. In previous work, when a protein p is to be classified as disease-related or not, the GO annotations used for that prediction are usually those of p itself. In this paper, we present a new type of GO-based feature. These features are based not on the GO annotation (“function”) of a single protein, but on pairs of functions that occur on both sides of an edge in the PPI network. We call them interaction-based features.
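As a rough illustration of what a feature built from pairs of functions across an edge could look like, the following Python sketch counts, for one protein, the GO-term pairs (f, g) where f annotates the protein and g annotates a neighbour in the PPI network. The counting scheme, the function name `interaction_features`, and the toy data (placeholder protein names and GO identifiers) are assumptions made for illustration only; they are not taken from the method described in this paper.

```python
from collections import Counter
from itertools import product


def interaction_features(protein, edges, go_annotations):
    """Count GO-term pairs (f, g) over the edges incident to `protein`.

    f is a GO term of `protein`; g is a GO term of the neighbour at the
    other end of the edge. This counting scheme is an illustrative
    assumption, not a definition taken from the paper.
    """
    features = Counter()
    own_terms = go_annotations.get(protein, set())
    for a, b in edges:
        # Treat the network as undirected: find the other endpoint.
        if protein == a:
            neighbour = b
        elif protein == b:
            neighbour = a
        else:
            continue  # edge does not touch this protein
        neighbour_terms = go_annotations.get(neighbour, set())
        for f, g in product(own_terms, neighbour_terms):
            features[(f, g)] += 1
    return features


# Toy network with placeholder names (hypothetical data).
edges = [("P1", "P2"), ("P1", "P3"), ("P2", "P3")]
go_annotations = {
    "P1": {"GO:A"},
    "P2": {"GO:B", "GO:C"},
    "P3": {"GO:B"},
}

features_p1 = interaction_features("P1", edges, go_annotations)
```

In this toy network, P1 has two neighbours annotated with GO:B and one annotated with GO:C, so the pair (GO:A, GO:B) receives a count of 2 and (GO:A, GO:C) a count of 1. Such counts could then be supplied to a standard classifier alongside the GO annotations of the protein itself.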
[1] Ozlem Keskin, et al. Human Cancer Protein-Protein Interaction Network: A Structural Perspective. PLoS Comput. Biol., 2009.
[2] Christos A. Ouzounis, et al. Structural and functional properties of genes involved in human cancer. BMC Genomics, 2006.
[3] Dipanwita Roy Chowdhury, et al. Human protein reference database as a discovery resource for proteomics. Nucleic Acids Res., 2004.
[4] David P. Davis, et al. Discovering cancer genes by integrating network and functional properties. BMC Medical Genomics, 2009.
[5] T. Milenković, et al. Systems-level cancer gene identification from protein interaction network topology applied to melanogenesis-related functional genomics data. Journal of The Royal Society Interface, 2010.
[6] Mike Tyers, et al. BioGRID: a general repository for interaction datasets. Nucleic Acids Res., 2005.
[7] J. A. Lozano, et al. Prioritization of candidate cancer genes—an aid to oncogenomic studies. Nucleic Acids Res., 2008.
[8] P. Radivojac, et al. An integrated approach to inferring gene–disease associations in humans. Proteins, 2008.