Positive and Unlabeled Learning for Graph Classification

The problem of graph classification has drawn much attention in the last decade. Conventional approaches to graph classification focus on mining discriminative subgraph features under supervised settings, and their feature selection strategies strictly follow the assumption that both positive and negative graphs are available. However, in many real-world applications, negative graph examples are not available. In this paper we study the problem of how to select useful subgraph features and perform graph classification based only on positive and unlabeled graphs. This problem is challenging and differs from previous work on PU learning, because there are no predefined features in graph data; moreover, the subgraph enumeration problem is NP-hard. We need to identify a subset of unlabeled graphs that are most likely to be negative. However, the negative graph selection problem and the subgraph feature selection problem are correlated: before reliable negative graphs can be derived, we need a set of useful subgraph features, yet useful subgraph features can only be evaluated against reliable negative graphs. To address this problem, we first derive an evaluation criterion that estimates the dependency between subgraph features and class labels based on a set of estimated negative graphs. To build accurate models for the PU learning problem on graph data, we then propose an integrated approach that concurrently selects the discriminative features and the reliable negative graphs in an iterative manner. Experimental results demonstrate the effectiveness and efficiency of the proposed method.

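To make the iterative idea above concrete, the following is a minimal, hypothetical sketch in Python, not the paper's actual algorithm. It assumes the graphs have already been encoded as a binary indicator matrix X over a fixed pool of candidate subgraph features (e.g., mined beforehand with a frequent-subgraph miner such as gSpan), uses an HSIC-style dependence score between each feature column and the tentative labels, and picks reliable negatives as the unlabeled graphs farthest from the positive centroid in the current feature space. All function and variable names are illustrative.

    # Hypothetical sketch of iterative PU learning over precomputed subgraph features.
    import numpy as np

    def hsic_score(f, y):
        """HSIC-style dependence between one feature column f and labels y."""
        n = len(y)
        H = np.eye(n) - np.ones((n, n)) / n          # centering matrix
        K = np.outer(f, f)                           # linear kernel on the feature
        L = np.outer(y, y)                           # linear kernel on the labels
        return np.trace(K @ H @ L @ H) / (n - 1) ** 2

    def select_reliable_negatives(X, pos_idx, unl_idx, feat_idx, k):
        """Pick the k unlabeled graphs farthest (in the current feature space)
        from the positive centroid as tentative negative graphs."""
        unl_idx = np.asarray(unl_idx)
        centroid = X[np.ix_(pos_idx, feat_idx)].mean(axis=0)
        dist = np.linalg.norm(X[np.ix_(unl_idx, feat_idx)] - centroid, axis=1)
        return unl_idx[np.argsort(dist)[-k:]]

    def iterative_pu_graph_selection(X, pos_idx, unl_idx, n_feats=10, n_negs=20,
                                     max_iter=10):
        feat_idx = np.arange(X.shape[1])             # start with all candidate features
        neg_idx = np.array([], dtype=int)
        for _ in range(max_iter):
            new_neg = select_reliable_negatives(X, pos_idx, unl_idx, feat_idx, n_negs)
            labeled = np.concatenate([pos_idx, new_neg])
            y = np.concatenate([np.ones(len(pos_idx)), -np.ones(len(new_neg))])
            scores = np.array([hsic_score(X[labeled, j], y) for j in range(X.shape[1])])
            feat_idx = np.argsort(scores)[-n_feats:] # keep the most dependent features
            if np.array_equal(np.sort(new_neg), np.sort(neg_idx)):
                break                                # tentative negative set has stabilized
            neg_idx = new_neg
        return feat_idx, neg_idx

In a full implementation, the candidate subgraph pool would typically be re-mined or pruned in each iteration against the current tentative negatives rather than fixed up front, which is where the NP-hard subgraph enumeration cost enters.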