Graph Regularized Transductive Classification on Heterogeneous Information Networks

A heterogeneous information network is a network composed of multiple types of objects and links. Recently, it has been recognized that strongly-typed heterogeneous information networks are prevalent in the real world. Sometimes, label information is available for some of the objects. Learning from such labeled and unlabeled data via transductive classification can help extract knowledge about the hidden network structure. However, although classification on homogeneous networks has been studied for decades, classification on heterogeneous networks has been explored only recently. In this paper, we consider the transductive classification problem on heterogeneous networked data that share a common topic. Only some objects in the given network are labeled, and we aim to predict labels for the remaining objects of all types. We propose a novel graph-based regularization framework, GNetMine, to model the link structure in information networks with arbitrary network schema and an arbitrary number of object and link types. Specifically, we explicitly respect the type differences by preserving consistency separately over each relation graph, where each relation graph corresponds to one type of link. Efficient computational schemes are then introduced to solve the corresponding optimization problem. Experiments on the DBLP data set show that our algorithm significantly improves classification accuracy over existing state-of-the-art methods.
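
To make the idea of preserving consistency over each relation graph separately more concrete, the following is a minimal sketch of type-aware label propagation. It is an illustration under stated assumptions, not the GNetMine algorithm as specified in the paper: the function names (`normalize_relation`, `propagate`), the single shared weights `lam` and `alpha`, the fixed iteration count, and the toy paper/author/venue network are all hypothetical. Each relation graph is assumed to link two different object types and is normalized on its own, so that one link type does not dominate another.

```python
from collections import defaultdict
from math import sqrt


def normalize_relation(edges):
    """Symmetrically normalize one relation graph: w / sqrt(deg_a * deg_b)."""
    deg_a = defaultdict(float)
    deg_b = defaultdict(float)
    for a, b, w in edges:
        deg_a[a] += w
        deg_b[b] += w
    return [(a, b, w / sqrt(deg_a[a] * deg_b[b])) for a, b, w in edges]


def propagate(objects, relations, seeds, num_classes, lam=1.0, alpha=1.0, iters=50):
    """
    objects:   {type_name: [object ids]}
    relations: {(type_a, type_b): [(id_a, id_b, weight), ...]}
    seeds:     {(type_name, object id): class index} for the labeled objects
    Returns {(type_name, object id): predicted class index} for every object.
    """
    # Normalize every relation graph separately (assumed design choice).
    normalized = {key: normalize_relation(e) for key, e in relations.items()}

    # y holds the seed indicator vectors; scores f start out equal to y.
    y = {(t, o): [0.0] * num_classes for t, objs in objects.items() for o in objs}
    for node, k in seeds.items():
        y[node][k] = 1.0
    f = {node: list(vec) for node, vec in y.items()}

    # Count how many relation graphs each object type takes part in.
    n_rel = defaultdict(int)
    for ta, tb in normalized:
        n_rel[ta] += 1
        n_rel[tb] += 1

    for _ in range(iters):
        # Start from the seed term, then add neighbor scores per relation graph.
        acc = {node: [alpha * v for v in vec] for node, vec in y.items()}
        for (ta, tb), edges in normalized.items():
            for a, b, s in edges:
                for k in range(num_classes):
                    acc[(ta, a)][k] += lam * s * f[(tb, b)][k]
                    acc[(tb, b)][k] += lam * s * f[(ta, a)][k]
        # Rescale by the total weight that contributed to each object type.
        f = {node: [v / (lam * n_rel[node[0]] + alpha) for v in vec]
             for node, vec in acc.items()}

    # Assign each object the class with the highest score.
    return {node: max(range(num_classes), key=vec.__getitem__)
            for node, vec in f.items()}


# Hypothetical toy network: two research areas, only two papers are labeled.
objects = {"paper": ["p1", "p2", "p3", "p4"],
           "author": ["a1", "a2"],
           "venue": ["v1", "v2"]}
relations = {
    ("paper", "author"): [("p1", "a1", 1.0), ("p2", "a1", 1.0),
                          ("p3", "a2", 1.0), ("p4", "a2", 1.0)],
    ("paper", "venue"): [("p1", "v1", 1.0), ("p2", "v1", 1.0),
                         ("p3", "v2", 1.0), ("p4", "v2", 1.0)],
}
seeds = {("paper", "p1"): 0, ("paper", "p3"): 1}
pred = propagate(objects, relations, seeds, num_classes=2)
```

In this toy setup the labels of `p1` and `p3` spread through the author and venue links, so unlabeled papers, authors, and venues all receive a class even though only papers were seeded. Using one `lam` for every relation is a simplification; weighting link types differently would be a natural extension.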
