Ranking-based classification of heterogeneous information networks

It has recently been recognized that heterogeneous information networks, composed of multiple types of nodes and links, are prevalent in the real world. Both classification and ranking of the nodes (or data objects) in such networks are essential for network analysis. However, these two tasks have so far generally been performed separately. In this paper, we combine ranking and classification in order to analyze heterogeneous information networks more accurately. Our intuition is that highly ranked objects within a class should play more important roles in classification. Conversely, class membership information is important for producing a high-quality ranking over a dataset. We therefore believe it is beneficial to integrate classification and ranking in a simultaneous, mutually enhancing process, and, to this end, we propose a novel ranking-based iterative classification framework called RankClass. Specifically, we build a graph-based ranking model that iteratively computes the ranking distribution of the objects within each class. At each iteration, the graph structure used by the ranking algorithm is adjusted according to the current ranking results, so that the sub-network corresponding to the specific class is emphasized while the rest of the network is weakened. As our experiments show, integrating ranking with classification not only generates more accurate classes than state-of-the-art classification methods on networked data, but also provides a meaningful ranking of objects within each class, offering a more informative view of the data than traditional classification.
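
The abstract describes the rank-then-reweight loop only at a high level. The toy Python sketch below illustrates the general idea; it is not the authors' RankClass algorithm. It assumes a single weighted, undirected adjacency matrix in place of a heterogeneous network, uses a personalized-PageRank-style propagation restarted at each class's labeled seeds as the per-class ranking, and applies a hypothetical edge-reweighting rule (`reweight`, with a made-up `floor` parameter) to emphasize each class's sub-network. All function names are illustrative.

```python
# Toy sketch of ranking-based iterative classification (NOT the paper's exact
# method). Assumptions: one weighted adjacency matrix stands in for the
# heterogeneous network; per-class ranking is personalized-PageRank-style;
# the reweighting rule in reweight() is hypothetical.


def rank_in_class(weights, seeds, alpha=0.85, iters=100):
    """Ranking distribution of all nodes within one class, restarted at its seeds."""
    n = len(weights)
    # Restart vector: uniform over this class's labeled seeds (needs >= 1 seed).
    restart = [1.0 / len(seeds) if i in seeds else 0.0 for i in range(n)]
    out_deg = [sum(row) for row in weights]
    rank = restart[:]
    for _ in range(iters):
        new = [(1 - alpha) * restart[j] for j in range(n)]
        for i in range(n):
            if out_deg[i] == 0:
                # Dangling node: send its mass back to the seeds.
                for j in range(n):
                    new[j] += alpha * rank[i] * restart[j]
                continue
            for j in range(n):
                if weights[i][j]:
                    new[j] += alpha * rank[i] * weights[i][j] / out_deg[i]
        rank = new  # still sums to 1
    return rank


def reweight(weights, rank, floor=0.1):
    """Hypothetical rule: scale edge (i, j) by how highly both endpoints rank in
    the class, emphasizing that class's sub-network and weakening the rest."""
    top = max(rank) or 1.0
    score = [floor + r / top for r in rank]
    n = len(weights)
    return [[weights[i][j] * score[i] * score[j] for j in range(n)] for i in range(n)]


def rank_class(weights, labels, num_classes, rounds=5, alpha=0.85):
    """labels maps node index -> class for the few labeled nodes."""
    seeds = [{i for i, c in labels.items() if c == k} for k in range(num_classes)]
    graphs = [weights] * num_classes  # one (re)weighted view per class
    ranks = []
    for _ in range(rounds):
        ranks = [rank_in_class(graphs[k], seeds[k], alpha) for k in range(num_classes)]
        # Rebuild each class's view from the original weights using its new ranking.
        graphs = [reweight(weights, ranks[k]) for k in range(num_classes)]
    # Uniform class prior: assign each node to the class that ranks it highest.
    pred = [max(range(num_classes), key=lambda k: ranks[k][i]) for i in range(len(weights))]
    return pred, ranks


# Toy network: triangles {0,1,2} and {3,4,5} joined by a weak bridge 2-3.
W = [[0, 1, 1, 0, 0, 0],
     [1, 0, 1, 0, 0, 0],
     [1, 1, 0, 0.1, 0, 0],
     [0, 0, 0.1, 0, 1, 1],
     [0, 0, 0, 1, 0, 1],
     [0, 0, 0, 1, 1, 0]]
pred, ranks = rank_class(W, {0: 0, 5: 1}, num_classes=2)
print(pred)  # [0, 0, 0, 1, 1, 1]
```

Each class needs at least one labeled seed. The sketch rebuilds every class's view from the original weights each round rather than compounding the reweighting, which keeps the emphasis from collapsing the graph onto the seeds; that is a design choice of this illustration, not something the abstract specifies.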