Simultaneous classification and community detection on heterogeneous network data

Previous studies on network mining have focused primarily on learning a single task (such as classification or community detection) on a given network. This paper considers the problem of multi-task learning on heterogeneous network data. Specifically, we present a novel framework that performs classification on one network and community detection on another, related network. Multi-task learning is accomplished by optimizing a joint objective function, which ensures that the classes in one network are consistent with the link structure and nodal attributes, as well as with the communities detected in the other network. We provide both theoretical and empirical analysis of the framework. We also show that the framework can be extended to incorporate prior information about the correspondences between the clusters and classes in different networks. Experiments on both real-world and synthetic data sets demonstrate the effectiveness of the joint framework compared to applying classification and community detection algorithms to each network separately.
