Mining Query-Based Subnetwork Outliers in Heterogeneous Information Networks

Mining outliers in a heterogeneous information network is a challenging problem: it is unclear even what should count as an outlier in a large heterogeneous network (e.g., outliers in an entire bibliographic network consisting of authors, titles, papers, and venues). In this study, we propose an interesting class of outliers, query-based subnetwork outliers: given a heterogeneous network, a user issues a query to retrieve a set of task-relevant subnetworks, among which the subnetwork outliers are those that deviate significantly from the others (e.g., outliers among author groups studying "topic modeling"). We formalize this problem and propose a general framework in which one can query for subnetwork outliers with respect to different semantics. We introduce the notion of subnetwork similarity, which captures the proximity between two subnetworks through their membership distributions. We propose an outlier detection algorithm that ranks all the subnetworks by their outlierness without parameter tuning. Quantitative and qualitative experiments on both synthetic and real data sets show that the proposed method outperforms the baselines.
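To make the idea of comparing subnetworks by their membership distributions more concrete, the following is a minimal sketch, not the similarity measure or ranking algorithm of this work. It assumes that each member of a subnetwork already has a membership distribution over a few latent communities, summarizes a subnetwork by the mean of its members' distributions, uses cosine similarity as a stand-in for subnetwork similarity, and scores outlierness as one minus the average similarity to the other subnetworks. The function names, the toy data, and all three of these choices are illustrative assumptions.

```python
from math import sqrt


def cosine(p, q):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = sqrt(sum(a * a for a in p)) * sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0


def subnetwork_profile(member_distributions):
    """Summarize a subnetwork as the mean of its members' membership distributions.

    Averaging is an assumption made for this sketch only.
    """
    n = len(member_distributions)
    k = len(member_distributions[0])
    return [sum(m[i] for m in member_distributions) / n for i in range(k)]


def rank_outliers(subnetworks):
    """Rank subnetworks from most to least outlying.

    The score is 1 minus the mean cosine similarity to every other subnetwork,
    a simple stand-in for an outlierness measure. Requires at least two subnetworks.
    """
    names = list(subnetworks)
    profiles = {name: subnetwork_profile(subnetworks[name]) for name in names}
    scores = {}
    for a in names:
        sims = [cosine(profiles[a], profiles[b]) for b in names if b != a]
        scores[a] = 1.0 - sum(sims) / len(sims)
    ranking = sorted(names, key=lambda name: scores[name], reverse=True)
    return ranking, scores


# Hypothetical query result: three author groups, each member described by a
# membership distribution over three latent communities.
groups = {
    "g1": [[0.90, 0.10, 0.00], [0.80, 0.20, 0.00]],
    "g2": [[0.85, 0.10, 0.05], [0.90, 0.05, 0.05]],
    "g3": [[0.00, 0.10, 0.90], [0.10, 0.00, 0.90]],
}

ranking, scores = rank_outliers(groups)
print(ranking[0])  # the group whose profile deviates most from the others
```

In this toy query result, g1 and g2 concentrate on the first community while g3 concentrates on the third, so g3 receives the highest score. No parameter needs to be tuned in this sketch, which mirrors the parameter-free ranking goal stated above only in spirit.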
