A Spectral Framework for Detecting Inconsistency across Multi-source Object Relationships

In this paper, we propose to conduct anomaly detection across multiple sources to identify objects that have inconsistent behavior across these sources. We assume that a set of objects can be described from various perspectives (multiple information sources). The underlying clustering structure of normal objects is usually shared by multiple sources. However, anomalous objects belong to different clusters when considering different aspects. For example, there exist movies that are expected to be liked by kids by genre, but are liked by grown-ups based on user viewing history. To identify such objects, we propose to compute the distance between different eigen decomposition results of the same object with respect to different sources as its anomalous score. We also give interpretations from the perspectives of constrained spectral clustering and random walks over graph. Experimental results on several UCI as well as DBLP and Movie Lens datasets demonstrate the effectiveness of the proposed approach.

[1]  Joydeep Ghosh,et al.  Cluster Ensembles --- A Knowledge Reuse Framework for Combining Multiple Partitions , 2002, J. Mach. Learn. Res..

[2]  Nguyen Lu Dang Khoa,et al.  Robust Outlier Detection Using Commute Time and Eigenspace Embedding , 2010, PAKDD.

[3]  Avrim Blum,et al.  The Bottleneck , 2021, Monopsony Capitalism.

[4]  Sameer Singh,et al.  Novelty detection: a review - part 1: statistical approaches , 2003, Signal Process..

[5]  Ian Davidson,et al.  Discovering Contexts and Contextual Outliers Using Random Walks in Graphs , 2009, 2009 Ninth IEEE International Conference on Data Mining.

[6]  Jimeng Sun,et al.  Neighborhood formation and anomaly detection in bipartite graphs , 2005, Fifth IEEE International Conference on Data Mining (ICDM'05).

[7]  Ulrike von Luxburg,et al.  A tutorial on spectral clustering , 2007, Stat. Comput..

[8]  Chao Yang,et al.  ARPACK users' guide - solution of large-scale eigenvalue problems with implicitly restarted Arnoldi methods , 1998, Software, environments, tools.

[9]  Raymond T. Ng,et al.  Distance-based outliers: algorithms and applications , 2000, The VLDB Journal.

[10]  Hans-Peter Kriegel,et al.  LOF: identifying density-based local outliers , 2000, SIGMOD '00.

[11]  Shashi Shekhar,et al.  Detecting graph-based spatial outliers: algorithms and applications (a summary of results) , 2001, KDD '01.

[12]  Jinyan Li,et al.  Efficient mining of emerging patterns: discovering trends and differences , 1999, KDD '99.

[13]  Steffen Bickel,et al.  Multi-view clustering , 2004, Fourth IEEE International Conference on Data Mining (ICDM'04).

[14]  Sanjay Ranka,et al.  Conditional Anomaly Detection , 2007, IEEE Transactions on Knowledge and Data Engineering.

[15]  Yizhou Sun,et al.  On community outliers and their efficient detection in information networks , 2010, KDD.

[16]  Carla E. Brodley,et al.  Solving cluster ensemble problems by bipartite graph partitioning , 2004, ICML.

[17]  VARUN CHANDOLA,et al.  Anomaly detection: A survey , 2009, CSUR.

[18]  Christopher J. C. Burges,et al.  Spectral clustering and transductive learning with multiple views , 2007, ICML '07.