Relevance search and anomaly detection in bipartite graphs

Many real applications can be modeled using bipartite graphs, such as users vs. files in a P2P system, traders vs. stocks in a financial trading system, conferences vs. authors in a scientific publication network, and so on. We introduce two operations on bipartite graphs: 1) identifying similar nodes (relevance search), and 2) finding nodes connecting irrelevant nodes (anomaly detection). And we propose algorithms to compute the relevance score for each node using random walk with restarts and graph partitioning; we also propose algorithms to identify anomalies, using relevance scores. We evaluate the quality of relevance search based on semantics of the datasets, and we also measure the performance of the anomaly detection algorithm with manually injected anomalies. Both effectiveness and efficiency of the methods are confirmed by experiments on several real datasets.

[1]  Alexander Weber,et al.  Browsing and visualizing digital bibliographic data , 2004, VISSYM'04.

[2]  George Karypis,et al.  Multilevel k-way Partitioning Scheme for Irregular Graphs , 1998, J. Parallel Distributed Comput..

[3]  Santosh S. Vempala,et al.  On clusterings-good, bad and spectral , 2000, Proceedings 41st Annual Symposium on Foundations of Computer Science.

[4]  Christos Faloutsos,et al.  Automatic multimedia cross-modal correlation discovery , 2004, KDD.

[5]  Philip S. Yu,et al.  Outlier detection for high dimensional data , 2001, SIGMOD '01.

[6]  Inderjit S. Dhillon,et al.  Information-theoretic co-clustering , 2003, KDD '03.

[7]  Pattie Maes,et al.  Social information filtering: algorithms for automating “word of mouth” , 1995, CHI '95.

[8]  Taher H. Haveliwala Topic-sensitive PageRank , 2002, IEEE Trans. Knowl. Data Eng..

[9]  David Heckerman,et al.  Empirical Analysis of Predictive Algorithms for Collaborative Filtering , 1998, UAI.

[10]  Sergey Brin,et al.  The Anatomy of a Large-Scale Hypertextual Web Search Engine , 1998, Comput. Networks.

[11]  Jennifer Widom,et al.  SimRank: a measure of structural-context similarity , 2002, KDD.

[12]  Christos Faloutsos,et al.  Fully automatic cross-associations , 2004, KDD.

[13]  Diane J. Cook,et al.  Graph-based anomaly detection , 2003, KDD '03.

[14]  C. Lee Giles,et al.  Efficient identification of Web communities , 2000, KDD '00.

[15]  M E J Newman,et al.  Community structure in social and biological networks , 2001, Proceedings of the National Academy of Sciences of the United States of America.

[16]  Taher H. Haveliwala,et al.  The Second Eigenvalue of the Google Matrix , 2003 .

[17]  Deepayan Chakrabarti,et al.  AutoPart: Parameter-Free Graph Partitioning and Outlier Detection , 2004, PKDD.

[18]  G. Strang Introduction to Linear Algebra , 1993 .