Non-Negative Residual Matrix Factorization with Application to Graph Anomaly Detection

Given an IP source-destination traffic network, how do we spot misbehaving IP sources (e.g., port scanners)? How do we find strange users in a user-movie rating graph? Moreover, how can we present the results intuitively, so that they are easy for data analysts to interpret? We propose NrMF, a non-negative residual matrix factorization framework, to address these challenges. We present an optimization formulation as well as an effective algorithm to solve it. Our method naturally captures abnormal behaviors on graphs. In addition, the proposed algorithm is linear with respect to the size of the graph and is therefore suitable for large graphs. Experimental results on several data sets validate both its effectiveness and its efficiency.
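
The abstract does not spell out the formulation, so the following is only a minimal sketch of the idea the name suggests, under stated assumptions: the graph is an adjacency matrix A, it is approximated by a low-rank product F G, and the residual R = A - F G is inspected only on observed edges (entries where A > 0), where it is expected to be non-negative. The function name `residual_on_edges` and the toy factors are hypothetical and are not taken from the paper.

```python
import numpy as np

def residual_on_edges(A, F, G):
    """Residual R = A - F @ G, evaluated only on observed edges (A > 0).

    Assumption: restricting the work to the nonzero entries of A is what
    keeps the cost proportional to the number of edges in this sketch.
    """
    rows, cols = np.nonzero(A)
    # Row-wise dot products F[i, :] . G[:, j] for each observed edge (i, j).
    approx = np.einsum("ik,ki->i", F[rows], G[:, cols])
    R = np.zeros_like(A, dtype=float)
    R[rows, cols] = A[rows, cols] - approx
    return R

# Toy graph: a 2x2 block of unit-weight edges plus one heavy isolated edge.
A = np.array([[1.0, 1.0, 0.0],
              [1.0, 1.0, 0.0],
              [0.0, 0.0, 5.0]])
# Hypothetical rank-1 factors that explain the block but not the heavy edge.
F = np.array([[1.0], [1.0], [0.0]])
G = np.array([[1.0, 1.0, 0.0]])

example_R = residual_on_edges(A, F, G)
# The unexplained edge (2, 2) carries the only nonzero residual.
suspicious = np.argwhere(example_R > 0)
```

In this toy case the residual is non-negative everywhere, and the single large entry points directly at the edge the low-rank part fails to explain, which is the kind of output an analyst can read off without further decoding.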