A General Suspiciousness Metric for Dense Blocks in Multimodal Data

Which seems more suspicious: 5,000 tweets from 200 users on 5 IP addresses, or 10,000 tweets from 500 users on 500 IP addresses, all with the same trending topic and all within 10 minutes? The literature offers many methods for finding dense blocks in matrices and, more recently, tensors, but none gives a principled way to score the suspiciousness of dense blocks with different numbers of modes and rank them so that human attention goes where it is most needed. Dense blocks are worth inspecting because they typically indicate fraud, emerging trends, or some other noteworthy deviation from the usual. Our main contribution is to show how to unify these methods and give a principled answer to questions like the one above. Specifically, (a) we give a list of axioms that any suspiciousness metric should satisfy; (b) we propose an intuitive, principled metric that satisfies the axioms and is fast to compute; and (c) we propose CROSSSPOT, an algorithm that spots dense regions and sorts them in order of importance ("suspiciousness"). Finally, we apply CROSSSPOT to real data, where it improves the F1 score over previous techniques by 68% and finds retweet boosting in a real social dataset spanning 0.3 billion posts.
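To make the opening question concrete, the sketch below scores both blocks by how unlikely their tweet counts would be if tweets were scattered uniformly at random over the whole dataset. It is an illustration under stated assumptions, not necessarily the metric proposed in the paper: the Poisson background model, the function name `poisson_surprise`, and every dataset total (`MODE_SIZES`, `TOTAL_TWEETS`) are hypothetical choices made for this example. A block that does not restrict a mode is treated as spanning that whole mode, which is one way to put blocks with different numbers of modes on a common scale.

```python
import math

def poisson_surprise(mass, block_sizes, mode_sizes, total_mass):
    """Negative log-probability of observing exactly `mass` events in a block,
    assuming (hypothetically) that events fall uniformly at random, so the
    block's count is Poisson with mean total_mass * (block volume fraction)."""
    if len(block_sizes) != len(mode_sizes):
        raise ValueError("block_sizes and mode_sizes must have the same length")
    fraction = 1.0
    for n, N in zip(block_sizes, mode_sizes):
        fraction *= n / N          # share of this mode covered by the block
    expected = total_mass * fraction
    # -log of the Poisson pmf: lambda - c*log(lambda) + log(c!)
    return expected - mass * math.log(expected) + math.lgamma(mass + 1)

# Hypothetical dataset totals (NOT from the paper): users, IPs, topics, minutes.
MODE_SIZES = (1_000_000, 1_000_000, 1_000, 10_000)
TOTAL_TWEETS = 100_000_000

# Block A: 5,000 tweets, 200 users, 5 IPs; topic and time are unrestricted,
# so the block spans those modes entirely.
score_a = poisson_surprise(5_000, (200, 5, 1_000, 10_000), MODE_SIZES, TOTAL_TWEETS)

# Block B: 10,000 tweets, 500 users, 500 IPs, 1 topic, 10 minutes.
score_b = poisson_surprise(10_000, (500, 500, 1, 10), MODE_SIZES, TOTAL_TWEETS)

print(f"block A surprise: {score_a:.1f}")
print(f"block B surprise: {score_b:.1f}")
```

Under these made-up totals, block B scores higher than block A because its extra restrictions on topic and time shrink the expected count far more than its larger user and IP sets grow it. Different totals could reverse the ranking, which is why a principled, axiom-based metric is needed rather than intuition alone.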
