Guilt-by-Constellation: Fraud Detection by Suspicious Clique Memberships

Given a labeled graph containing fraudulent and legitimate nodes, which nodes group together? How can we use the riskiness of node groups to infer a future label for new members of a group? This paper focuses on social security fraud, where companies are linked to the resources they use and share. The primary goal in social security fraud detection is to identify companies that intentionally fail to pay their contributions to the government. We aim to detect fraudulent companies by (1) propagating a time-dependent exposure score for each node based on its relationships to known fraud in the network, (2) deriving cliques of companies and resources and labeling these cliques according to their fraud and bankruptcy involvement, and (3) characterizing each company by a combination of intrinsic features, relational features, and its membership in suspicious cliques. We show that clique-based features boost the performance of traditional relational models.
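
Step (1) can be illustrated with a minimal sketch. The abstract does not specify the propagation scheme, so everything below is an assumption made for illustration: a random walk with restart on the company-resource graph, exponentially decayed edge weights, and the hypothetical names `exposure_scores`, `decay`, and `restart`. It is not the authors' algorithm.

```python
import math
from collections import defaultdict


def exposure_scores(edges, fraud_nodes, now, decay=0.1, restart=0.15, iters=50):
    """Propagate fraud exposure over a company-resource graph.

    edges: iterable of (company, resource, last_seen_time) tuples.
    fraud_nodes: set of nodes already known to be fraudulent.
    All parameter names and defaults are illustrative assumptions.
    """
    # Assumed time dependence: older links get exponentially smaller weights.
    weights = defaultdict(dict)
    for company, resource, t in edges:
        w = math.exp(-decay * (now - t))
        weights[company][resource] = w
        weights[resource][company] = w

    nodes = list(weights)
    known = [n for n in nodes if n in fraud_nodes]
    if not known:
        return {n: 0.0 for n in nodes}

    # The walk restarts only at known fraudulent nodes.
    restart_vec = {n: (1.0 / len(known) if n in fraud_nodes else 0.0) for n in nodes}
    scores = dict(restart_vec)

    for _ in range(iters):
        nxt = {n: restart * restart_vec[n] for n in nodes}
        for u in nodes:
            total = sum(weights[u].values())
            for v, w in weights[u].items():
                # Each node passes its score to neighbors in proportion to edge weight.
                nxt[v] += (1.0 - restart) * scores[u] * w / total
        scores = nxt
    return scores


# Toy network: A is known fraud and shares resource r1 with B;
# C and D share r2 and have no link to fraud.
toy_edges = [("A", "r1", 9), ("B", "r1", 9), ("C", "r2", 9), ("D", "r2", 9)]
toy_scores = exposure_scores(toy_edges, {"A"}, now=10)
```

In this toy network, company B receives a positive exposure score because it shares a resource with the fraudulent company A, while C and D stay at zero. Scores of this kind could then serve as relational features alongside the clique-based features of steps (2) and (3).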
