AFRAID: Fraud detection via active inference in time-evolving social networks

Fraud is a social process that occurs over time. We introduce a new approach, called AFRAID, which uses active inference to better detect fraud in time-varying social networks, that is, to classify nodes as fraudulent or non-fraudulent. In active inference on social networks, a set of unlabeled nodes is given to an oracle (in our case, one or more fraud inspectors) to label. These labels are then used to seed the inference process of one or more previously trained classifiers. The challenge in active inference is to select a small set of unlabeled nodes whose labels lead to the highest classification performance. Because fraud is highly adaptive and dynamic, selecting such nodes is even more challenging than in other settings. We apply our approach to a real-life fraud data set obtained from the Belgian Social Security Institution to detect social security fraud. In this setting, fraud is defined as the intentional failure of companies to pay tax contributions to the government. The social network is therefore composed of companies, and the links between companies indicate shared resources. AFRAID outperforms approaches that do not use active inference by up to 15% in terms of precision.
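The active inference loop described above can be illustrated with a minimal sketch. It is not the AFRAID method itself: the selection rule here is plain uncertainty sampling and the inference step is a simple neighbor-averaging propagation, both chosen only for illustration. All names (`select_uncertain`, `propagate`, `active_inference`, the toy graph, the prior scores, and the oracle) are hypothetical.

```python
from typing import Callable, Dict, List, Set, Tuple

Graph = Dict[str, List[str]]  # node -> list of neighboring nodes


def select_uncertain(scores: Dict[str, float], labeled: Set[str], budget: int) -> List[str]:
    """Pick the `budget` unlabeled nodes whose fraud score is closest to 0.5.

    Assumption: uncertainty sampling stands in for the selection strategy.
    """
    candidates = [n for n in scores if n not in labeled]
    candidates.sort(key=lambda n: (abs(scores[n] - 0.5), n))
    return candidates[:budget]


def propagate(graph: Graph, prior: Dict[str, float], known: Dict[str, float],
              alpha: float = 0.5, iterations: int = 20) -> Dict[str, float]:
    """Blend each node's prior score with the mean score of its neighbors.

    Nodes in `known` are clamped to the label supplied by the oracle.
    Assumption: this stands in for inference on a previously trained classifier.
    """
    scores = dict(prior)
    scores.update(known)
    for _ in range(iterations):
        updated = {}
        for node, neighbors in graph.items():
            if node in known:
                updated[node] = known[node]
            elif neighbors:
                mean = sum(scores[m] for m in neighbors) / len(neighbors)
                updated[node] = (1 - alpha) * prior[node] + alpha * mean
            else:
                updated[node] = prior[node]
        scores = updated
    return scores


def active_inference(graph: Graph, prior: Dict[str, float],
                     oracle: Callable[[str], bool],
                     budget: int) -> Tuple[Dict[str, float], List[str]]:
    """Score, query the oracle on selected nodes, then re-run inference."""
    initial = propagate(graph, prior, known={})
    queried = select_uncertain(initial, labeled=set(), budget=budget)
    known = {n: float(oracle(n)) for n in queried}
    return propagate(graph, prior, known), queried


# Toy chain of four companies a - b - c - d linked by shared resources.
graph = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
prior = {"a": 0.9, "b": 0.5, "c": 0.2, "d": 0.1}  # hypothetical classifier output
true_fraud = {"a", "b"}  # known only to the (simulated) inspector

before = propagate(graph, prior, known={})
scores, queried = active_inference(graph, prior, lambda n: n in true_fraud, budget=1)
print("queried:", queried)
print("scores:", {n: round(s, 3) for n, s in scores.items()})
```

In this toy run the inspector is asked about a single node, and the clamped label then shifts the scores of its neighbors. A real system would replace the propagation step with its own trained relational classifier and would account for the time-varying nature of the network, which this sketch ignores.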
