HGsuspector: Scalable Collective Fraud Detection in Heterogeneous Graphs

Graphs represent the relations between objects directly, which has drawn considerable attention from both academia and industry. Existing work concentrates mainly on homogeneous and bipartite graphs, and these algorithms are difficult to apply in practice because real-world objects and relations come in many types and the data volume can be very large. Considering the characteristics of the "black market", we propose HGsuspector, a novel and scalable algorithm for detecting collective fraud in directed heterogeneous graphs. We first decompose a directed heterogeneous graph into a set of bipartite graphs. We then define a metric on each connected bipartite graph and compute its score, which fuses structural information with event probability. The threshold separating normal from abnormal scores can be obtained by statistical methods or by other anomaly detection algorithms applied in the score space. We also provide a technical solution for fraud detection in e-commerce scenarios, which has been successfully applied on the Jingdong e-commerce platform to detect collective fraud in real time. Experiments on real-world datasets with billion-scale nodes and edges demonstrate that HGsuspector is more accurate and faster than the most practical state-of-the-art approach to date.
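
The pipeline described above (decompose by type, split into connected bipartite components, score each component, threshold in score space) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the edge format `(src_type, src, dst_type, dst)`, the function names, and in particular the `score` function are hypothetical. The abstract does not define the actual metric, so the sketch substitutes a simple density-based placeholder and omits the event-probability term entirely. The threshold uses a plain mean plus `k` standard deviations as one possible statistical rule.

```python
from collections import defaultdict
from statistics import mean, pstdev


def decompose(edges):
    """Group typed directed edges (src_type, src, dst_type, dst) by type pair.

    Each group is a bipartite graph: sources on one side, targets on the other.
    """
    parts = defaultdict(set)
    for src_type, src, dst_type, dst in edges:
        parts[(src_type, dst_type)].add((src, dst))
    return parts


def components(bip_edges):
    """Split one bipartite edge set into connected components (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Tag each endpoint with its side so equal ids on both sides stay distinct.
    for s, d in bip_edges:
        parent[find(("L", s))] = find(("R", d))

    groups = defaultdict(lambda: (set(), set(), set()))
    for s, d in bip_edges:
        left, right, es = groups[find(("L", s))]
        left.add(s)
        right.add(d)
        es.add((s, d))
    return list(groups.values())


def score(left, right, es):
    """Placeholder metric (an assumption, NOT the paper's definition):
    edge density of the component, scaled by the size of its smaller side."""
    density = len(es) / (len(left) * len(right))
    return density * min(len(left), len(right))


def detect(edges, k=2.0):
    """Flag components whose score exceeds mean + k * std within their edge type."""
    flagged = []
    for edge_type, bip in decompose(edges).items():
        comps = components(bip)
        scores = [score(*c) for c in comps]
        threshold = mean(scores) + k * pstdev(scores)
        for (left, right, _), s in zip(comps, scores):
            if s > threshold:
                flagged.append((edge_type, left, right, s))
    return flagged
```

With ten users who each buy one distinct item plus a group of three accounts that all hit the same three items, `detect` flags only the dense three-by-three block. Any other anomaly detector over the per-component scores could replace the mean-plus-deviation rule.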
