Exploring Correlation Network for Cheating Detection

The correlation network, typically formed by computing pairwise correlations between variables, has recently become a competitive paradigm for discovering insights in application domains such as climate prediction, financial markets, and bioinformatics. In this study, we adopt this paradigm to detect cheating behavior hidden in business distribution channels, where collusive partners often falsify large deals to obtain lower product prices, a behavior considered extremely harmful to the sales ecosystem. To this end, we assume that abnormal deals are likely to occur between two partners if their purchase-volume sequences are strongly negatively correlated. This seemingly intuitive rule, however, poses several research challenges. First, existing correlation measures are usually symmetric and thus cannot distinguish the different roles that partners play in cheating. Second, the tick-to-tick correspondence between two sequences may be violated by delays in purchase behavior, and the correlation measure should capture such delays. Finally, because any pair of sequences may be correlated by chance, the rule can yield many false-positive cheating pairs, which need to be corrected in a systematic manner. To address these issues, we propose a correlation network analysis framework for cheating detection. In the framework, we adopt an asymmetric correlation measure to distinguish the two roles in a cheating alliance, namely, cheating seller and cheating buyer. Dynamic Time Warping is employed to handle the time offset between two sequences when computing the correlation. We further propose two graph-cut methods that convert the correlation network into a bipartite graph to rank cheating partners, which simultaneously helps to remove false-positive correlation pairs. Using a 4-year real-world channel dataset from a worldwide IT company, we demonstrate the effectiveness of the proposed method in comparison with competitive baseline methods.
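To illustrate why a time offset matters when correlating two purchase-volume sequences, the following is a minimal sketch, not the measure proposed in the study. It assumes a textbook Dynamic Time Warping alignment with absolute-difference cost, z-normalized inputs, and a sign flip of the second sequence so that opposite movements are matched. The function names `zscore`, `dtw_path`, and `warped_correlation` are hypothetical, and the asymmetric measure and the graph-cut step are not covered.

```python
import numpy as np

def zscore(x):
    """Standardize a non-constant sequence to zero mean and unit variance."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()

def dtw_path(a, b):
    """Return one optimal DTW alignment between a and b as (i, j) index pairs."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1])
    # Backtrack from the last cell to the first along cheapest predecessors.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        steps = [(cost[i - 1, j - 1], i - 1, j - 1),
                 (cost[i - 1, j], i - 1, j),
                 (cost[i, j - 1], i, j - 1)]
        _, i, j = min(steps, key=lambda s: s[0])
        path.append((i - 1, j - 1))
    return path[::-1]

def warped_correlation(x, y):
    """Pearson correlation of x and y after DTW alignment.

    Assumption of this sketch: y is sign-flipped before alignment so that
    a rise in x is matched to a (possibly delayed) drop in y.
    """
    path = dtw_path(zscore(x), -zscore(y))
    i_idx, j_idx = (list(k) for k in zip(*path))
    xs = np.asarray(x, dtype=float)[i_idx]
    ys = np.asarray(y, dtype=float)[j_idx]
    return np.corrcoef(xs, ys)[0, 1]

# Toy data: x spikes at tick 1, y drops one tick later.
x = [1, 5, 1, 1, 1, 1]
y = [9, 9, 1, 9, 9, 9]
raw = np.corrcoef(x, y)[0, 1]
warped = warped_correlation(x, y)
```

On this toy pair the tick-to-tick correlation `raw` is weakly positive, while `warped` is close to -1, which is the kind of delayed negative relationship the rule above is meant to surface.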
