Cluster analysis for anomaly detection in accounting

Cluster analysis groups data points so that points within a cluster are similar to one another, while points in different clusters are dissimilar. As an unsupervised learning technique, clustering is a good candidate for fraud and anomaly detection. This study examines whether clustering can support continuous auditing, where automated fraud filtering can be of great value to preventive audits. Cluster-based outliers are used to help auditors focus their efforts when evaluating group life insurance claims: claims with similar characteristics are grouped together, and clusters with small populations are flagged for further investigation. Dominant characteristics of the flagged clusters include large beneficiary payments, large interest amounts, and long delays between claim submission and payment. The study thus examines the application of cluster analysis in the accounting domain, and its results provide guidance and evidence for the potential use of this technique in auditing.
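
The flagging idea described above (group similar claims, then single out clusters with small populations) can be illustrated with a minimal sketch. The abstract does not state which clustering algorithm, features, or thresholds the study uses, so everything here is an assumption made for illustration: plain k-means with a deterministic farthest-point initialisation, two hypothetical features per claim (beneficiary payment in thousands and days from submission to payment), made-up sample values, and a hypothetical 25% population threshold. Real claim data would also need feature scaling before distances are meaningful.

```python
from math import dist

def kmeans(points, k, iterations=20):
    """Plain Lloyd's k-means with deterministic farthest-point initialisation."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next centroid: the point farthest from its nearest existing centroid.
        centroids.append(
            max(points, key=lambda p: min(dist(p, c) for c in centroids))
        )
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda j: dist(p, centroids[j])) for p in points]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels

def flag_small_clusters(labels, max_share=0.25):
    """Return indices of points in clusters holding less than max_share of the data."""
    counts = {}
    for lab in labels:
        counts[lab] = counts.get(lab, 0) + 1
    small = {lab for lab, n in counts.items() if n / len(labels) < max_share}
    return [i for i, lab in enumerate(labels) if lab in small]

# Hypothetical claims: (beneficiary payment in thousands, days until payment).
claims = [
    (10, 5), (12, 6), (11, 4), (9, 7), (10, 6), (13, 5), (11, 6), (12, 4),
    (95, 90), (100, 85),  # unusually large and unusually slow claims
]

labels = kmeans(claims, k=2)
flagged = flag_small_clusters(labels)
print(flagged)  # [8, 9]
```

In this toy data the two large, slowly paid claims form a cluster holding 20% of the population, so they fall under the assumed threshold and would be passed to an auditor for review. The choice of the number of clusters and of the threshold is a judgment call that the sketch does not attempt to settle.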
