Characterization and detection of taxpayers with false invoices using data mining techniques

In this paper we provide evidence that potential users of false invoices in a given year can be characterized and detected from the information in their tax payments, their historical performance and their characteristics, using several types of data mining techniques. First, clustering algorithms such as self-organizing maps (SOM) and neural gas are used to identify groups of similar behaviour in the universe of taxpayers. Then decision trees, neural networks and Bayesian networks are used to identify the variables related to fraudulent or non-fraudulent conduct, to detect patterns of associated behaviour, and to establish to what extent cases of fraud and non-fraud can be detected with the available information. This will help identify patterns of fraud and generate knowledge that can be used in the audit work performed by the Tax Administration of Chile (Servicio de Impuestos Internos, SII) to detect this type of tax crime.
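The two-stage approach described above (cluster first, then classify) can be illustrated with a minimal sketch. This is not the authors' implementation: the data are synthetic, the two features (a credit-to-debit ratio and a share of invoices from flagged suppliers) are hypothetical, and a single-split decision stump stands in for the decision trees, neural networks and Bayesian networks named in the abstract. All function and variable names are assumptions made for illustration.

```python
import math
import random


def best_unit(units, x):
    """Index of the SOM unit whose weight vector is closest to x."""
    return min(range(len(units)),
               key=lambda j: sum((a - b) ** 2 for a, b in zip(units[j], x)))


def train_som(data, n_units=4, epochs=30, lr0=0.5, radius0=2.0, seed=0):
    """Train a small 1-D self-organizing map and return its unit weights."""
    rng = random.Random(seed)
    dim = len(data[0])
    units = [[rng.random() for _ in range(dim)] for _ in range(n_units)]
    order = list(range(len(data)))
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                     # decaying learning rate
        radius = max(radius0 * (1.0 - frac), 0.5)   # shrinking neighbourhood
        rng.shuffle(order)
        for i in order:
            x = data[i]
            bmu = best_unit(units, x)
            for j, w in enumerate(units):
                # Gaussian neighbourhood measured on the 1-D unit grid
                h = math.exp(-((j - bmu) ** 2) / (2 * radius ** 2))
                for d in range(dim):
                    w[d] += lr * h * (x[d] - w[d])
    return units


def fit_stump(X, y):
    """Best rule of the form 'x[feature] > threshold => fraud'.

    Returns (feature, threshold, accuracy) on the training data.
    """
    best = (0, 0.0, 0.0)
    for f in range(len(X[0])):
        values = sorted(set(row[f] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            hits = sum((row[f] > t) == bool(label) for row, label in zip(X, y))
            acc = hits / len(y)
            if acc > best[2]:
                best = (f, t, acc)
    return best


# Synthetic taxpayers (hypothetical features, not SII data):
# column 0 = credit-to-debit ratio, column 1 = share of flagged suppliers.
rng = random.Random(42)
compliant = [[rng.gauss(0.3, 0.05), rng.gauss(0.1, 0.05)] for _ in range(100)]
suspect = [[rng.gauss(0.8, 0.05), rng.gauss(0.7, 0.05)] for _ in range(100)]
X = compliant + suspect
y = [0] * 100 + [1] * 100   # 1 = known false-invoice case (synthetic label)

# Stage 1: group taxpayers by similar behaviour.
units = train_som(X)
clusters = [best_unit(units, x) for x in X]

# Stage 2: find the variable and cut-off that best separates the labels.
feature, threshold, accuracy = fit_stump(X, y)
print("units used:", sorted(set(clusters)))
print("best feature:", feature, "threshold:", round(threshold, 3),
      "training accuracy:", accuracy)
```

In a realistic setting the second stage would be fitted separately within each segment found by the first stage and evaluated on held-out data; the sketch reports training accuracy only, to stay short.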
