Financial fraud detection by using Grammar-based multi-objective genetic programming with ensemble learning

Financial fraud is a criminal act that violates laws, rules, or policies to gain unauthorized financial benefit. Its major consequences include losses of billions of dollars each year and damage to investor confidence and corporate reputation. Financial Fraud Detection (FFD) is therefore an essential research area for preventing the destructive effects of financial fraud. In this study, we propose a new method for FFD problems based on Grammar-based Genetic Programming (GBGP), multi-objective optimization, and ensemble learning. We comprehensively compare the proposed method with Logistic Regression (LR), Neural Networks (NNs), Support Vector Machines (SVMs), Bayesian Networks (BNs), Decision Trees (DTs), AdaBoost, Bagging, and LogitBoost on four FFD datasets. The experimental results show that the new approach is effective on the given FFD problems, two of which are real-life problems. The study makes two main contributions. First, it evaluates a number of data mining techniques on the given real-life classification problems. Second, it proposes a new method based on GBGP, NSGA-II, and ensemble learning.
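
To make the combination of multi-objective optimization and ensemble learning more concrete, the following is a minimal sketch of two generic building blocks: selecting the non-dominated (Pareto-optimal) classifiers under two error objectives, and combining their binary predictions by majority vote. It is an illustration only, not the method of this study. The choice of objectives (false negative rate and false positive rate), the tie-breaking rule, the function names, and the example scores are all assumptions made for this sketch, and the full NSGA-II procedure (ranking into successive fronts, crowding distance) and the GBGP rule evolution are not shown.

```python
from typing import List, Sequence, Tuple

# Hypothetical objective vector for one classifier, both values to be minimized:
# (false negative rate, false positive rate). This pairing is an assumption.
Objectives = Tuple[float, float]


def dominates(a: Objectives, b: Objectives) -> bool:
    """Return True if a is no worse than b on every objective
    and strictly better on at least one (minimization)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(scores: Sequence[Objectives]) -> List[int]:
    """Return the indices of the score vectors that no other vector dominates."""
    return [
        i
        for i, s in enumerate(scores)
        if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)
    ]


def majority_vote(predictions: Sequence[int]) -> int:
    """Combine binary predictions (1 = fraud, 0 = non-fraud) by majority vote.
    Ties are resolved as fraud, which is an assumption of this sketch."""
    return 1 if 2 * sum(predictions) >= len(predictions) else 0


# Illustrative, made-up scores for four candidate classifiers.
scores = [(0.10, 0.30), (0.20, 0.10), (0.25, 0.35), (0.10, 0.40)]
front = pareto_front(scores)  # indices of the non-dominated candidates
```

In this sketch, only the candidates on the front would be kept as ensemble members, and `majority_vote` would be applied to their predictions for each record. Keeping several non-dominated classifiers, rather than a single best one, is what lets an ensemble trade off the two kinds of error.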
