Imbalanced Classification for Business Analytics

In pattern recognition, classification is a crucial task for automated, data-driven knowledge discovery. The objective of classification is to separate a set of data into classes or sub-categories and then, based on a training set of data, to identify the class to which a new observation belongs. The mathematical model trained by a classification algorithm is termed a classifier. When the classes do not contain equal numbers of examples, the classification problem is known as imbalanced (Japkowicz, 2000). For instance, in a cancer diagnostic problem the main objective is to identify individuals stricken with cancer, and such cases are relatively rare compared to normal cases. Imbalanced classification problems are also known as skewed class distribution problems or as small or rare class learning problems (He & Garcia, 2009; Lemnaru & Potolea, 2012; Sun, Wong, & Mohamed, 2009). In binary classification, the class with fewer examples is known as the minority class and the other class as the majority class. In many applications (e.g., fraud detection, computer intrusion detection, oil spill detection, defective product detection), detecting minority class examples is more important than detecting majority class examples. Therefore, there is a need for efficient classification algorithms that address such problems. A preferred classification algorithm is one that yields a higher identification rate on rare events, especially in applications where misclassifying them leads to high losses. For instance, in automated credit card fraud detection, misclassifying a fraud event might result in high monetary losses for the credit card vendor. On the other hand, misclassifying non-fraudulent events reduces customer satisfaction.
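The fraud detection example can be made concrete with a short sketch. It compares two hypothetical classifiers on an invented set of 1,000 transactions containing 10 fraud cases, and reports overall accuracy, the identification rate on the minority class (recall), and a total misclassification cost. The data, the classifier outputs, and the cost values (`COST_MISSED_FRAUD`, `COST_FALSE_ALARM`) are assumptions chosen for illustration only; they do not come from the text.

```python
# Illustrative sketch: accuracy vs. minority-class recall on imbalanced data.
# All counts and costs below are hypothetical.

def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, tn, fn), treating `positive` as the minority class."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive and pred == positive:
            tp += 1
        elif truth == positive:
            fn += 1  # minority example missed
        elif pred == positive:
            fp += 1  # majority example wrongly flagged
        else:
            tn += 1
    return tp, fp, tn, fn

def accuracy(tp, fp, tn, fn):
    return (tp + tn) / (tp + fp + tn + fn)

def minority_recall(tp, fn):
    """Identification rate on the rare class."""
    return tp / (tp + fn) if (tp + fn) else 0.0

def total_cost(fp, fn, cost_fp, cost_fn):
    """Total loss when the two error types carry different costs."""
    return fp * cost_fp + fn * cost_fn

# Hypothetical data: 10 fraud cases (1) among 1,000 transactions.
y_true = [1] * 10 + [0] * 990

# Classifier A labels every transaction as legitimate.
pred_a = [0] * 1000
# Classifier B catches 8 of the 10 frauds but raises 30 false alarms.
pred_b = [1] * 8 + [0] * 2 + [1] * 30 + [0] * 960

# Assumed costs: a missed fraud is far more expensive than a false alarm.
COST_MISSED_FRAUD = 500
COST_FALSE_ALARM = 5

results = {}
for name, pred in (("A", pred_a), ("B", pred_b)):
    tp, fp, tn, fn = confusion_counts(y_true, pred)
    results[name] = {
        "accuracy": accuracy(tp, fp, tn, fn),
        "recall": minority_recall(tp, fn),
        "cost": total_cost(fp, fn, COST_FALSE_ALARM, COST_MISSED_FRAUD),
    }
    print(name, results[name])
```

Under these assumed numbers, classifier A has the higher accuracy yet identifies no fraud at all, while classifier B gives up some accuracy and incurs a much lower total cost. This is the sense in which a higher identification rate on rare events can be preferable to higher overall accuracy.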

[1] Lidia Fuentes, et al. Managing Variability of Ambient Intelligence Middleware, 2009, Int. J. Ambient Comput. Intell.

[2] Dirk Van den Poel, et al. Handling class imbalance in customer churn prediction, 2009, Expert Syst. Appl.

[3] Rafiqul Islam, et al. Institutionalization of Business Intelligence for the Decision-Making Iteration, 2019, Int. J. Intell. Inf. Technol.

[4] Jianping Li, et al. On the complexity of finding emerging patterns, 2005, Theor. Comput. Sci.

[5] Xiaohua Hu, et al. MAPLSC: A novel multi-class classifier for medical diagnosis, 2011, Int. J. Data Min. Bioinform.

[6] Hans W. Guesgen, et al. Recognising Human Behaviour in a Spatio-Temporal Context, 2011.

[7] Jesus A. Gonzalez, et al. Symbolic One-Class Learning from Imbalanced Datasets: Application in Medical Diagnosis, 2009, Int. J. Artif. Intell. Tools.

[8] Rodica Potolea, et al. Imbalanced Classification Problems: Systematic Study, Issues and Best Practices, 2011, ICEIS.

[9] L.M. Patnaik, et al. Genetic Algorithm with Characteristic Amplification through Multiple Geographically Isolated Populations and Varied Fitness Landscapes, 2007, 15th International Conference on Advanced Computing and Communications (ADCOM 2007).

[10] Andrew K. C. Wong, et al. Classification of Imbalanced Data: a Review, 2009, Int. J. Pattern Recognit. Artif. Intell.

[11] Eric W. T. Ngai, et al. Customer churn prediction using improved balanced random forests, 2009, Expert Syst. Appl.

[12] Ioannis K. Vlachos, et al. Intuitionistic Fuzzy Image Processing, 2009, Encyclopedia of Artificial Intelligence.

[13] Nitesh V. Chawla, et al. SMOTE: Synthetic Minority Over-sampling Technique, 2002, J. Artif. Intell. Res.

[14] Gary R. Weckman, et al. A Prescriptive Stock Market Investment Strategy for the Restaurant Industry using an Artificial Neural Network Methodology, 2020, Deep Learning and Neural Networks.

[15] Longbing Cao, et al. Effective detection of sophisticated online banking fraud on extremely imbalanced data, 2012, World Wide Web.

[16] Siddhartha Bhattacharyya, et al. Minimal Intelligence Agents in Double Auction Markets with Speculators, 2006.

[17] David L. Olson, et al. Data mining in business services, 2007.

[18] Yong Hu, et al. The application of data mining techniques in financial fraud detection: A classification framework and an academic review of literature, 2011, Decis. Support Syst.

[19] Gongping Yang, et al. On the Class Imbalance Problem, 2008, 2008 Fourth International Conference on Natural Computation.

[20] Johan L. Perols. Financial Statement Fraud Detection: An Analysis of Statistical and Machine Learning Algorithms, 2011.

[21] Ute Bradter, et al. Identifying appropriate spatial scales of predictors in species distribution models with the random forest algorithm, 2013.

[22] Mangui Liang, et al. Fuzzy support vector machine based on within-class scatter for classification problems with outliers or noises, 2013, Neurocomputing.

[23] Stan Matwin, et al. Evaluating Misclassifications in Imbalanced Data, 2006, ECML.

[24] M. Tahar Kechadi, et al. Customer churn prediction in telecommunications, 2012, Expert Syst. Appl.

[25] Man Leung Wong, et al. Model selection for direct marketing: performance criteria and validation methods, 2008.

[26] Jan Muntermann, et al. An intraday market risk management approach based on textual analysis, 2011, Decis. Support Syst.

[27] Ekrem Duman, et al. Comparing alternative classifiers for database marketing: The case of imbalanced datasets, 2012, Expert Syst. Appl.

[28] Hongxia Ke, et al. Fuzzy Support Vector Machine for PolSAR Image Classification, 2013.

[29] Zijiang Yang, et al. Towards an optimal classification model against imbalanced data for Customer Relationship Management, 2011, 2011 Seventh International Conference on Natural Computation.

[30] Marzuki Khalid, et al. A Hybrid Artificial Neural Network-Naive Bayes for solving imbalanced dataset problems in semiconductor manufacturing test process, 2011, 2011 11th International Conference on Hybrid Intelligent Systems (HIS).

[31] Lakhmi Jain, et al. Computational Economics: A Perspective from Computational Intelligence, 2006.

[32] David L. Olson, et al. A support vector machine (SVM) approach to imbalanced datasets of customer responses: comparison with other customer response models, 2012, Service Business.

[33] Sanjay Mohapatra, et al. The Use of Technical and Fundamental Tools By Indian Stock Brokers, 2015.

[34] Damminda Alahakoon, et al. Minority report in fraud detection: classification of skewed data, 2004, SKDD.

[35] Ali Al-Shahib, et al. Feature Selection and the Class Imbalance Problem in Predicting Protein Function from Sequence, 2005, Applied bioinformatics.

[36] Dominique M. Hanssens, et al. Modeling Customer Lifetime Value, 2006.

[37] En Sup Yoon, et al. Weighted support vector machine for quality estimation in the polymerization process, 2005.

[38] Edward Y. Chang, et al. Multi-camera spatio-temporal fusion and biased sequence-data learning for security surveillance, 2003, MULTIMEDIA '03.

[39] Yoav Freund, et al. Boosting the margin: A new explanation for the effectiveness of voting methods, 1997, ICML.

[40] Zhen Ren, et al. Power quality disturbance identification using wavelet packet energy entropy and weighted support vector machines, 2008, Expert Syst. Appl.

[41] Stan Matwin, et al. Machine Learning for the Detection of Oil Spills in Satellite Radar Images, 1998, Machine Learning.

[42] Guy Lapalme, et al. A systematic analysis of performance measures for classification tasks, 2009, Inf. Process. Manag.