Artificial intelligence techniques for unbalanced datasets in real world classification tasks

This chapter surveys the problem of classification on unbalanced datasets. It analyzes how an imbalanced distribution of target classes in a database affects the performance of standard classifiers such as decision trees and support vector machines, and describes the main approaches for improving the generally unsatisfactory results these methods obtain. Finally, it introduces two typical real-world applications and shows in practice how the techniques are applied to the related classification tasks.
