An empirical experimental evaluation on imbalanced data sets with varied imbalance ratio

Class imbalance presents a problem when traditional classification algorithms are applied. In recent years, data classification has undergone significant changes. Classifying data becomes difficult when the data are imbalanced, and the class imbalance problem has developed into a significant data mining issue. Class imbalance arises when one class is rare compared to the other, a situation that occurs frequently in machine learning applications. Learning from imbalanced data sets is a relatively new area of machine learning with real-world applicability, since real-world data sets are typically imbalanced in nature. Researchers have rigorously studied several techniques to alleviate the problem of class imbalance, including resampling algorithms, ensemble learning, and algorithmic modification, in order to transform vast amounts of skewed data efficiently into information and knowledge representations. In this paper, we conduct an empirical study on imbalanced data sets. The experimental results and the findings drawn from them are reported using the Area Under the Curve (AUC), precision, F-measure, TN rate, and TP rate evaluation metrics.
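The evaluation metrics named above can be illustrated with a minimal Python sketch. It is not the authors' experimental code. The function names, the convention that the rare class is labeled 1, and the definition of the imbalance ratio as majority count divided by minority count are all assumptions made for illustration. AUC is computed here with the rank-based (Mann-Whitney) formulation, which is one common way to obtain it.

```python
from collections import Counter


def imbalance_ratio(y_true):
    """Majority-class count divided by minority-class count (assumed definition)."""
    counts = Counter(y_true)
    return max(counts.values()) / min(counts.values())


def _safe_div(numerator, denominator):
    """Return 0.0 instead of raising when a metric is undefined."""
    return numerator / denominator if denominator else 0.0


def classification_metrics(y_true, y_pred, positive=1):
    """TP rate, TN rate, precision and F-measure for a binary problem."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    tp_rate = _safe_div(tp, tp + fn)      # recall / sensitivity
    tn_rate = _safe_div(tn, tn + fp)      # specificity
    precision = _safe_div(tp, tp + fp)
    f_measure = _safe_div(2 * precision * tp_rate, precision + tp_rate)
    return {
        "tp_rate": tp_rate,
        "tn_rate": tn_rate,
        "precision": precision,
        "f_measure": f_measure,
    }


def auc(y_true, scores, positive=1):
    """Probability that a random positive is scored above a random negative.

    Ties count as half a win (Mann-Whitney formulation).
    """
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("AUC needs at least one example of each class")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))
```

On a data set with two positive and eight negative examples, a classifier that predicts everything as negative reaches 80% accuracy yet has a TP rate of 0, which is why metrics such as these are preferred over plain accuracy when classes are imbalanced.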
