Fuzzy Discretization Process from Small Datasets

A classification problem involves selecting a training dataset with class labels, developing an accurate description or model for each class from the attributes available in the data, and then evaluating the prediction quality of the induced model. In this paper, we focus on supervised classification with models obtained from datasets that have few examples relative to the number of attributes. Specifically, we propose a fuzzy discretization method for the numerical attributes of such datasets. Discretizing numerical attributes can be a crucial step, since some classifiers cannot handle numerical attributes and others perform better when these attributes are discretized. We also show the benefits of the fuzzy discretization method on datasets with few examples through several experiments, whose results have been validated by means of statistical tests.
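To make the idea of fuzzy discretization concrete, the following is a minimal sketch of one common way to fuzzify a numerical attribute: each crisp cut point is softened into a linear transition, giving trapezoidal fuzzy sets whose memberships sum to 1. This is not the method proposed in the paper. The function names `fuzzy_partition` and `membership`, the `overlap` parameter, and the example cut point are all assumptions chosen for illustration.

```python
from typing import List, Sequence, Tuple

# A trapezoid (a, b, c, d): membership rises on [a, b], is 1 on [b, c],
# and falls on [c, d]. Hypothetical representation for this sketch.
Trapezoid = Tuple[float, float, float, float]


def fuzzy_partition(lo: float, hi: float, cuts: Sequence[float],
                    overlap: float) -> List[Trapezoid]:
    """Turn sorted crisp cut points into overlapping trapezoidal fuzzy sets.

    Assumes overlap > 0 and that neighbouring cuts (and the domain ends)
    are more than 2 * overlap apart, so every trapezoid has b <= c.
    """
    edges = [lo] + list(cuts) + [hi]
    last = len(edges) - 2
    parts = []
    for i in range(len(edges) - 1):
        left, right = edges[i], edges[i + 1]
        # The outermost sets keep a crisp boundary at the domain limits.
        a = left if i == 0 else left - overlap
        b = left if i == 0 else left + overlap
        c = right if i == last else right - overlap
        d = right if i == last else right + overlap
        parts.append((a, b, c, d))
    return parts


def membership(x: float, trap: Trapezoid) -> float:
    """Degree in [0, 1] to which x belongs to the trapezoidal fuzzy set."""
    a, b, c, d = trap
    if x < a or x > d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)   # rising edge
    return (d - x) / (d - c)       # falling edge


# Illustrative attribute on [0, 10] with one assumed cut point at 4.0.
parts = fuzzy_partition(0.0, 10.0, cuts=[4.0], overlap=1.0)
degrees = [membership(4.5, p) for p in parts]
```

With these assumed values, a crisp discretization would assign 4.5 entirely to the second interval, whereas the fuzzy partition assigns it partly to both sets. With few examples, a value close to a boundary then still contributes to both neighbouring intervals.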
