Refining Discretizations of Continuous-Valued Attributes

The Rand index is a measure commonly used to compare crisp partitions. Campello (2007) and Hüllermeier and Rifqi (2009) each proposed an extension of this index capable of comparing fuzzy partitions. These extensions are useful when the continuous values of attributes are discretized using fuzzy sets. In previous work we experimented with these extensions and compared their accuracy with that of the crisp Rand index. In this paper we propose the e-procedure, an alternative way to deal with attributes that take continuous values. On several well-known datasets from the Machine Learning repository, the accuracy obtained using the e-procedure as a crisp discretization method together with the crisp Rand index is comparable to that obtained using the crisp Rand index and its fuzzifications with standard crisp and fuzzy discretization methods, respectively.
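As an illustration of the crisp Rand index mentioned above, the following minimal sketch computes it in its standard pair-counting form: the fraction of object pairs on which two partitions agree (both place the pair in the same block, or both separate it). The function name `rand_index` and the label-list representation of a partition are assumptions made for this example. The sketch does not cover the fuzzy extensions or the e-procedure.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Crisp Rand index between two partitions given as label sequences.

    labels_a[i] and labels_b[i] are the block labels assigned to object i
    by the first and second partition (a hypothetical representation).
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both partitions must label the same objects")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        raise ValueError("at least two objects are required")
    # A pair counts as an agreement when both partitions make the same
    # decision about it: together in both, or apart in both.
    agreements = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agreements / len(pairs)


if __name__ == "__main__":
    # Same grouping under different label names: full agreement.
    print(rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
    # Two different groupings of the same four objects.
    print(rand_index([0, 0, 1, 1], [0, 1, 0, 1]))
```

The index depends only on which objects are grouped together, not on the label names, so relabeling the blocks of either partition leaves the value unchanged.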

[1] Ricardo J. G. B. Campello, et al. A fuzzy extension of the Rand index and other related indexes for clustering and classification assessment, 2007, Pattern Recognit. Lett.

[2] J. Alexander, et al. Theory and Methods: Critical Essays in Human Geography, 2008.

[3] Hisao Ishibuchi, et al. Effects of constructing fuzzy discretization from crisp discretization for rule-based classifiers, 2008, Artificial Life and Robotics.

[4] Ian H. Witten, et al. Data mining: practical machine learning tools and techniques with Java implementations, 2002, SGMD.

[5] Eyke Hüllermeier, et al. A Fuzzy Variant of the Rand Index for Comparing Clustering Structures, 2009, IFSA/EUSFLAT Conf.

[6] Ronen Feldman, et al. The Data Mining and Knowledge Discovery Handbook, 2005.

[7] Enrique H. Ruspini, et al. A New Approach to Clustering, 1969, Inf. Control.

[8] Usama M. Fayyad, et al. Multi-Interval Discretization of Continuous-Valued Attributes for Classification Learning, 1993, IJCAI.

[9] Ian H. Witten, et al. Weka: Practical machine learning tools and techniques with Java implementations, 1999.

[10] Hisao Ishibuchi, et al. Deriving fuzzy discretization from interval discretization, 2003, The 12th IEEE International Conference on Fuzzy Systems, 2003. FUZZ '03.

[11] Ronald R. Yager, et al. Advances on Computational Intelligence, 2012, Communications in Computer and Information Science.

[12] Ramón López de Mántaras, et al. A distance-based attribute selection measure for decision tree induction, 1991, Machine Learning.

[13] Roelof K. Brouwer. Extending the Rand, adjusted Rand and Jaccard indices to fuzzy partitions, 2008, Journal of Intelligent Information Systems.

[14] Hichem Frigui, et al. Clustering and aggregation of relational data with applications to image database categorization, 2007, Pattern Recognit.

[15] Eyke Hüllermeier, et al. Comparing Fuzzy Partitions: A Generalization of the Rand Index and Related Measures, 2012, IEEE Transactions on Fuzzy Systems.

[16] Xindong Wu, et al. Discretization Methods, 2010, Data Mining and Knowledge Discovery Handbook.

[17] William M. Rand, et al. Objective Criteria for the Evaluation of Clustering Methods, 1971.

[18] James M. Keller, et al. Comparing Fuzzy, Probabilistic, and Possibilistic Partitions, 2010, IEEE Transactions on Fuzzy Systems.

[19] Ian H. Witten, et al. The WEKA data mining software: an update, 2009, SKDD.

[20] Àngel García-Cerdaña, et al. Towards a Fuzzy Extension of the López de Mántaras Distance, 2012, IPMU.

[21] Àngel García-Cerdaña, et al. Lazy Induction of Descriptions Using Two Fuzzy Versions of the Rand Index, 2010, IPMU.