Ranking the Uniformity of Interval Pairs

We study the problem of finding the most uniform partition of the class label distribution on an interval. This problem arises, for example, in supervised discretization of continuous features, where evaluation heuristics need to locate the best split point within the current feature's value range. The weighted average of the empirical entropies of the interval label distributions is commonly used for this task. We observe that this rule is suboptimal because it overly favors short intervals. We therefore study alternative approaches. A compression-based solution turns out to perform best in our empirical experiments. We also study how these alternative methods affect the performance of classification algorithms.
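
As a rough illustration (not taken from the paper), the following is a minimal Python sketch of the weighted-average-entropy rule mentioned above; the function names and the toy label sequence are hypothetical, and assume a candidate split is given as an index into the label sequence of the interval.

    import math
    from collections import Counter

    def entropy(labels):
        # Empirical (plug-in) entropy of a label sequence, in bits.
        counts = Counter(labels)
        n = len(labels)
        return -sum((c / n) * math.log2(c / n) for c in counts.values())

    def weighted_split_entropy(labels, split_index):
        # Weighted average of the empirical entropies of the two intervals
        # obtained by splitting the label sequence at split_index.
        left, right = labels[:split_index], labels[split_index:]
        n = len(labels)
        return (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)

    # Example: pick the split point that minimizes the weighted entropy.
    labels = ['a', 'a', 'b', 'a', 'b', 'b', 'b', 'a']
    best = min(range(1, len(labels)), key=lambda i: weighted_split_entropy(labels, i))
    print(best, weighted_split_entropy(labels, best))

Minimizing this score is the standard entropy-based selection rule; the abstract's observation is that it tends to over-reward splits that isolate short, pure intervals.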
