Making Better Use of Global Discretization

Before applying learning algorithms to datasets, practitioners often globally discretize any numeric attributes. If the algorithm cannot handle numeric attributes directly, prior discretization is essential. Even if it can, prior discretization often accelerates induction, and may produce simpler and more accurate classifiers. As it is generally done, global discretization denies the learning algorithm any chance of taking advantage of the ordering information implicit in numeric attributes. However, a simple transformation of discretized data preserves this information in a form that learners can use. We show that, compared to using the discretized data directly, this transformation significantly increases the accuracy of decision trees built by C4.5, decision lists built by PART, and decision tables built using the wrapper method, on several benchmark datasets. Moreover, it can significantly reduce the size of the resulting classifiers. This simple technique makes global discretization an even more useful tool for data preprocessing.
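The abstract does not spell out the transformation. A minimal sketch follows, assuming it is an ordered binary encoding: each discretized attribute with k intervals is replaced by k-1 binary attributes, the i-th of which is 1 when the value lies at or above the i-th cut point. Simple equal-width binning stands in here for whatever global discretization method is actually applied, and all function names (`equal_width_cut_points`, `interval_index`, `ordered_binary`, `transform_column`) are hypothetical.

```python
import bisect
from typing import List, Sequence


def equal_width_cut_points(values: Sequence[float], k: int) -> List[float]:
    """Stand-in global discretizer: k-1 interior cut points over [min, max]."""
    if k < 1:
        raise ValueError("k must be at least 1")
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + i * width for i in range(1, k)]


def interval_index(value: float, cuts: Sequence[float]) -> int:
    """Index (0..len(cuts)) of the interval containing value.

    bisect_right places a value equal to a cut point in the upper interval.
    """
    return bisect.bisect_right(cuts, value)


def ordered_binary(index: int, num_intervals: int) -> List[int]:
    """Assumed transformation: num_intervals - 1 binary attributes.

    Attribute i is 1 iff index > i, i.e. iff the original value is at or
    above cut point i, so the ordering of the intervals is preserved.
    """
    return [1 if index > i else 0 for i in range(num_intervals - 1)]


def transform_column(values: Sequence[float], k: int) -> List[List[int]]:
    """Discretize a numeric column globally, then apply the binary encoding."""
    cuts = equal_width_cut_points(values, k)
    return [ordered_binary(interval_index(v, cuts), len(cuts) + 1) for v in values]


if __name__ == "__main__":
    column = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
    print(equal_width_cut_points(column, 3))  # [3.0, 5.0]
    for v, row in zip(column, transform_column(column, 3)):
        print(v, row)
```

If the interval index were treated as a plain nominal value, a learner such as C4.5 could only branch on each interval separately. Under this encoding, a single test on binary attribute i corresponds to the threshold test "value >= cut point i", which is how the ordering information can survive discretization.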

Ian H. Witten | Eibe Frank