Synthetic Data Generator for Classification Rules Learning

A standard data set is useful for empirically evaluating classification rule learning algorithms. However, no existing standard data set is general enough to cover the variety of situations such evaluations require. Real-world data sets are tied to specific applications, and their numbers of attributes, rules, and samples are fixed. This paper proposes a data generator that produces synthetic data sets as large as an experiment demands. The numbers of attributes, rules, and samples in a synthetic data set can easily be changed to suit the evaluation of different learning algorithms. The generator first creates related attributes, then creates rules based on those attributes, and finally produces samples that follow the rules. Three decision tree algorithms are evaluated using synthetic data sets produced by the proposed generator.
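The three-stage pipeline described above (attributes, then rules, then samples) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the restriction to categorical attributes, the conjunctive rule form, and the uniform random choices are all assumptions made for illustration, and the sketch does not model how the paper relates attributes to one another.

```python
import random

def make_attributes(n_attributes, n_values):
    # Hypothetical representation: attribute name -> list of categorical values.
    return {f"a{i}": [f"v{j}" for j in range(n_values)]
            for i in range(n_attributes)}

def make_rules(attributes, n_rules, conditions_per_rule, n_classes, rng):
    # Each rule is (conditions, class label); conditions is a conjunction
    # of attribute == value tests over distinct attributes.
    # Requires conditions_per_rule <= number of attributes.
    names = sorted(attributes)
    rules = []
    for k in range(n_rules):
        chosen = rng.sample(names, conditions_per_rule)
        conditions = {name: rng.choice(attributes[name]) for name in chosen}
        rules.append((conditions, f"c{k % n_classes}"))
    return rules

def make_samples(attributes, rules, n_samples, rng):
    # Pick a rule, fill every attribute at random, then overwrite the
    # attributes the rule constrains so the sample satisfies that rule.
    samples = []
    for _ in range(n_samples):
        conditions, label = rng.choice(rules)
        row = {name: rng.choice(values) for name, values in attributes.items()}
        row.update(conditions)
        row["class"] = label
        samples.append(row)
    return samples

rng = random.Random(0)  # fixed seed so runs are repeatable
attributes = make_attributes(n_attributes=5, n_values=3)
rules = make_rules(attributes, n_rules=4, conditions_per_rule=2,
                   n_classes=2, rng=rng)
samples = make_samples(attributes, rules, n_samples=50, rng=rng)
```

Changing `n_attributes`, `n_rules`, or `n_samples` scales the data set in the way the abstract describes. Note that in this simplified sketch a sample may also happen to satisfy a second rule with a different class; a full generator would need to detect or prevent such conflicts.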
