Randomized Decimation HyperPipes
This paper presents an experimental investigation into the commonly asserted claim that the data mining algorithm HyperPipes works best on sparse datasets, i.e., datasets whose individual instances contain very few values in proportion to the number of attributes. To test this hypothesis, we developed a tool, Randomized Decimation HyperPipes (RDH), which lets the user adjust the sparseness of training sets drawn from datasets that would normally be considered full, i.e., datasets in which values are present for almost all attributes in every instance. We then conducted 10-fold cross-validated experimental evaluations to measure the performance of HyperPipes on twenty-five different datasets. Our results confirm the hypothesis for certain types of datasets: in datasets with certain dominant classes, RDH gives the best results when the training set is made very sparse. Our experimental results also consistently show that, when training on approximately three quarters of the data, selected by semi-intelligent randomization, our method performed as well as or better than traditional HyperPipes in over sixty percent of our trials.
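For readers unfamiliar with the base algorithm: HyperPipes (as shipped with Weka) records, for each class, the range of values observed for every attribute, and classifies a test instance by the class whose recorded ranges contain the most of its attribute values. The sketch below is a minimal Python illustration of that idea combined with the randomized-decimation step described above; it is not the authors' implementation, and the names HyperPipes, decimate, and keep_fraction are illustrative choices for this sketch.

```python
import random

def decimate(instances, keep_fraction=0.75, rng=None):
    """Randomly blank out attribute values so each training instance
    keeps roughly `keep_fraction` of its values, simulating a sparse
    dataset. Class labels are never removed."""
    rng = rng or random.Random()
    return [([v if rng.random() < keep_fraction else None for v in values], label)
            for values, label in instances]

class HyperPipes:
    """Minimal HyperPipes-style classifier: one 'pipe' per class stores,
    per attribute, the (min, max) range seen for numeric values or the
    set of values seen for nominal ones."""

    def fit(self, instances):
        self.pipes = {}
        for values, label in instances:
            pipe = self.pipes.setdefault(label, [None] * len(values))
            for i, v in enumerate(values):
                if v is None:
                    continue  # missing values do not extend the pipe
                if isinstance(v, (int, float)):
                    lo, hi = pipe[i] or (v, v)
                    pipe[i] = (min(lo, v), max(hi, v))
                else:
                    bucket = pipe[i] or set()
                    bucket.add(v)
                    pipe[i] = bucket
        return self

    def predict(self, values):
        def contained(slot, v):
            # Score 1 if the test value falls inside this attribute's pipe.
            if slot is None or v is None:
                return 0
            if isinstance(slot, tuple):
                lo, hi = slot
                return int(lo <= v <= hi)
            return int(v in slot)

        scores = {label: sum(contained(s, v) for s, v in zip(pipe, values))
                  for label, pipe in self.pipes.items()}
        return max(scores, key=scores.get)

# Example: train on a decimated (artificially sparse) copy of the data.
rng = random.Random(42)
train = [([5.1, 3.5, "red"], "A"), ([4.9, 3.0, "blue"], "B"),
         ([5.0, 3.4, "red"], "A"), ([4.7, 3.2, "blue"], "B")]
clf = HyperPipes().fit(decimate(train, keep_fraction=0.75, rng=rng))
print(clf.predict([5.0, 3.3, "red"]))  # expected: "A"
```

Note that decimation only shrinks the per-class pipes (missing values never extend a range), which is one plausible reason sparser training data can help when a dominant class would otherwise grow pipes that swallow most test instances.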