Randomization Techniques for Data Mining Methods
Data mining research has concentrated on inventing novel methods for finding interesting information in large masses of data. This has led to many new computational tasks and some interesting algorithmic developments. However, there has been less emphasis on testing the significance of the discovered patterns or models. We discuss the issues in testing the results of data mining methods, and review some of the recent work on scalable algorithmic techniques for randomization tests for data mining methods. We consider suitable null models and generation algorithms for the randomization of 0-1 matrices, arbitrary real-valued matrices, and segmentations. We also discuss randomization for database queries.
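To make the idea of a randomization test on a 0-1 matrix concrete, the following is a minimal sketch, not the authors' implementation. It assumes a null model in which row and column sums are held fixed, and it generates random matrices by repeated local swaps. The function names `swap_randomize` and `empirical_p_value`, the `num_attempts` parameter, and the "+1" correction in the p-value are illustrative assumptions rather than details taken from the abstract.

```python
import random


def swap_randomize(matrix, num_attempts, seed=None):
    """Return a randomized copy of a 0-1 matrix with the same row and column sums.

    Assumption: each attempt picks two 1-entries (i, j) and (k, l); if
    (i, l) and (k, j) are both 0, the ones are moved there. Failed
    attempts are simply skipped.
    """
    rng = random.Random(seed)
    m = [row[:] for row in matrix]  # work on a copy, leave the input intact
    ones = [(i, j) for i, row in enumerate(m) for j, v in enumerate(row) if v == 1]
    if len(ones) < 2:
        return m
    for _ in range(num_attempts):
        a = rng.randrange(len(ones))
        b = rng.randrange(len(ones))
        i, j = ones[a]
        k, l = ones[b]
        # A swap is valid only if it touches two distinct rows and columns
        # and both target cells are currently 0.
        if i == k or j == l or m[i][l] == 1 or m[k][j] == 1:
            continue
        m[i][j] = m[k][l] = 0
        m[i][l] = m[k][j] = 1
        ones[a] = (i, l)
        ones[b] = (k, j)
    return m


def empirical_p_value(observed, null_values):
    """Fraction of null statistics at least as large as the observed one.

    The +1 in numerator and denominator counts the observed data as one
    sample from the null distribution (an assumption of this sketch).
    """
    at_least = sum(1 for v in null_values if v >= observed)
    return (at_least + 1) / (len(null_values) + 1)
```

In use, one would compute a statistic (for example, the number of frequent itemsets) on the original matrix, recompute it on many matrices returned by `swap_randomize`, and pass both to `empirical_p_value`. How many swap attempts are needed for the chain to mix is not addressed here.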