Induction with randomization testing: decision-oriented analysis of large data sets
Induction systems are computer-based tools that aid the construction of useful models from data. Existing systems are subject to overfitting, a tendency to produce models with unnecessary structure. Accurate statistical significance testing could prevent overfitting, but nearly all existing significance tests are inappropriate for induction systems.
One approach, randomization testing, can be extended to meet the challenges posed by induction systems. Experiments indicate that a system with randomization testing can successfully combat overfitting. Models produced by the system are as accurate as, but significantly simpler than, models produced by other systems.
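The abstract does not say how randomization testing is extended for induction systems, so the following is only a minimal sketch of one common reading. It assumes the system picks the best of several candidate binary features and that the search is repeated on each label-shuffled copy of the data, so the p-value accounts for the search itself. The names `score`, `best_score`, and `randomization_p_value` are hypothetical, as is the choice of scoring function.

```python
import random


def score(xs, ys):
    """Absolute difference in the mean of ys between the xs == 1 and xs == 0 groups."""
    g1 = [y for x, y in zip(xs, ys) if x == 1]
    g0 = [y for x, y in zip(xs, ys) if x == 0]
    if not g1 or not g0:
        return 0.0  # a constant feature carries no information
    return abs(sum(g1) / len(g1) - sum(g0) / len(g0))


def best_score(features, ys):
    """Score of the best candidate feature (stands in for the induction search)."""
    return max(score(xs, ys) for xs in features)


def randomization_p_value(features, ys, n_rounds=200, seed=0):
    """Estimate how often a search on shuffled labels matches the observed best score.

    Assumption: shuffling ys breaks any real feature/label relationship, so the
    shuffled scores approximate what the search finds by chance alone.
    """
    rng = random.Random(seed)
    observed = best_score(features, ys)
    shuffled = list(ys)
    hits = 0
    for _ in range(n_rounds):
        rng.shuffle(shuffled)
        # Rerun the whole search on the shuffled labels, not just one feature.
        if best_score(features, shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_rounds + 1)
```

Under this sketch, a system would add structure to a model only when the returned p-value falls below a chosen threshold. Comparing against the best score over all candidates is what keeps a large search from making chance patterns look significant.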