Refining classifier from unsampled data

For a learning task with a huge number of training instances, we sample a subset of informative or important instances and use them for learning. Because accurately labeled data is difficult to obtain, noise detection is required to filter noisy instances out of the sample, since such noise degrades learning performance. In this work, we propose using the unsampled instances to improve noise detection in the sampled instances. An empirical study supports the idea that a refined classifier can be obtained from noisy sampled instances by utilizing unsampled instances.
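The abstract does not specify how the unsampled instances enter the noise detector, so the following is only an illustrative sketch under stated assumptions, not the method of this work. It assumes the unsampled instances carry labels and uses them as the reference pool for a nearest-neighbor vote: a sampled instance is flagged as noisy when its label disagrees with the majority label of its k nearest unsampled neighbors. The names `knn_vote` and `filter_noise`, the Euclidean distance, and the toy data are all hypothetical.

```python
from collections import Counter
import math


def knn_vote(query, pool, k):
    """Return the majority label among the k pool instances nearest to query."""
    nearest = sorted(pool, key=lambda inst: math.dist(query, inst[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def filter_noise(sampled, unsampled, k=3):
    """Split sampled instances into (kept, flagged).

    Assumption: an instance is treated as noisy when its label differs
    from the k-NN majority vote taken over the unsampled pool.
    """
    kept, flagged = [], []
    for features, label in sampled:
        if knn_vote(features, unsampled, k) == label:
            kept.append((features, label))
        else:
            flagged.append((features, label))
    return kept, flagged


# Toy data (hypothetical): two well-separated classes in the unsampled pool.
unsampled = [
    ((0.0, 0.0), 0), ((0.1, 0.2), 0), ((0.2, 0.1), 0),
    ((1.0, 1.0), 1), ((0.9, 1.1), 1), ((1.1, 0.9), 1),
]
# The last sampled instance lies in the class-0 region but is labeled 1.
sampled = [((0.05, 0.05), 0), ((0.95, 1.0), 1), ((0.1, 0.1), 1)]

kept, flagged = filter_noise(sampled, unsampled, k=3)
```

A classifier would then be trained on `kept` only. Using an odd `k` with two classes avoids tied votes; other detectors, such as ensemble or consensus filters, could take the place of the nearest-neighbor vote.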
