Noise reduction in regression tasks with distance, instance, attribute and density weighting

The idea presented in this paper is to gradually decrease the influence of selected training vectors on the model: the higher the probability that a given vector is an outlier, the more its influence on training is limited. This approach can be applied in two ways: in the input space (e.g. in k-NN-based prediction and instance selection) and in the output space (e.g. when calculating the error of an MLP neural network), as sketched below. The strength of this gradual influence reduction is that no crisp outlier definition is required (outliers are difficult to define optimally). Moreover, according to the presented experimental results, this approach outperforms other methods when learning the model representation from noisy data.
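As a rough illustration of both uses, the sketch below assumes one plausible weighting scheme, not the paper's exact formulation: an instance weight that decays smoothly with how far a training target deviates from the mean target of its k nearest neighbours, relative to the local spread. The weights then scale the distance-based votes of a k-NN regressor (input space) and the per-instance terms of a weighted MSE error (output space). The names `instance_weights`, `weighted_knn_predict`, `weighted_mse` and the parameters `k` and `beta` are illustrative assumptions.

```python
import numpy as np

def instance_weights(X, y, k=5, beta=2.0):
    """Assumed weighting scheme (illustrative, not the paper's formula):
    weight w_i in (0, 1] decays with how far y_i deviates from the mean
    target of its k nearest neighbours, relative to the local spread."""
    n = len(X)
    w = np.ones(n)
    for i in range(n):
        d = np.linalg.norm(X - X[i], axis=1)
        d[i] = np.inf                      # exclude the point itself
        nn = np.argsort(d)[:k]             # indices of k nearest neighbours
        local_mean = y[nn].mean()
        local_std = y[nn].std() + 1e-12    # avoid division by zero
        dev = abs(y[i] - local_mean) / local_std
        w[i] = np.exp(-beta * dev**2)      # smooth decay, no crisp cutoff
    return w

def weighted_knn_predict(X_train, y_train, w, x_query, k=5):
    """Input-space use: instance weights scale the usual 1/d distance
    weights of a k-NN regressor, so likely outliers vote less."""
    d = np.linalg.norm(X_train - x_query, axis=1)
    nn = np.argsort(d)[:k]
    vote = w[nn] / (d[nn] + 1e-12)
    return np.sum(vote * y_train[nn]) / np.sum(vote)

def weighted_mse(y_true, y_pred, w):
    """Output-space use: per-instance weights rescale the squared errors,
    limiting the gradient contribution of likely outliers during training."""
    return np.sum(w * (y_true - y_pred)**2) / np.sum(w)
```

A minimal usage example on synthetic data with injected target noise:

```python
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 2))
y = X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 0.05, 200)
y[:10] += rng.normal(0, 2.0, 10)          # inject label noise / outliers

w = instance_weights(X, y)
print(weighted_knn_predict(X, y, w, np.array([0.2, -0.3])))
```

The exponential decay keeps every weight strictly positive, so no instance is ever discarded outright, which reflects the paper's point that a crisp outlier threshold is unnecessary.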
