Enhanced Data Modelling Approach with Interval Estimation

This paper deals with regression tasks on real-valued numerical data attributes. A special data transformation formulated in our earlier work is combined with a new, enhanced weighting strategy to improve prediction accuracy. The proposed data modelling approach offers several advantages: it is independent of the particular regression model used, and it enables the analyst to calculate tolerance interval estimates as well as the probability that the target attribute exceeds arbitrary predefined thresholds. We tested the approach on three real-world datasets. In all three cases, it reliably improved and stabilized both the prediction accuracy (measured by the average root mean squared error on each dataset) and the quality of the tolerance interval estimates.
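The quantities named above (root mean squared error, an interval around a prediction, and the probability of exceeding a threshold) can be illustrated with a generic sketch. This is not the authors' transformation or weighting strategy, which the abstract does not specify. It assumes that the target, given a point prediction `y_hat`, follows a normal distribution whose standard deviation is estimated from the model's residuals. All function names are hypothetical. The interval is a simple plug-in interval that covers a chosen proportion of that assumed distribution, so it is only an approximation of a statistical tolerance interval, which would also need to account for uncertainty in the estimated `sigma`.

```python
import math
from statistics import NormalDist, stdev


def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def residual_sigma(y_true, y_pred):
    """Sample standard deviation of the residuals (hypothetical helper).

    Assumption: residuals are roughly Gaussian, so their spread can stand in
    for the uncertainty of a single prediction.
    """
    return stdev([t - p for t, p in zip(y_true, y_pred)])


def exceedance_probability(y_hat, sigma, threshold):
    """P(Y > threshold), assuming Y ~ N(y_hat, sigma^2)."""
    return 1.0 - NormalDist(mu=y_hat, sigma=sigma).cdf(threshold)


def plugin_interval(y_hat, sigma, coverage=0.95):
    """Symmetric interval holding `coverage` of N(y_hat, sigma^2).

    A plug-in approximation only: a true tolerance interval would widen this
    to reflect that sigma itself is estimated from finite data.
    """
    z = NormalDist().inv_cdf(0.5 + coverage / 2.0)
    return y_hat - z * sigma, y_hat + z * sigma


if __name__ == "__main__":
    observed = [3.0, 5.0, 2.5, 7.0]
    predicted = [2.5, 5.0, 3.0, 8.0]
    sigma = residual_sigma(observed, predicted)
    print("RMSE:", rmse(observed, predicted))
    print("P(Y > 6 | y_hat = 5):", exceedance_probability(5.0, sigma, 6.0))
    print("95% interval around 5:", plugin_interval(5.0, sigma))
```

Because the threshold probability and the interval are computed only from a point prediction and a residual spread, this kind of post-processing does not depend on which regression model produced `y_hat`. That mirrors the model-independence the abstract claims, though here it is only under the stated Gaussian assumption.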
