Locally linear ensemble for regression

Abstract: Considerable research effort has been devoted to developing prediction models that achieve greater accuracy in regression problems. Although non-linear models attain superior accuracy by capturing the non-linearity of complex data, linear models remain attractive because of their fast prediction. In this study, a locally linear ensemble regression (LLER) is proposed to address non-linearity effectively while retaining the advantages of linear models. The LLER predicts new instances with multiple linear models trained on regions that capture the local linearity of the data. To this end, the data are decomposed into several locally linear regions via an expectation-maximization (EM) procedure, and a linear model is built as a local expert for each region, so that the experts together constitute an ensemble. We demonstrate the effectiveness of the LLER through experiments on benchmark datasets.
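To make the procedure concrete, the following is a minimal sketch of an LLER-style regressor. It is an illustration under stated assumptions, not the paper's exact algorithm: the locally linear regions are modeled here as components of a Gaussian mixture over the inputs fitted by EM (scikit-learn's GaussianMixture), each region's expert is a ridge regression weighted by the soft region memberships, and test-time predictions combine the experts using the mixture's posterior responsibilities. The class name LocallyLinearEnsemble and all hyperparameter choices are hypothetical.

```python
# Sketch of a locally linear ensemble regressor (LLER-style).
# Assumptions (not from the paper): EM is run as a Gaussian mixture over
# the inputs to identify locally linear regions, one ridge regression is
# fit per region, and predictions are the responsibility-weighted average
# of the local experts. The paper's exact EM objective may differ.
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.linear_model import Ridge

class LocallyLinearEnsemble:
    def __init__(self, n_regions=5, alpha=1.0, random_state=0):
        self.n_regions = n_regions      # number of locally linear regions
        self.alpha = alpha              # ridge regularization strength
        self.random_state = random_state

    def fit(self, X, y):
        # EM step: decompose the input space into locally linear regions.
        self.gmm_ = GaussianMixture(n_components=self.n_regions,
                                    random_state=self.random_state).fit(X)
        resp = self.gmm_.predict_proba(X)   # soft region memberships
        # Fit one linear expert per region, weighting samples by membership.
        self.experts_ = []
        for k in range(self.n_regions):
            expert = Ridge(alpha=self.alpha)
            expert.fit(X, y, sample_weight=resp[:, k] + 1e-12)
            self.experts_.append(expert)
        return self

    def predict(self, X):
        # Combine the local experts' outputs with posterior responsibilities,
        # so each prediction is a weighted sum of cheap linear evaluations.
        resp = self.gmm_.predict_proba(X)
        local = np.column_stack([m.predict(X) for m in self.experts_])
        return np.sum(resp * local, axis=1)

# Example usage on a toy non-linear target:
# X = np.linspace(0, 6, 200).reshape(-1, 1)
# y = np.sin(X).ravel() + 0.1 * np.random.randn(200)
# preds = LocallyLinearEnsemble(n_regions=4).fit(X, y).predict(X)
```

On a toy non-linear target such as the noisy sine above, a handful of linear experts arranged this way can approximate the curve piecewise, while each prediction remains a responsibility-weighted sum of linear model outputs, preserving the prediction-speed advantage of linear models that the abstract emphasizes.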
