A Novel Family of Boosted Online Regression Algorithms with Strong Theoretical Bounds

We investigate boosted online regression and propose a novel family of regression algorithms with strong theoretical bounds, along with several variants of the proposed generic algorithm. Specifically, we provide theoretical bounds on the performance of our algorithms that hold in a strong mathematical sense, guaranteeing improvement over conventional online regression methods without any statistical assumptions on the desired data or feature vectors. We demonstrate an intrinsic relationship, in terms of boosting, between the adaptive mixture-of-experts and data reuse algorithms. Furthermore, we introduce a boosting algorithm based on random updates that is significantly faster than conventional boosting methods and the other variants of our proposed algorithms while achieving an enhanced performance gain; hence, the random updates method is particularly well suited to fast, high-dimensional streaming data. We specifically investigate Newton method-based and stochastic gradient descent-based linear regression algorithms in a mixture-of-experts setting and provide several variants of these well-known adaptation methods; however, the proposed algorithms can be extended to other base learners, e.g., nonlinear or tree-based piecewise linear learners. We also provide theoretical bounds on the computational complexity of our algorithms and demonstrate substantial performance gains, in terms of mean square error over the base learners, on an extensive set of benchmark real data sets and simulated examples.
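To make the described setting concrete, the following is a minimal sketch of an online boosted regressor built from SGD-based linear base learners, where each learner fits the residual left by the preceding learners (the data reuse flavor of boosting) and an optional "random updates" mode updates each learner only with some probability to reduce per-sample cost. This is an illustrative assumption of how such a scheme can be organized, not the authors' exact algorithm; the class name, update rule, and parameters (learning rate, update probability) are hypothetical.

```python
import numpy as np

class BoostedOnlineRegressor:
    """Illustrative online boosted regression with SGD-based linear learners."""

    def __init__(self, dim, n_learners=10, lr=0.01, update_prob=1.0, seed=0):
        self.w = np.zeros((n_learners, dim))  # weight vector of each linear learner
        self.lr = lr                          # SGD step size (assumed constant)
        self.update_prob = update_prob        # probability of updating a learner ("random updates")
        self.rng = np.random.default_rng(seed)

    def predict(self, x):
        # Ensemble prediction: sum of the stage-wise (residual-fitting) learners.
        return float(np.sum(self.w @ x))

    def update(self, x, y):
        # Process one (x, y) pair in a single online pass.
        residual = y
        for k in range(self.w.shape[0]):
            pred_k = float(self.w[k] @ x)
            if self.rng.random() < self.update_prob:
                # SGD step on this learner's squared error against the current residual.
                self.w[k] += self.lr * (residual - pred_k) * x
                pred_k = float(self.w[k] @ x)
            residual -= pred_k  # pass the remaining residual to the next learner

# Toy usage on a noisy linear stream.
rng = np.random.default_rng(1)
true_w = rng.normal(size=5)
model = BoostedOnlineRegressor(dim=5, n_learners=5, lr=0.05, update_prob=0.5)
for _ in range(2000):
    x = rng.normal(size=5)
    y = float(true_w @ x) + 0.1 * rng.normal()
    model.update(x, y)
```

Setting `update_prob` below 1 is what the abstract's random updates idea corresponds to in this sketch: each sample triggers updates for only a random subset of learners, cutting the per-sample cost roughly in proportion to the update probability.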
