Online Rebuilding Regression Random Forests

Abstract

Mining continuous data streams is particularly challenging for machine learning. Many efforts have been devoted to proposing online learning algorithms that can be trained iteratively on newly arriving data and provide evolving predictions. Compared to off-line approaches, these algorithms have shown better predictive performance and some ability to adapt to high-volume continuous data streams. However, a wide range of practical applications call for regression models that can make adequate use of a large volume of pre-collected training data while also handling continuous data streams with multiple types of concept drift, such as abrupt, gradual, incremental, and recurring drift. Random Forests (RFs) are an effective ensemble learning model for regression tasks. However, the fixed structure that RFs acquire through off-line training restricts their applicability to real-world tasks with dynamic data streams. To address these issues, we propose an online rebuilding strategy for a pre-trained Random Forests model, called Online Rebuilding Regression Random Forests (ORB-RRF). Specifically, we design a leaf-pruning technique and an online reconstruction of subtrees, triggered by changes in the feature space at certain nodes, to adjust the local structure of each regression tree so that it adapts to dynamic inputs. Numerical experiments show that, compared to several state-of-the-art methods, ORB-RRF substantially improves adaptability to data streams and predictive accuracy on several real benchmark datasets and synthetic datasets. Moreover, we show the convergence and stability of the proposed method.
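
The abstract describes adjusting only the local structure of a pre-trained regression tree when the feature space reaching a node changes. Below is a minimal Python sketch of that general idea under our own assumptions; it is not the ORB-RRF algorithm. Each node stores the per-feature bounds it saw during training, and each leaf buffers streamed samples. Once a hypothetical `window` of samples has arrived, the leaf is regrown into a small CART-style subtree if at least a `drift_ratio` share of them lie outside its bounds. Otherwise its value is simply refreshed to the recent mean, a stand-in rather than the paper's leaf-pruning rule. `Node`, `build`, `update`, `window`, and `drift_ratio` are all hypothetical names.

```python
from dataclasses import dataclass, field
from statistics import fmean
from typing import List, Optional, Tuple

Sample = Tuple[List[float], float]

@dataclass
class Node:
    value: float                     # mean target (used when the node is a leaf)
    lo: List[float]                  # per-feature minimum seen when the node was built
    hi: List[float]                  # per-feature maximum seen when the node was built
    feature: Optional[int] = None    # split feature index (None for a leaf)
    threshold: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    buffer: List[Sample] = field(default_factory=list)  # streamed samples (leaves only)

def sse(ys):
    m = fmean(ys)
    return sum((y - m) ** 2 for y in ys)

def build(data, depth=0, max_depth=3, min_leaf=2):
    """Grow a CART-style node using exhaustive variance-reduction splits."""
    xs = [x for x, _ in data]
    ys = [y for _, y in data]
    node = Node(fmean(ys), [min(c) for c in zip(*xs)], [max(c) for c in zip(*xs)])
    if depth >= max_depth or len(data) < 2 * min_leaf:
        return node
    best = None
    for f in range(len(xs[0])):
        for t in sorted({x[f] for x in xs}):
            left = [s for s in data if s[0][f] <= t]
            right = [s for s in data if s[0][f] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            cost = sse([y for _, y in left]) + sse([y for _, y in right])
            if best is None or cost < best[0]:
                best = (cost, f, t, left, right)
    if best is None or best[0] >= sse(ys):
        return node  # no split reduces the squared error: keep as a leaf
    _, node.feature, node.threshold, left, right = best
    node.left = build(left, depth + 1, max_depth, min_leaf)
    node.right = build(right, depth + 1, max_depth, min_leaf)
    return node

def leaf_for(node, x):
    while node.feature is not None:
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node

def predict(root, x):
    return leaf_for(root, x).value

def update(root, x, y, window=20, drift_ratio=0.5):
    """Route (x, y) to its leaf; act once the leaf has buffered `window` samples."""
    leaf = leaf_for(root, x)
    leaf.buffer.append((x, y))
    if len(leaf.buffer) < window:
        return
    outside = sum(
        any(v < lo or v > hi for v, lo, hi in zip(bx, leaf.lo, leaf.hi))
        for bx, _ in leaf.buffer
    )
    if outside / len(leaf.buffer) >= drift_ratio:
        # The feature space reaching this leaf has shifted: regrow a local subtree
        # from the buffered samples and swap it in place (the parent link is unchanged).
        vars(leaf).update(vars(build(leaf.buffer)))
    else:
        # No shift detected: refresh the leaf value to the recent targets.
        leaf.value = fmean(by for _, by in leaf.buffer)
        leaf.buffer.clear()
```

In a forest, one would call `update` on every tree and average `predict` over the trees. The design point the sketch illustrates is that only the leaf whose region drifted is rebuilt, so the rest of the pre-trained structure is kept.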
