Oblique random forest ensemble via Least Square Estimation for time series forecasting

Abstract Recent studies in machine learning indicate that random forests are among the classifiers most likely to perform best. As an ensemble classifier, a random forest combines multiple decision trees to significantly decrease the overall variance. The conventional random forest employs orthogonal decision trees, which select one "optimal" feature to split the data instances within a non-leaf node according to an impurity criterion such as Gini impurity or information gain. However, orthogonal decision trees may fail to capture the geometrical structure of the data samples. Motivated by this, we make the first attempt to study the oblique random forest in the context of time series forecasting. In each node of the decision tree, instead of the single-"optimal"-feature orthogonal split used by the standard random forest, a least squares classifier performs the partition. The proposed method is advantageous with respect to both efficiency and accuracy. We empirically evaluate the proposed method on eight generic time series datasets and five electricity load demand time series datasets from the Australian Energy Market Operator, and compare it with several benchmark methods.
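To make the node-splitting idea concrete, the following is a minimal sketch of an oblique split fitted by least squares, in the spirit of the abstract: provisional binary labels are obtained here by a simple 2-means clustering of the node's samples (an illustrative assumption, not necessarily the authors' exact procedure), a separating hyperplane is then fitted by ordinary least squares, and samples are routed by the sign of the oblique projection rather than by a single-feature threshold.

```python
import numpy as np

def oblique_split(X, n_iter=10, seed=0):
    """Partition samples with a least-squares hyperplane instead of a
    single-feature (orthogonal) threshold. The 2-means pseudo-labeling
    step is a hypothetical choice for this sketch."""
    rng = np.random.default_rng(seed)
    # Step 1: derive provisional +/-1 labels via a small 2-means clustering.
    centers = X[rng.choice(len(X), size=2, replace=False)]
    for _ in range(n_iter):
        dists = ((X[:, None, :] - centers) ** 2).sum(axis=2)
        labels = np.argmin(dists, axis=1)
        for k in (0, 1):
            if np.any(labels == k):
                centers[k] = X[labels == k].mean(axis=0)
    y = np.where(labels == 0, -1.0, 1.0)
    # Step 2: fit the oblique hyperplane w by least squares
    # (bias term handled by augmenting X with a column of ones).
    Xa = np.hstack([X, np.ones((len(X), 1))])
    w, *_ = np.linalg.lstsq(Xa, y, rcond=None)
    # Step 3: route each sample by the sign of its oblique projection.
    left = Xa @ w < 0
    return w, left

# Example: two well-separated Gaussian blobs; a single oblique hyperplane
# should route each blob almost entirely to one side of the split.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2.0, 0.5, size=(50, 2)),
               rng.normal(2.0, 0.5, size=(50, 2))])
w, left = oblique_split(X)
print(left[:50].mean(), left[50:].mean())
```

Because the hyperplane is a linear combination of all features, it can follow the geometrical structure of the data that an axis-parallel (orthogonal) split cannot, which is the motivation stated above.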
