Ensemble approach based on bagging, boosting and stacking for short-term prediction in agribusiness time series

Abstract: Assessing the accuracy of methods used to forecast agricultural commodity prices is an important area of study, and it calls for effective forecasting models. Regression ensembles can serve this purpose. An ensemble is a set of models that are combined to forecast a response variable with lower error than its members achieve individually. The general contribution of this work is to explore the predictive capability of regression ensembles for one-month-ahead price forecasting in agribusiness, by comparing the ensembles among themselves and against single-model approaches (reference models). Two monthly time series are used: the price paid to producers in the state of Paraná, Brazil, for a 60 kg bag of soybean (case study 1) and of wheat (case study 2). The ensembles adopted are bagging (random forests, RF), boosting (gradient boosting machine, GBM, and extreme gradient boosting machine, XGB), and stacking (STACK). The support vector machine for regression (SVR), multilayer perceptron neural network (MLP) and K-nearest neighbors (KNN) are adopted as reference models. The models are compared using the mean absolute percentage error (MAPE), root mean squared error (RMSE), mean absolute error (MAE), and mean squared error (MSE). Friedman and Wilcoxon signed rank tests are applied to the models' absolute percentage errors (APE). On the test sets, the best ensemble approaches achieve a MAPE lower than 1%. The XGB/STACK (Least Absolute Shrinkage and Selection Operator-KNN-XGB-SVR) and RF models showed the best performance for short-term forecasting in case studies 1 and 2, respectively, and their APE is statistically smaller than that of the reference models. In addition, the boosting-based approaches are consistent, providing good results in both case studies.
Overall, the models rank by performance as follows: XGB, GBM, RF, STACK, MLP, SVR and KNN. The ensemble approach yields statistically significant gains, reducing prediction errors for the price series studied. Ensembles are therefore recommended for forecasting agricultural commodity prices one month ahead: their better performance increases the accuracy of the constructed model and reduces decision-making risk.
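
The performance measures named in the abstract (MAE, MSE, RMSE, MAPE, and the per-observation APE) can be illustrated with a minimal sketch using their standard textbook definitions. The function name `forecast_errors` and the example prices are hypothetical and are not taken from the study; the paper itself does not specify an implementation.

```python
import math


def forecast_errors(actual, predicted):
    """Return APE (per observation, in %) plus MAE, MSE, RMSE and MAPE.

    Hypothetical helper for illustration only; assumes equal-length,
    non-empty sequences and non-zero actual values (APE divides by them).
    """
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("actual and predicted must be non-empty and equal in length")

    residuals = [a - p for a, p in zip(actual, predicted)]
    n = len(residuals)

    mae = sum(abs(r) for r in residuals) / n      # mean absolute error
    mse = sum(r * r for r in residuals) / n       # mean squared error
    rmse = math.sqrt(mse)                         # root mean squared error
    ape = [abs(r / a) * 100.0 for r, a in zip(residuals, actual)]
    mape = sum(ape) / n                           # mean absolute percentage error

    return {"APE": ape, "MAE": mae, "MSE": mse, "RMSE": rmse, "MAPE": mape}


# Made-up prices for a 60 kg bag, purely to exercise the function.
example = forecast_errors([100.0, 200.0], [110.0, 190.0])
print(example)
```

The per-observation APE values are what a paired test such as the Wilcoxon signed rank test would compare between two models, while MAPE summarizes them as a single figure.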
