A methodology for energy multivariate time series forecasting in smart buildings based on feature selection

Abstract: The massive collection of data through emerging technologies such as the Internet of Things (IoT) calls for effective ways to reduce the number of generated features, because the features used affect the information that a machine learning process can extract. Knowledge about a concept is mined on the basis of the features of the data, and the process of finding the best combination of features is called feature selection. In this paper we deal with multivariate time-dependent series of data points for energy forecasting in smart buildings. We propose a methodology that transforms the time-dependent database into a structure that standard machine learning algorithms can process, and then applies different types of feature selection methods for regression tasks. We used Weka for database transformation, feature selection, regression, statistical testing and forecasting. Compared with applying no feature selection, the proposed methodology improves MAE by 59.97% and RMSE by 40.75% when evaluated on training data, and improves MAE by 42.28% and RMSE by 36.62% when evaluated on test data, on average over 1-step-ahead, 2-step-ahead and 3-step-ahead forecasts.
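The transformation step described in the abstract can be illustrated with a small sketch. This is not the authors' Weka implementation: it is a plain-Python illustration of one common way to turn a multivariate time series into a tabular regression dataset using lagged windows. The function names (`to_supervised`, `improvement_pct`), the toy data and the window length are hypothetical, and the improvement formula assumes the reported percentages are relative error reductions against the baseline, which the abstract does not state explicitly.

```python
from math import sqrt

def to_supervised(rows, target_col, n_lags, horizon=1):
    """Turn a multivariate series (list of equal-length rows, oldest first)
    into (X, y). Each X row holds the last n_lags observations of every
    variable, flattened; y is the target value `horizon` steps ahead."""
    X, y = [], []
    # t is the index of the most recent observation used as input
    for t in range(n_lags - 1, len(rows) - horizon):
        window = rows[t - n_lags + 1 : t + 1]
        X.append([value for row in window for value in row])
        y.append(rows[t + horizon][target_col])
    return X, y

def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def improvement_pct(baseline_error, new_error):
    """Relative error reduction in percent (assumed definition)."""
    return 100.0 * (baseline_error - new_error) / baseline_error

# Toy data, columns: (energy, temperature). Values are made up.
rows = [[10.0, 20.0], [12.0, 21.0], [11.0, 19.0], [13.0, 22.0], [14.0, 23.0]]
X, y = to_supervised(rows, target_col=0, n_lags=2, horizon=1)
```

Each row of `X` is now an ordinary feature vector, so any feature selection method for regression can be applied to its columns. Setting `horizon` to 2 or 3 would give the targets for the 2-step-ahead and 3-step-ahead cases.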
