Building characterization through smart meter data analytics: Determination of the most influential temporal and importance-in-prediction based features

Abstract: This paper aims to determine the most influential features to extract from smart meter data for machine learning-based classification of non-residential buildings. Remote, smart meter-driven estimation of the chosen characteristics (the buildings’ performance class, use type, and operation group) is valuable in building commissioning, benchmarking, and diagnostics applications. As the first step, state-of-the-art feature selection methods and a proposed customized approach are used to determine the most influential parameters in the pool of temporal features proposed in a previous study. Next, importance-in-prediction based features, generated from an hour-ahead load prediction pipeline, are proposed and added as additional input parameters to improve the classification accuracy. Finally, interpretations of some of the most influential features for the different classification targets are provided. The results show that, for estimating the buildings’ use type, performing feature selection and adding importance-in-prediction based features reduces the number of utilized features from 290 (the initial pool proposed in a previous study) to 29, while increasing the accuracy from 71% to 74%. Similarly, the number of features employed for estimating the performance class decreases from 224 to 17, and the achieved accuracy improves from 56% to 62%. Finally, using only 6 selected features, compared to 287 in the initial set, the accuracy obtained for the classification of operation group increases from 98% to 100%. The proposed methodology thus selects and uses far fewer features, which notably simplifies the feature extraction procedure, improves the achieved accuracy, and makes it easier to interpret why some of the most important features are influential.
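To make the idea of importance-in-prediction based features more concrete, the following is a minimal sketch only. The abstract does not state how the importances are computed, so the sketch assumes a simple stand-in: the squared Pearson correlation between each lagged load and the load one step ahead, normalized to sum to one. The function names, the chosen lags (1, 24, and 168 hours), and the synthetic office-like load profile are all hypothetical and are not taken from the paper.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def lag_importance_features(load, lags=(1, 24, 168)):
    """Score each lagged load as a predictor of the current hour's load.

    Assumption: squared correlation is used as a crude proxy for a
    prediction model's feature importance. The scores are normalized so
    that they sum to one and can serve as per-building classifier inputs.
    """
    max_lag = max(lags)
    target = load[max_lag:]
    scores = {}
    for lag in lags:
        predictor = load[max_lag - lag:len(load) - lag]
        scores[lag] = pearson(predictor, target) ** 2
    total = sum(scores.values())
    return {
        f"importance_lag_{lag}h": (score / total if total else 0.0)
        for lag, score in scores.items()
    }


def synthetic_office_load(weeks=4):
    """Hypothetical hourly load: higher on weekdays between 08:00 and 18:00."""
    load = []
    for hour in range(weeks * 168):
        day = (hour // 24) % 7   # 0..4 are weekdays, 5..6 the weekend
        hour_of_day = hour % 24
        occupied = day < 5 and 8 <= hour_of_day < 18
        load.append(20.0 + (30.0 if occupied else 0.0))
    return load


features = lag_importance_features(synthetic_office_load())
for name, value in features.items():
    print(name, round(value, 3))
```

For a building with a strict weekly schedule such as this synthetic one, the one-week lag receives a larger share than the one-day lag, because the previous day is a poor predictor on Mondays and Saturdays. Differences of this kind in the importance profile are what could help a classifier separate buildings with different use types or operation patterns.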
