Analysis of Similarity Measures in Time Series Clustering for the Discovery of Building Energy Patterns

Forecasting and modeling building energy profiles requires tools that can discover patterns within large amounts of collected data. Clustering is the main technique used to partition data into groups according to internal, a priori unknown structures inherent in the data. Tuning and parameterizing the whole clustering task is complex and subject to several uncertainties. The similarity metric is one of the first decisions to be made, since it establishes how the distance between two independent vectors is measured. This paper examines the effect of similarity measures when clustering is applied to discover representatives in cases where correlation is expected to be an important factor, e.g., time series. This is a necessary step for the optimized design and development of efficient clustering-based models, predictors and controllers of time-dependent processes, e.g., building energy consumption patterns. In addition, clustered-vector balance is proposed as a validation technique to compare clustering performances.
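To illustrate why the choice of similarity measure matters for time series, the following minimal sketch contrasts the Euclidean distance with a correlation-based distance (one minus the Pearson coefficient) on synthetic daily load profiles. The profiles, the function names and the specific pair of measures are assumptions made for illustration; they are not taken from the paper's experiments.

```python
import numpy as np

def euclidean_distance(x, y):
    """Straight-line distance between two equal-length vectors."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.linalg.norm(x - y))

def correlation_distance(x, y):
    """One minus the Pearson correlation: 0 for identical shape, 2 for opposite shape."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    r = np.corrcoef(x, y)[0, 1]
    return float(1.0 - r)

# Hypothetical 24-hour load profiles (arbitrary units), one sample per hour.
hours = np.arange(24)
base = 1.0 + np.sin((hours - 6) * np.pi / 12) ** 2  # two peaks per day
scaled = 3.0 * base          # same shape, three times the magnitude
shifted = np.roll(base, 6)   # same magnitude, peaks displaced by six hours

for name, other in [("scaled", scaled), ("shifted", shifted)]:
    print(
        f"base vs {name}: "
        f"euclidean={euclidean_distance(base, other):.2f}, "
        f"correlation={correlation_distance(base, other):.2f}"
    )
```

In this toy case the two measures rank the neighbors in opposite order: the Euclidean distance treats the scaled profile as the more distant one, whereas the correlation-based distance treats it as essentially identical to the base profile and places the shifted profile far away. A clustering algorithm fed with one or the other measure would therefore tend to group these profiles differently.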
