Clustering for Demand Response: An Online Algorithmic Approach

The widespread monitoring of electricity consumption, driven by the increasingly pervasive deployment of networked sensors in urban environments, has resulted in an unprecedentedly large volume of data being collected. To improve sustainability in Smart Grids, the real-time data analytics challenges induced by high-volume, high-dimensional, context-based data need to be addressed. In particular, as Smart Grid technologies become more ubiquitous, analytics for discovering the underlying structure of high-dimensional time series data are crucial for converting the massive amount of fine-grained energy information gathered from residential smart meters into appropriate Demand Response (DR) insights. In this paper, we propose an online time series clustering approach to systematically and efficiently manage the energy consumption data deluge and to capture specific behavior, i.e., to identify households with similar lifestyle patterns. Customers can in this way be segmented into several groups that can be used to enhance DR policies for real-time automatic control in the cyber-physical Smart Grid system. Because the ‘optimal clustering’ problem is inherently intractable, we propose a novel randomized approximation clustering scheme for electricity consumption data that addresses three major issues: (i) designing a resource-constrained, online clustering technique for high-volume, high-dimensional time series data; (ii) determining the optimal number of clusters that gives the best approximate clustering configuration; and (iii) providing strong clustering performance guarantees. By ‘performance guarantees’, we mean the algorithm's performance relative to the best clustering possible for the given data.
Our proposed online clustering algorithm is time-efficient, achieves a clustering configuration that is within provable worst-case approximation factors of the optimum, scales to large data sets, and is extensible to parallel and distributed architectures. Its applicability goes beyond the Smart Grid and includes any scenario where clustering must be done on high-volume data in real time under space and time constraints.

Keywords: time series, online clustering algorithm, real-time analytics, Smart Grid, demand response
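
To make the notion of online clustering under space constraints concrete, the following is a minimal sketch of sequential k-means applied to a stream of load profiles. It is not the randomized approximation scheme proposed in the paper and carries no approximation guarantee; it only illustrates the one-pass, bounded-memory setting. The class name `OnlineKMeans`, the helper `squared_distance`, and the four-reading profiles are hypothetical and chosen for illustration.

```python
from typing import List, Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class OnlineKMeans:
    """Sequential k-means (illustrative only): one pass, O(k * d) memory."""

    def __init__(self, k: int) -> None:
        self.k = k
        self.centroids: List[List[float]] = []
        self.counts: List[int] = []

    def update(self, profile: Sequence[float]) -> int:
        """Consume one profile and return the index of its cluster."""
        # Seed the centroids with the first k profiles that arrive.
        if len(self.centroids) < self.k:
            self.centroids.append(list(profile))
            self.counts.append(1)
            return len(self.centroids) - 1
        # Assign the profile to its nearest centroid.
        j = min(
            range(self.k),
            key=lambda i: squared_distance(self.centroids[i], profile),
        )
        # Move that centroid toward the profile (running mean update).
        self.counts[j] += 1
        step = 1.0 / self.counts[j]
        self.centroids[j] = [
            c + step * (x - c) for c, x in zip(self.centroids[j], profile)
        ]
        return j


# Hypothetical load profiles (kWh per reading): evening-peak vs. morning-peak.
stream = [
    [0.2, 0.3, 1.5, 1.8],
    [1.6, 1.4, 0.3, 0.2],
    [0.3, 0.2, 1.6, 1.7],
    [1.5, 1.7, 0.2, 0.3],
]
model = OnlineKMeans(k=2)
labels = [model.update(p) for p in stream]
print(labels)
```

Each profile is seen once and then discarded, so memory depends only on the number of clusters and the profile length, not on the number of households. Seeding with the first k arrivals is a simplification made for this sketch; the result depends on arrival order, which is one reason schemes with provable guarantees are of interest.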
