Finding Electric Energy Consumption Patterns in Big Time Series Data

In recent years, the volume of available information has grown considerably due to new technologies such as sensor networks and smart meters; new algorithms capable of handling big data are therefore needed. In this work, a distributed version of the k-means algorithm, implemented in the Apache Spark framework, is proposed to find patterns in big time series. Results for the electricity consumption of two buildings of a public university during 2011, 2012 and 2013 are presented and discussed. Finally, the computational time of the proposed methodology is compared with that of Weka, which serves as a benchmark.
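The abstract does not describe how k-means is distributed. A common approach, and the one Spark-style frameworks lend themselves to, is a map/reduce pass per iteration. Each data partition assigns its points to the nearest centroid and emits per-cluster vector sums and counts. These partial results are then merged by cluster index, and each new centroid is the merged sum divided by the merged count. The sketch below simulates that structure in plain Python, with partitions represented as lists. It is not Spark's API and not the authors' code. The function names, the random initialization, and the representation of each day as a fixed-length vector of consumption readings are all illustrative assumptions.

```python
import random
from typing import Dict, List, Sequence, Tuple

Vector = Tuple[float, ...]
# Partial result per cluster index: (running vector sum, number of points).
Stats = Dict[int, Tuple[List[float], int]]


def closest(point: Sequence[float], centroids: List[Vector]) -> int:
    """Index of the centroid nearest to `point` (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda j: sum((p - c) ** 2 for p, c in zip(point, centroids[j])),
    )


def partition_stats(partition: List[Vector], centroids: List[Vector]) -> Stats:
    """Map step plus local combine: per-cluster sums and counts for one partition."""
    stats: Stats = {}
    for point in partition:
        j = closest(point, centroids)
        acc, n = stats.get(j, ([0.0] * len(point), 0))
        stats[j] = ([a + p for a, p in zip(acc, point)], n + 1)
    return stats


def merge(a: Stats, b: Stats) -> Stats:
    """Reduce step: add the sums and counts that share a cluster index."""
    out = dict(a)
    for j, (s, n) in b.items():
        if j in out:
            s0, n0 = out[j]
            out[j] = ([x + y for x, y in zip(s0, s)], n0 + n)
        else:
            out[j] = (s, n)
    return out


def distributed_kmeans(partitions: List[List[Vector]], k: int,
                       max_iter: int = 20, seed: int = 0) -> List[Vector]:
    """Lloyd's k-means in which each iteration is one map/reduce pass over partitions."""
    rng = random.Random(seed)
    points = [tuple(p) for part in partitions for p in part]
    centroids = rng.sample(points, k)  # plain random initialization (assumption)
    for _ in range(max_iter):
        totals: Stats = {}
        for part in partitions:  # on a real cluster these would run in parallel
            totals = merge(totals, partition_stats(part, centroids))
        new = [
            tuple(x / totals[j][1] for x in totals[j][0]) if j in totals else centroids[j]
            for j in range(k)  # an empty cluster keeps its previous centroid
        ]
        if new == centroids:  # centroids unchanged: the algorithm has converged
            break
        centroids = new
    return centroids


# Hypothetical toy data: each tuple is one day's profile of 4 readings,
# spread across two partitions.
parts = [
    [(1.0, 1.0, 1.0, 1.0), (5.0, 6.0, 6.0, 5.0)],
    [(1.2, 0.8, 1.0, 1.0), (5.2, 6.2, 5.8, 5.0)],
]
print(sorted(distributed_kmeans(parts, k=2)))
```

On this toy data, the two centroids should settle on the mean low-consumption and mean high-consumption profiles, even though each partition holds one day of each kind. Only the small per-cluster sums and counts cross partition boundaries, which is what makes the pattern suitable for distribution. Spark's MLlib provides its own KMeans implementation, which defaults to k-means|| initialization. This sketch uses random sampling for brevity.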
