A Scalable Smart Meter Data Generator Using Spark

Smart meters are now deployed worldwide and produce large volumes of data, so smart meter data management and analytics systems must be able to process petabytes of data. Benchmarking and testing these systems requires data at scale; however, large data sets can be difficult to obtain because of privacy and data protection regulations. This paper presents a scalable smart meter data generator, built on Spark, that generates realistic data sets. The proposed generator is based on a supervised machine learning method and can generate data of any size from small seed data sets. Moreover, it preserves the characteristics of the data with respect to consumption patterns and user groups. The generator is evaluated in a cluster-based environment to validate its effectiveness and scalability.
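The idea of growing a small seed into an arbitrarily large data set while keeping per-group consumption patterns can be illustrated with a minimal single-machine sketch. It is not the method the paper proposes: in place of the supervised model, it uses a much simpler stand-in (per-group hourly mean and standard deviation with Gaussian noise), and it uses plain Python rather than Spark. The names `fit_group_profiles` and `generate`, the 24-reading daily layout, and the group labels are all assumptions made for illustration.

```python
import random
import statistics
from collections import defaultdict


def fit_group_profiles(seed):
    """Summarise a seed of (group, readings) pairs per user group.

    Returns {group: (hourly_means, hourly_stdevs, seed_count)}.
    Assumes every readings list has the same length (e.g. 24 hourly values).
    """
    by_group = defaultdict(list)
    for group, readings in seed:
        by_group[group].append(readings)

    profiles = {}
    for group, rows in by_group.items():
        columns = list(zip(*rows))  # one tuple of values per hour
        means = [statistics.fmean(col) for col in columns]
        stdevs = [statistics.pstdev(col) for col in columns]
        profiles[group] = (means, stdevs, len(rows))
    return profiles


def generate(profiles, n_meters, rng):
    """Generate n_meters synthetic daily series.

    Groups are drawn in proportion to their share of the seed, and each
    reading is sampled around the group's hourly mean, clipped at zero.
    """
    groups = list(profiles)
    weights = [profiles[g][2] for g in groups]
    synthetic = []
    for meter_id in range(n_meters):
        group = rng.choices(groups, weights=weights, k=1)[0]
        means, stdevs, _ = profiles[group]
        readings = [max(0.0, rng.gauss(m, s)) for m, s in zip(means, stdevs)]
        synthetic.append((meter_id, group, readings))
    return synthetic


# Hypothetical seed: two user groups with different daily shapes.
seed = [
    ("residential", [0.3] * 7 + [0.8] * 2 + [0.4] * 8 + [1.2] * 5 + [0.5] * 2),
    ("residential", [0.2] * 7 + [0.9] * 2 + [0.5] * 8 + [1.0] * 5 + [0.4] * 2),
    ("commercial", [0.5] * 8 + [3.0] * 10 + [0.6] * 6),
]
profiles = fit_group_profiles(seed)
synthetic = generate(profiles, n_meters=1000, rng=random.Random(42))
```

Because each synthetic meter is produced independently of the others, a step like `generate` could plausibly be spread across the partitions of a cluster, although how the paper maps its generator onto Spark is not described in this abstract.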
