A Scalability Comparison Study of Data Management Approaches for Smart Metering Systems

Increasingly large volumes of data are being generated and collected in electrical smart grids, most of them by smart meters and sensors deployed massively throughout the power grid. As data are generated ever more frequently and in constantly growing volumes, it is becoming harder to manage and process them at the scale of a smart grid with legacy systems. In this work, we investigate the scalability and performance of different data management approaches for meter data processing. To this end, we conduct a thorough experimental study of various systems, including a parallel relational database system, the MapReduce-based systems Hadoop and Spark, and a NoSQL datastore. Our experiments were conducted on up to 140 nodes of Grid'5000 with up to 1.4 TB of meter data. Our results demonstrate that parallel relational systems are better suited to most types of processing on smart meter data, but at the cost of very slow data loading. In contrast, we show that with appropriate choices of distribution model, data partitioning and data modeling, we achieve very fast and scalable bill computation, the main complex processing task for utility providers.
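The abstract attributes fast bill computation to distribution, partitioning and modeling choices without detailing them here. The following is only a minimal sketch of the general idea, not the study's implementation. It assumes plain hash partitioning by meter identifier, so that all readings of one meter land in the same partition, and a hypothetical two-rate time-of-use tariff. The names `Reading`, `partition_readings`, `bill_partition`, `compute_bills`, `PEAK_RATE` and `OFF_PEAK_RATE`, and the 08:00-20:00 peak window, are illustrative assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass
import zlib

# Hypothetical two-rate tariff (currency units per kWh); not taken from the study.
PEAK_RATE = 0.20
OFF_PEAK_RATE = 0.12


@dataclass(frozen=True)
class Reading:
    meter_id: str
    hour: int     # hour of day of the interval, 0-23
    kwh: float    # energy consumed during that interval


def is_peak(hour: int) -> bool:
    # Assumed peak window: 08:00 to 20:00.
    return 8 <= hour < 20


def partition_of(meter_id: str, num_partitions: int) -> int:
    # Deterministic hash (CRC32), so a given meter always maps to the same partition.
    return zlib.crc32(meter_id.encode("utf-8")) % num_partitions


def partition_readings(readings, num_partitions):
    # Co-locate all readings of a meter in a single partition.
    partitions = [[] for _ in range(num_partitions)]
    for r in readings:
        partitions[partition_of(r.meter_id, num_partitions)].append(r)
    return partitions


def bill_partition(partition):
    # Every reading of a meter is in this partition, so billing is purely local.
    totals = defaultdict(float)
    for r in partition:
        rate = PEAK_RATE if is_peak(r.hour) else OFF_PEAK_RATE
        totals[r.meter_id] += r.kwh * rate
    return dict(totals)


def compute_bills(readings, num_partitions=4):
    bills = {}
    for part in partition_readings(readings, num_partitions):
        # On a cluster each partition would be billed on its own node; here it runs sequentially.
        bills.update(bill_partition(part))  # meter keys are disjoint across partitions
    return bills


if __name__ == "__main__":
    sample = [Reading("m1", 9, 2.0), Reading("m1", 22, 1.0), Reading("m2", 10, 0.5)]
    print(compute_bills(sample))
```

Because partitioning is keyed on the meter, no partition needs data from another to finish a bill, and combining results is just a union of disjoint per-meter totals. A partitioning key that scattered one meter's readings across nodes would instead force a shuffle before aggregation.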