An End to End Real Time Architecture for Analyzing and Clustering Time Series Data: Case of an Energy Management System

Big data has attracted researchers from many fields to study intelligent and robust techniques for analyzing extremely large data sets, revealing patterns in human behavior, and supporting decisions such as predicting a person's next activity. The smart grid is a cyber-physical system that aims to modernize the aging electrical grid by incorporating the latest information and communication technologies (ICTs) to improve the generation, distribution, and consumption of electricity. Sensors are crucial in smart grids: they allow an energy management system to analyze the massive volume of generated sensor data and to apply machine learning algorithms that take advantage of customer participation to reduce the cost of power. However, the big data field offers a very long list of tools for analyzing data in either real-time or batch mode, so deciding which tools to use for a particular case is challenging. This paper presents an end-to-end architecture for a real-time test bed implemented at Al Akhawayn University in Ifrane, Morocco, to analyze and cluster time series sensor data. The IoT architecture is composed of Kaa (a middleware), Kafka (a real-time data messaging queue), Spark (an in-memory data analytics platform), and k-means (a clustering algorithm).
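To make the clustering stage concrete, the following is a minimal sketch of k-means (Lloyd's algorithm) applied to short load profiles. It is written in plain Python for illustration and is not the test bed's Spark implementation. The sample readings, the names `kmeans`, `squared_distance`, and `mean_point`, and the choice to seed the centroids with the first k profiles are all assumptions made for this example.

```python
from typing import List, Sequence, Tuple

Profile = Sequence[float]


def squared_distance(a: Profile, b: Profile) -> float:
    """Squared Euclidean distance between two equal-length profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points: List[Profile]) -> List[float]:
    """Component-wise mean of a non-empty list of profiles."""
    n = len(points)
    return [sum(column) / n for column in zip(*points)]


def kmeans(points: List[Profile], k: int,
           max_iter: int = 100) -> Tuple[List[List[float]], List[int]]:
    """Lloyd's algorithm. Centroids are seeded with the first k points,
    which is a simplification chosen to keep the sketch deterministic."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: attach each profile to its nearest centroid.
        labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # An empty cluster keeps its previous centroid.
            new_centroids.append(mean_point(members) if members else centroids[j])
        if new_centroids == centroids:
            break  # Converged: no centroid moved.
        centroids = new_centroids
    return centroids, labels


# Hypothetical data: four power readings (kW) per household window.
profiles = [
    [0.20, 0.30, 0.50, 0.40],
    [0.30, 0.20, 0.60, 0.50],
    [0.25, 0.35, 0.55, 0.45],
    [2.00, 2.50, 3.00, 2.80],
    [2.10, 2.40, 3.20, 2.90],
    [1.90, 2.60, 3.10, 2.70],
]

centroids, labels = kmeans(profiles, k=2)
print(labels)
```

On this toy input the three low-consumption windows end up in one cluster and the three high-consumption windows in the other. In a streaming deployment, the same assignment and update steps would run over batches of sensor readings arriving from the message queue rather than over an in-memory list.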
