Real-time processing of IoT events with historic data using Apache Kafka and Apache Spark with dashing framework

The Internet of Things (IoT) extends the idea of connecting multiple devices over the Internet so that they can communicate with one another. Traditionally, packets are sent over the network only when both the sender and the receiver are online. This forces both parties to be online 24×7, which is not achievable in every environment in which the devices communicate. Given the very large volume of data generated by this communication, the data must be stored and processed so that insights can be extracted for the benefit of the organization. The data takes two forms: real-time data and existing, or historical, data. When data arrives in real time, traditional big data technologies do not process it adequately, because real-time processing requires streaming, which those technologies do not provide. To meet these objectives, the proposed architecture uses open source technologies: Apache Kafka, for online and offline consumption of messages, and Apache Spark, to stream, process, and give structure to the real-time and existing data. A framework known as Dashing is used to present the processed data in a more attractive and readable form.
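The decoupling described above can be illustrated with a minimal sketch. It is not the paper's implementation and uses neither Kafka nor Spark: `EventLog` is a hypothetical in-memory stand-in for a retained message log, `update_averages` is a hypothetical micro-batch aggregation, and the device names and readings are made up. It only shows how a consumer that comes online late can first catch up on stored events and then continue with newly arriving ones.

```python
from collections import defaultdict


class EventLog:
    """Toy append-only log with per-consumer offsets (a stand-in for a message topic)."""

    def __init__(self):
        self._events = []                  # retained messages, oldest first
        self._offsets = defaultdict(int)   # consumer name -> next index to read

    def publish(self, event):
        # The producer appends without caring whether any consumer is online.
        self._events.append(event)

    def poll(self, consumer, max_events=100):
        # Return the events this consumer has not seen yet and advance its offset.
        start = self._offsets[consumer]
        batch = self._events[start:start + max_events]
        self._offsets[consumer] = start + len(batch)
        return batch


def update_averages(state, batch):
    """Fold one micro-batch of (device, reading) pairs into running per-device averages."""
    for device, reading in batch:
        count, total = state.get(device, (0, 0.0))
        state[device] = (count + 1, total + reading)
    return {device: total / count for device, (count, total) in state.items()}


log = EventLog()
log.publish(("sensor-1", 20.0))   # sent while the consumer is offline
log.publish(("sensor-2", 30.0))

state = {}
historical = update_averages(state, log.poll("dashboard"))  # catch up on stored events

log.publish(("sensor-1", 22.0))   # a new real-time event arrives
latest = update_averages(state, log.poll("dashboard"))      # merge it with the history
```

Because each consumer tracks its own offset, the producer never needs the consumer to be online, and the same aggregation function handles both the stored backlog and the live events.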
