Graphical Spark Programming in IoT Mashup Tools

With the unprecedented rise in the number of IoT devices, sensors generate large volumes of data that often demand in-depth analysis to yield useful insights. Mashup tools, used primarily for intuitive graphical programming of IoT applications, can support both efficient prototyping of applications and the construction of data analytics pipelines. In this study, we focus on the tight integration of the data analytics capabilities of Spark in IoT mashup tools. The main challenge in this direction is the wide range of data interfaces and APIs in the Spark ecosystem. We make two contributions: (i) a thorough analysis of the Spark ecosystem that selects suitable data interfaces for use in a graphical flow-based programming paradigm, and (ii) a novel, generic approach for programming Spark from graphical flows that comprises early-stage validation and code generation of Java Spark programs. The approach is implemented in aFlux, our JVM-based mashup tool, and is evaluated in three use cases showcasing the machine learning and stream analytics capabilities of Spark.
