INforE: Interactive Cross-platform Analytics for Everyone

We present INforE, a prototype that supports non-expert programmers in performing optimized, cross-platform, streaming analytics at scale. INforE offers: (a) a new extension to RapidMiner Studio for the graphical design of streaming Big Data workflows, (b) a novel optimizer that decides how workflows are executed across Big Data platforms and clusters, (c) a synopses data engine that enables interactivity at scale through the use of data summaries, and (d) a distributed, online data mining and machine learning module. To our knowledge, INforE is the first holistic approach to interactive, cross-platform analytics in streaming settings. We demonstrate INforE in life science and financial data analysis.
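To illustrate what a data summary (synopsis) is, the example below implements a Count-Min sketch. This is one widely used stream synopsis: it answers approximate per-key frequency queries in fixed memory, so a query does not need to rescan the stream.

The abstract does not say which synopses INforE's engine maintains or what its interface looks like. The class name `CountMinSketch`, its parameters, and the ticker stream are hypothetical and chosen for illustration only. This is not INforE code.

```python
import hashlib


class CountMinSketch:
    """Hypothetical illustration (not INforE's API): a Count-Min sketch.

    Approximates per-key counts over a stream in fixed memory. Estimates
    never undercount; hash collisions can only inflate them.
    """

    def __init__(self, width: int = 272, depth: int = 5) -> None:
        # width ~ ceil(e / epsilon) bounds the additive error (epsilon ~ 0.01 here);
        # depth ~ ceil(ln(1 / delta)) bounds the failure probability (delta ~ 0.0067).
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, key: str, row: int) -> int:
        # Derive one hash function per row by salting the key with the row number.
        digest = hashlib.blake2b(f"{row}:{key}".encode(), digest_size=8).digest()
        return int.from_bytes(digest, "big") % self.width

    def update(self, key: str, count: int = 1) -> None:
        # Add the count to one counter in every row.
        for row in range(self.depth):
            self.table[row][self._index(key, row)] += count

    def estimate(self, key: str) -> int:
        # The smallest counter across rows is the least inflated by collisions.
        return min(self.table[row][self._index(key, row)] for row in range(self.depth))


# Usage: summarize a (hypothetical) stream of stock tickers.
stream = ["AAPL", "MSFT", "AAPL", "GOOG", "AAPL", "MSFT"]
cms = CountMinSketch()
for ticker in stream:
    cms.update(ticker)
print(cms.estimate("AAPL"))  # never below the true count of 3
```

Memory stays at `width * depth` counters no matter how long the stream grows. This trade of bounded memory for a bounded error is what makes such summaries useful for interactive queries at scale.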
