Processing Data Streams with the RapidMiner Streams Plugin

In many applications we face large volumes of data that often grow continuously. Such data arise in monitoring settings such as server log files, manufacturing processes, and sensor networks, or in high-volume news feeds such as Twitter. Analyzing such data differs from the traditional batch setting that RapidMiner was originally designed for. In this work we present the streams library, a simple and easy-to-use framework for continuously processing streaming data. It comes with the Streams Plugin, which integrates its streaming capabilities into the RapidMiner suite. We give an overview of the architecture of the streams library and its RapidMiner integration, and demonstrate its usefulness for processing very large and continuous data in several use cases.
