Big Data Stream Learning with SAMOA

Big data is flowing into every area of our lives, professional and personal. The term refers to datasets whose size is beyond the ability of typical software tools to capture, store, manage, and analyze, because of the time and memory these tasks require. Velocity, the rate at which data arrives, is one of the main properties of big data. In this demo, we present SAMOA (Scalable Advanced Massive Online Analysis), an open-source platform for mining big data streams. It provides a collection of distributed streaming algorithms for the most common data mining and machine learning tasks, such as classification, clustering, and regression, as well as programming abstractions for developing new algorithms. Its pluggable architecture allows it to run on several distributed stream processing engines, such as Storm, S4, and Samza. SAMOA is written in Java and is available at http://samoa-project.net under the Apache Software License version 2.0.
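To make the idea of a pluggable architecture concrete, the following is a minimal Java sketch, not SAMOA's actual API. All names in it (Processor, Engine, LocalEngine, MajorityClassLearner) are hypothetical and chosen for illustration only. It assumes that an algorithm is written once against a small per-event interface and that an engine decides how events are delivered to it. The learner is a deliberately simple majority-class classifier that is tested on each label before being trained on it.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class PluggableStreamSketch {

    /** Hypothetical algorithm-side abstraction: handles one event at a time. */
    public interface Processor<E> {
        void process(E event);
    }

    /** Hypothetical engine-side abstraction: decides how events reach a processor. */
    public interface Engine {
        <E> void run(Iterable<E> source, Processor<E> processor);
    }

    /** Single-threaded stand-in for a distributed engine (assumption for this sketch). */
    public static final class LocalEngine implements Engine {
        @Override
        public <E> void run(Iterable<E> source, Processor<E> processor) {
            for (E event : source) {
                processor.process(event);
            }
        }
    }

    /** Online learner over a stream of class labels: predict first, then update. */
    public static final class MajorityClassLearner implements Processor<Integer> {
        private final Map<Integer, Long> counts = new HashMap<>();
        private long seen = 0;
        private long correct = 0;

        /** Returns the most frequent label so far, or -1 if nothing has been seen. */
        public int predict() {
            int best = -1;
            long bestCount = 0;
            for (Map.Entry<Integer, Long> e : counts.entrySet()) {
                if (e.getValue() > bestCount) {
                    best = e.getKey();
                    bestCount = e.getValue();
                }
            }
            return best;
        }

        @Override
        public void process(Integer label) {
            // Test on the incoming label before it is used for training.
            if (predict() == label) {
                correct++;
            }
            // Train: constant work and bounded memory per event.
            counts.merge(label, 1L, Long::sum);
            seen++;
        }

        public long seen() { return seen; }
        public long correct() { return correct; }
    }

    public static void main(String[] args) {
        MajorityClassLearner learner = new MajorityClassLearner();
        Engine engine = new LocalEngine(); // another Engine could be swapped in here
        engine.run(Arrays.asList(1, 1, 0, 1, 1), learner);
        System.out.println("correct=" + learner.correct() + " of " + learner.seen());
    }
}
```

In this sketch the learner never refers to the engine, so replacing LocalEngine with another implementation of Engine would not require changes to the algorithm. That separation is the property the pluggable design is meant to provide.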
