Design and Implementation of a Scalable and QoS-aware Stream Processing Framework: The Quasit Prototype

Today's stream processing scenarios are characterized by large volumes of data, e.g., generated by cyber-physical systems in a smart city, on which continuous analysis tasks must be performed, often with very different optimal trade-offs between achieved QoS and resource consumption. We present Quasit, a novel model and framework that offers runtime support to stream processing applications. Unlike existing solutions in the literature, Quasit supports advanced QoS-based configuration, which can be used to finely tune the framework to fit highly different real-world situations. The paper describes the architecture and development of the Quasit prototype and reports insights and lessons learned about the most important design and implementation choices, such as the actor-based threading model and the QoS-enabled inter-process communication based on OMG DDS. Experimental results, measured on simple real testbeds, show that our Quasit implementation provides good horizontal scalability with limited overhead and makes good use of dynamically available processing resources.
