NexusDS: a flexible and extensible middleware for distributed stream processing

Techniques for efficient, distributed processing of huge, unbounded data streams have made an impact in the database community. Sensors and other data sources, e.g. those reporting the positions of moving objects, continuously produce data that is consumed by, e.g., location-aware applications. Depending on the domain of interest, e.g. visualization, processing such data often relies on domain-specific functionality. This functionality is specified in terms of dedicated operators that may require specialized hardware, e.g. GPUs. The result is a strong dependency between operators and hardware that a data stream processing system must take into account when deploying such operators. Many data stream processing systems have been presented so far. However, these systems assume homogeneous computing nodes, do not consider operator deployment constraints, and are not designed to address domain-specific needs. In this paper, we identify the features that a flexible and extensible middleware for distributed stream processing of context data must provide. We present NexusDS, our approach to meeting these requirements. In NexusDS, data processing is specified by orchestrating data flow graphs, which are modeled as processing pipelines of predefined, general operators as well as custom-built, domain-specific ones. We focus on easy extensibility and on support for domain-specific operators and services that may even use specific hardware available only on dedicated computing nodes.
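
To make the notion of operator deployment constraints concrete, the following minimal sketch shows a processing pipeline whose operators declare hardware requirements and a naive placement step that maps each operator to a node offering the required capabilities. It is an illustration under stated assumptions, not the NexusDS implementation: the names Operator, Node, place, the capability label "gpu", and the first-fit strategy are all hypothetical and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operator:
    """A pipeline operator with hypothetical hardware requirements."""
    name: str
    requires: frozenset = frozenset()


@dataclass(frozen=True)
class Node:
    """A computing node advertising hypothetical capabilities."""
    name: str
    provides: frozenset = frozenset()


def place(pipeline, nodes):
    """Assign each operator to the first node that meets its requirements.

    First-fit is an assumption made for brevity; a real system would also
    weigh load, network cost, and other criteria.
    """
    placement = {}
    for op in pipeline:
        candidates = [n for n in nodes if op.requires <= n.provides]
        if not candidates:
            raise ValueError(f"no node satisfies constraints of {op.name}")
        placement[op.name] = candidates[0].name
    return placement


# A three-stage pipeline: only the rendering operator needs a GPU.
pipeline = [
    Operator("position_source"),
    Operator("range_filter"),
    Operator("render", requires=frozenset({"gpu"})),
]

# Heterogeneous nodes: only node_b offers a GPU.
nodes = [
    Node("node_a"),
    Node("node_b", provides=frozenset({"gpu"})),
]

placement = place(pipeline, nodes)
print(placement)
```

In this sketch the unconstrained operators land on node_a, while render can only be placed on node_b. A system that assumes homogeneous nodes has no equivalent of the requires/provides check and could deploy render on a node without a GPU.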
