Processing high data rate streams in System S

High-performance stream processing is critical in many sense-and-respond application domains, from environmental monitoring to algorithmic trading. In this paper, we focus on language and runtime support for improving the performance of sense-and-respond applications that process data from high-rate live streams. The central elements of this work are the programming model, the workload splitting mechanisms, the code generation framework, and the underlying System S middleware and Spade programming model. We demonstrate considerable scalability coupled with low processing latency in a real-world financial trading application.
