Data Stream Management

In some domains, data arrives so fast and in such great quantity that storing it in a database collection is simply infeasible. When the incoming data relates to ongoing real-world events that require immediate action, persistence may not even be useful: data in electronic trading, network monitoring, or real-time fraud detection, for example, is valuable only for a short time and therefore has to be used immediately. To adapt to these circumstances, data stream management systems (DSMSs) introduce the data stream as an abstraction for an infinite sequence of database records that arrive over time. The raw data streams arriving at a system are usually referred to as base streams, whereas those resulting from data transformations (e.g., queries) are called derived streams. Since an unbounded data stream cannot be stored in its entirety, DSMSs drop the database requirement of indefinite data persistence: they retain incoming records only for a limited time and eventually discard them.
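The abstraction can be illustrated with a small Python sketch. It models a base stream as a lazily evaluated, unbounded generator, a derived stream as a filtering query applied to it, and bounded retention as a buffer that discards records once they are older than a fixed horizon. The names `Record`, `derive`, and `RetentionBuffer`, the `horizon` parameter, and the assumption that records arrive in timestamp order are hypothetical illustrations, not the interface of any particular DSMS.

```python
from collections import deque
from dataclasses import dataclass
from itertools import count, islice
from typing import Callable, Deque, Iterable, Iterator, List


@dataclass(frozen=True)
class Record:
    """A single stream element: an arrival timestamp plus a payload (hypothetical schema)."""
    ts: float
    value: float


def derive(stream: Iterable[Record], predicate: Callable[[Record], bool]) -> Iterator[Record]:
    """Derived stream: lazily applies a filtering query to an input stream."""
    for rec in stream:
        if predicate(rec):
            yield rec


class RetentionBuffer:
    """Retains only records from the last `horizon` time units and discards older ones.

    Assumes records are pushed in non-decreasing timestamp order.
    """

    def __init__(self, horizon: float) -> None:
        self.horizon = horizon
        self._buf: Deque[Record] = deque()

    def push(self, rec: Record) -> None:
        self._buf.append(rec)
        # Evict every record that is at least `horizon` time units older than the newest one.
        while self._buf and self._buf[0].ts <= rec.ts - self.horizon:
            self._buf.popleft()

    def contents(self) -> List[Record]:
        return list(self._buf)


# Usage: an unbounded base stream, of which only the first 100 arrivals are consumed here.
base = (Record(ts=float(t), value=float(t % 7)) for t in count())
alerts = derive(islice(base, 100), lambda r: r.value > 4)  # derived stream
buf = RetentionBuffer(horizon=10.0)
for rec in alerts:
    buf.push(rec)
print([r.ts for r in buf.contents()])  # [89.0, 90.0, 96.0, 97.0]
```

Because `derive` is a generator, records flow through it one at a time and nothing upstream is ever materialized; only the retention buffer holds state, and its size is bounded by how many records arrive within one horizon rather than by the total length of the stream.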
