Models and issues in data stream systems

In this overview paper we motivate the need for a new model of data processing and survey the research issues that arise from it. In this model, data does not take the form of persistent relations but instead arrives in multiple, continuous, rapid, time-varying data streams. In addition to reviewing past work relevant to data stream systems and current projects in the area, the paper explores topics in stream query languages, new requirements and challenges in query processing, and algorithmic issues.
