Aggregate Computation over Data Streams

High-speed data streams have become a widely recognized phenomenon. Many applications, including relational query processing, data mining, and high-speed network management, need to compute various statistics over such streams. In this paper, we survey three important kinds of aggregate computation over data streams: frequency moments, frequency counts, and order statistics.
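The following sketch makes the three aggregates concrete by computing each one exactly over a small in-memory list. It assumes the usual definitions: the k-th frequency moment F_k is the sum of f_i^k over distinct items i, where f_i is the count of i; a frequency-count query returns the items whose count exceeds phi * N; and the phi-quantile is the element of rank ceil(phi * N) in sorted order. Exact computation needs memory proportional to the number of distinct items (or to N for quantiles), and that cost is what streaming algorithms avoid by returning approximate answers. To show what a one-pass, bounded-memory alternative looks like, the sketch also includes the classic Misra-Gries summary for frequency counts. All function names and the example streams are hypothetical illustrations, not the survey's own code.

```python
import math
from collections import Counter


def frequency_moment(stream, k):
    """Exact F_k = sum of f_i ** k over distinct items i.
    F_0 is the number of distinct items; F_1 is the stream length N."""
    counts = Counter(stream)
    return sum(f ** k for f in counts.values())


def frequent_items(stream, phi):
    """Exact frequency-count query: items whose count exceeds phi * N."""
    n = len(stream)
    counts = Counter(stream)
    return {item for item, f in counts.items() if f > phi * n}


def quantile(stream, phi):
    """Exact phi-quantile of a non-empty stream: the element of rank
    ceil(phi * N) (1-based) in sorted order."""
    ordered = sorted(stream)
    rank = max(1, math.ceil(phi * len(ordered)))
    return ordered[rank - 1]


def misra_gries(stream, k):
    """One-pass Misra-Gries summary using at most k - 1 counters (k >= 2).
    Every item with true count > N / k survives, and each stored count
    underestimates the true count by at most N / k."""
    counters = {}
    for x in stream:
        if x in counters:
            counters[x] += 1
        elif len(counters) < k - 1:
            counters[x] = 1
        else:
            # No free counter: decrement all counters (the new item is
            # implicitly discarded) and drop any counter that reaches zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


stream = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5]
print(frequency_moment(stream, 0))  # distinct items
print(frequency_moment(stream, 2))  # second frequency moment
print(frequent_items(stream, 0.2))  # items with count > 0.2 * N
print(quantile(stream, 0.5))        # median
```

For example, `misra_gries([1, 1, 1, 2, 1, 3, 1], 2)` keeps a single counter and returns `{1: 3}`: the true count of item 1 is 5, and the underestimate of 2 stays within the N / k = 3.5 bound while using constant memory.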
