Tradeoff Algorithms in Streaming Models

In this report I focus on the research carried out during the first two years of my PhD program at the University of Rome “La Sapienza” in the field of massive data sets, with particular emphasis on the streaming computational model. In this model, data stored in external memory can be accessed only sequentially, in one or several passes, and is processed using an amount of internal memory that is small compared to the size of the external memory. First, several computational models for massive data sets are introduced and motivated. An overview of the results obtained within the framework of those models follows, mainly with regard to a specific but relevant class of problems, namely graph problems. I then present the results obtained during these two years, and finally outline work in progress and future research directions.
