External Memory Data Structures (Invited Paper)

Many modern applications store and process datasets much larger than the main memory of even state-of-the-art high-end machines. Such massive and dynamically changing datasets often need to be stored in data structures on external storage devices, and in these cases the Input/Output (I/O) communication between internal and external memory can become a major performance bottleneck. In this paper we survey recent advances in the development of worst-case I/O-efficient external memory data structures.
