Online Data Structures in External Memory

The data sets for many of today's computer applications are too large to fit within the computer's internal memory and must instead be stored on external storage devices such as disks. The input/output communication (I/O) between external and internal memory can then become a major performance bottleneck. In this paper we discuss a variety of online data structures for external memory, some very old and some very new: hashing (for dictionaries), B-trees (for dictionaries and 1-D range search), buffer trees (for batched dynamic problems), interval trees with weight-balanced B-trees (for stabbing queries), priority search trees (for 3-sided 2-D range search), and R-trees and other spatial structures. We also discuss several open problems along the way.
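To make the I/O bottleneck concrete, the following rough sketch counts how many levels a search tree needs to index a given number of items, which approximates the number of block reads per lookup when each node occupies one disk block. The item count `N`, the fanout `B`, and the helper name `levels` are illustrative assumptions, not values or notation taken from the paper.

```python
def levels(n_items: int, fanout: int) -> int:
    """Smallest h with fanout**h >= n_items, i.e. the number of levels
    needed by a tree whose nodes each have `fanout` children.
    Uses integer arithmetic to avoid floating-point log rounding."""
    h, capacity = 0, 1
    while capacity < n_items:
        capacity *= fanout
        h += 1
    return h

N = 10**9   # assumed number of items (hypothetical)
B = 1000    # assumed keys per disk block, i.e. B-tree fanout (hypothetical)

# A binary tree touches about one node per level; if each node is on a
# separate block, every level costs one I/O.
print("binary tree levels:", levels(N, 2))
# A B-tree packs about B keys into each block, so far fewer blocks are read.
print("B-tree levels:     ", levels(N, B))
```

Under these assumptions the high-fanout tree needs an order of magnitude fewer block accesses per search, which is the basic reason the structures surveyed here group data by disk block.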
