External memory data structures

In many massive-dataset applications, the data must be stored in space- and query-efficient data structures on external storage devices, and often the data must also be updated dynamically. In this chapter we discuss recent advances in the development of provably worst-case efficient dynamic data structures for external memory. We also briefly discuss some of the most popular external data structures used in practice.
