Effectively learning spatial indices

Machine learning, especially deep learning, is increasingly used to provide better solutions for data management tasks previously solved by other means, including database indexing. A recent study shows that a neural network can not only learn to predict the disk address of the data value associated with a one-dimensional search key but can also outperform B-tree-based indexing, and thus promises to speed up a broad range of database queries that rely on B-trees for efficient data access. We consider the problem of learning an index for two-dimensional spatial data. A direct application of a neural network is unattractive because there is no obvious ordering of spatial point data. Instead, we introduce a rank-space-based ordering technique to establish an ordering of point data and group the points into blocks for index learning. To enable scalability, we propose a recursive strategy that partitions a large point set and learns an index for each partition. Experiments on real and synthetic data sets with more than 100 million points show that our learned indices are highly effective and efficient. Query processing using our indices is more than an order of magnitude faster than with R-trees or a recently proposed learned index.
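
The rank-space ordering and block grouping mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say which curve is used to linearize the rank space, so the Z-order (bit-interleaving) curve here is an assumption, as are the function names `rank_of`, `interleave_bits`, and `rank_space_blocks` and the `block_size` parameter. This is an illustration of the general idea, not the authors' implementation.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def rank_of(values: Sequence[float]) -> List[int]:
    """Return the rank (0-based sort position) of each value.

    Ties are broken by original position so every rank is unique.
    """
    order = sorted(range(len(values)), key=lambda i: (values[i], i))
    ranks = [0] * len(values)
    for rank, idx in enumerate(order):
        ranks[idx] = rank
    return ranks


def interleave_bits(rx: int, ry: int, bits: int) -> int:
    """Z-order value of (rx, ry): x bits go to even positions, y bits to odd.

    Assumption: a Z-order curve is used; another space-filling curve
    could be substituted without changing the rest of the sketch.
    """
    z = 0
    for b in range(bits):
        z |= ((rx >> b) & 1) << (2 * b)
        z |= ((ry >> b) & 1) << (2 * b + 1)
    return z


def rank_space_blocks(points: Sequence[Point], block_size: int) -> List[List[Point]]:
    """Order points by curve value in rank space and pack them into blocks."""
    n = len(points)
    if n == 0:
        return []
    bits = max(1, (n - 1).bit_length())  # enough bits to hold ranks 0..n-1
    rx = rank_of([p[0] for p in points])
    ry = rank_of([p[1] for p in points])
    order = sorted(range(n), key=lambda i: interleave_bits(rx[i], ry[i], bits))
    ordered = [points[i] for i in order]
    return [ordered[i:i + block_size] for i in range(0, n, block_size)]


if __name__ == "__main__":
    pts = [(0.1, 0.9), (0.2, 0.1), (0.9, 0.8), (0.8, 0.2)]
    for block_id, block in enumerate(rank_space_blocks(pts, block_size=2)):
        print(block_id, block)
```

Working in rank space replaces each coordinate by its position in sorted order, so the curve values depend only on the relative order of the points and not on how skewed the raw coordinates are. A model could then be trained to map a point to the ID of the block that holds it; that training step is omitted here.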
