Hashing by proximity to process duplicates in spatial databases

In a spatial database, an object may extend arbitrarily in space. As a result, many spatial data structures (e.g., the quadtree, the cell tree, the R+-tree) represent an object by partitioning it into multiple simpler pieces, each of which is stored separately inside the data structure. Because one object yields several pieces, many operations on these data structures are likely to report the same object more than once. A novel approach to duplicate processing based on the proximity of spatial objects is presented. It differs from conventional duplicate elimination in database systems because, in spatial databases, different pieces of the same object can span multiple buckets of the underlying data structure. Example algorithms are presented that perform duplicate processing using proximity for quadtree representations of line segments and arbitrary rectangles. The complexity of the algorithms depends on a geometric classification of the different instances of the spatial objects. By exploiting proximity and the spatial properties of the objects, both the number of disk I/O requests and the run-time storage needed during duplicate processing can be reduced.
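To make the duplicate problem concrete, the following is a minimal sketch and not the algorithm of the paper. It assumes a uniform grid in place of a quadtree, axis-aligned rectangles as objects, and a window query as the operation. Each rectangle is stored in every cell it overlaps, so a query that visits several cells meets the same object several times. Instead of remembering already-reported identifiers in a hash table, the sketch reports an object only from the cell that contains a reference point (here the lower-left corner of the object's intersection with the window), so the decision uses geometry alone. The names `CELL`, `build_grid`, and `window_query` are hypothetical.

```python
from collections import defaultdict

CELL = 4  # hypothetical side length of a grid cell


def cells_overlapping(rect):
    """Yield the (cx, cy) index of every grid cell that rect touches."""
    xmin, ymin, xmax, ymax = rect
    for cx in range(int(xmin // CELL), int(xmax // CELL) + 1):
        for cy in range(int(ymin // CELL), int(ymax // CELL) + 1):
            yield (cx, cy)


def build_grid(objects):
    """Store each object id in every cell its rectangle overlaps."""
    grid = defaultdict(list)
    for oid, rect in objects.items():
        for cell in cells_overlapping(rect):
            grid[cell].append(oid)
    return grid


def window_query(grid, objects, window):
    """Report each object intersecting the window exactly once.

    No set of seen ids is kept: an object is reported only from the
    cell holding the lower-left corner of (object intersect window).
    """
    wx0, wy0, wx1, wy1 = window
    results = []
    for cell in cells_overlapping(window):
        for oid in grid.get(cell, ()):
            x0, y0, x1, y1 = objects[oid]
            # Intersection of the object with the query window.
            ix0, iy0 = max(x0, wx0), max(y0, wy0)
            ix1, iy1 = min(x1, wx1), min(y1, wy1)
            if ix0 > ix1 or iy0 > iy1:
                continue  # object lies in this cell but misses the window
            # Report only from the cell that owns the reference point.
            if (int(ix0 // CELL), int(iy0 // CELL)) == cell:
                results.append(oid)
    return results
```

With `objects = {"a": (1, 1, 10, 2), "b": (5, 5, 6, 6)}` and the window `(0, 0, 12, 8)`, rectangle `a` is stored in three cells, yet the query lists it once. The run-time storage for duplicate handling stays constant because no per-query table of reported objects is needed, which is the kind of saving the abstract attributes to using spatial properties.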
