3D Shape Histograms for Similarity Search and Classification in Spatial Databases

Classification is one of the basic tasks of data mining in modern database applications, including molecular biology, astronomy, mechanical engineering, medical imaging, and meteorology. The underlying models have to consider spatial properties such as shape or extension as well as thematic attributes. We introduce 3D shape histograms as an intuitive and powerful similarity model for 3D objects. Particular flexibility is provided by using quadratic form distance functions, which account for errors of measurement, sampling, and numerical rounding, all of which may result in small displacements and rotations of shapes. For query processing, a general filter-refinement architecture is employed that efficiently supports similarity search based on quadratic forms. An experimental evaluation in the context of molecular biology demonstrates both the high classification accuracy of more than 90% and the good performance of the approach.
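To make the role of the quadratic form distance more concrete, the following minimal Python sketch compares histograms with d_A(x, y) = sqrt((x - y)^T A (x - y)). It is an illustration under stated assumptions, not the implementation evaluated in the paper: the concentric-shell binning in `shell_histogram`, the similarity weights exp(-sigma * |i - j|) in `similarity_matrix`, and all function and parameter names are hypothetical choices made for this example.

```python
import math


def shell_histogram(points, num_shells, max_radius):
    """Bin 3D points by their distance from the centroid into concentric shells.

    Assumption for this sketch: shells have equal width and points beyond
    max_radius fall into the outermost shell. The result is normalized to sum to 1.
    """
    n = len(points)
    cx = sum(p[0] for p in points) / n
    cy = sum(p[1] for p in points) / n
    cz = sum(p[2] for p in points) / n
    hist = [0.0] * num_shells
    for x, y, z in points:
        r = math.sqrt((x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2)
        idx = min(int(r / max_radius * num_shells), num_shells - 1)
        hist[idx] += 1.0 / n
    return hist


def similarity_matrix(num_bins, sigma):
    """Hypothetical similarity weights: neighboring bins count as similar.

    A[i][j] = exp(-sigma * |i - j|), so A[i][i] = 1 and weights decay with
    the distance between bin indices.
    """
    return [[math.exp(-sigma * abs(i - j)) for j in range(num_bins)]
            for i in range(num_bins)]


def quadratic_form_distance(x, y, A):
    """Compute sqrt((x - y)^T A (x - y)) for two histograms x and y."""
    d = [xi - yi for xi, yi in zip(x, y)]
    total = 0.0
    for i, di in enumerate(d):
        for j, dj in enumerate(d):
            total += di * A[i][j] * dj
    return math.sqrt(max(total, 0.0))  # guard against tiny negative rounding


if __name__ == "__main__":
    A = similarity_matrix(4, sigma=1.0)
    query = [1.0, 0.0, 0.0, 0.0]
    near = [0.0, 1.0, 0.0, 0.0]   # mass displaced by one bin
    far = [0.0, 0.0, 0.0, 1.0]    # mass displaced by three bins
    print(quadratic_form_distance(query, near, A))
    print(quadratic_form_distance(query, far, A))
```

With the identity matrix in place of `A`, the function reduces to the Euclidean distance, which treats both displaced histograms as equally far from the query. With the cross-bin weights above, the histogram whose mass moved to an adjacent bin is rated closer than the one whose mass moved three bins away, which is the kind of tolerance to small displacements that the abstract attributes to quadratic forms.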
