Density-Based Data Analysis and Similarity Search

Similarity search in database systems is becoming an increasingly important task in modern application domains such as multimedia, molecular biology, medical imaging, computer-aided engineering, marketing and purchasing assistance, and many others. Furthermore, the feature transformations and distance measures used in similarity search form the foundation of sophisticated data analysis and mining techniques. In this chapter, we show how visualizing cluster hierarchies that describe a database of objects can aid the user in the time-consuming task of finding similar objects and discovering interesting patterns. We present related work and explain the shortcomings that led to the development of our new methods. Based on reachability plots, we introduce methods for visually exploring a data set in multiple representations and for comparing multiple similarity models. Furthermore, we present a new method for automatically extracting cluster hierarchies from a given reachability plot, which allows a user to browse the database for similarity search. We integrated our new method into a prototype that serves two purposes, namely visual data analysis and a new way of object retrieval called navigational similarity search.
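Reachability plots are the output of the OPTICS algorithm: points are listed in a density-based order, each with a reachability distance, and clusters appear as valleys in the plot. The sketch below illustrates the idea under stated assumptions. It computes an OPTICS ordering with a naive O(n²) neighbourhood scan over Euclidean vectors (the chapter's similarity models may use other distance measures), then extracts flat clusters by cutting the plot at a fixed threshold. That horizontal cut is a much simpler technique than the automatic cluster-hierarchy extraction described in this chapter, since it recovers only one level of the hierarchy. It also ignores core distances, which the threshold extraction in the original OPTICS paper additionally checks. The names `optics`, `cut_reachability_plot`, `threshold` and `min_size` are hypothetical.

```python
import heapq
import math

UNDEFINED = math.inf  # reachability of a point with no density-reachable predecessor


def optics(points, eps, min_pts):
    """Naive O(n^2) OPTICS sketch returning (ordering, reachability).

    reachability[i] is the reachability distance of point i; plotting these
    values in `ordering` order gives the reachability plot.
    """
    n = len(points)

    def dist(i, j):
        # assumption: Euclidean distance between coordinate tuples
        return math.dist(points[i], points[j])

    def neighbours(i):
        # eps-neighbourhood by linear scan (no index structure)
        return [j for j in range(n) if dist(i, j) <= eps]

    def core_distance(i, nbrs):
        # distance to the min_pts-th closest point (i itself counts); None if not core
        if len(nbrs) < min_pts:
            return None
        return sorted(dist(i, j) for j in nbrs)[min_pts - 1]

    reach = [UNDEFINED] * n
    processed = [False] * n
    ordering = []

    for start in range(n):
        if processed[start]:
            continue
        seeds = [(UNDEFINED, start)]  # min-heap of (reachability, point)
        while seeds:
            _, p = heapq.heappop(seeds)
            if processed[p]:
                continue  # stale entry superseded by a smaller reachability
            processed[p] = True
            ordering.append(p)
            nbrs = neighbours(p)
            cd = core_distance(p, nbrs)
            if cd is None:
                continue  # only core points expand the ordering
            for q in nbrs:
                if processed[q]:
                    continue
                new_reach = max(cd, dist(p, q))
                if new_reach < reach[q]:
                    reach[q] = new_reach
                    heapq.heappush(seeds, (new_reach, q))
    return ordering, reach


def cut_reachability_plot(ordering, reach, threshold, min_size=2):
    """Hypothetical flat extraction: each valley below `threshold` is a cluster.

    A bar above the threshold is taken as the first point of the next valley;
    groups smaller than min_size are discarded as noise.
    """
    clusters, current = [], []
    for p in ordering:
        if reach[p] > threshold and current:
            clusters.append(current)
            current = []
        current.append(p)
    if current:
        clusters.append(current)
    return [c for c in clusters if len(c) >= min_size]


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    order, reach = optics(pts, eps=100, min_pts=2)
    for p in order:  # crude text rendering of the reachability plot
        r = reach[p]
        print(p, "#" * (40 if math.isinf(r) else round(r * 2)))
    print(cut_reachability_plot(order, reach, threshold=5))
```

With `eps=100` every point lies in every neighbourhood, so the plot shows two low valleys separated by one tall bar (point 3), and the cut at 5 should yield `[[0, 1, 2], [3, 4, 5]]`. Lowering or raising the threshold selects a different level of the hierarchy, which is exactly the limitation that a hierarchical extraction avoids.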
