Interactive-Time Similarity Search for Large Image Collections Using Parallel VA-Files

In digital libraries, nearest-neighbor search (NN-search) plays a key role in content-based retrieval of multimedia objects. However, the performance of existing NN-search techniques is unsatisfactory for large collections and for high-dimensional representations of the objects. To obtain interactive response times, we take a linear algorithm that works with approximations of the feature vectors and parallelize it. Specifically, we parallelize NN-search based on the VA-File over a Network of Workstations (NOW). This approach reduces search time to a reasonable level even for large collections. The best speedup we have observed is almost 30, for a NOW with only three components and 900 MB of feature data. Achieving this, however, requires a number of design decisions, in particular when load dynamism and the heterogeneity of components are taken into account. Our contribution is to address these design issues.
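The abstract condenses the approach into one sentence: a linear scan over compact vector approximations, parallelized across workstations. The Python sketch below illustrates that idea under explicit assumptions: coordinates normalized to [0, 1], a fixed 4 bits per dimension, Euclidean distance, and Python threads standing in for NOW components (so it shows the data flow, not a real speedup, because the GIL serializes the work). The names `approximate`, `lower_bound`, `Partition`, `split`, and `parallel_knn`, as well as the speed-proportional partitioning, are hypothetical illustrations rather than the paper's design. The search loop is a simple filter-and-refine scheme in the spirit of the VA-File, not necessarily the exact algorithm used in the paper.

```python
import heapq
import math
import random
from concurrent.futures import ThreadPoolExecutor

BITS = 4  # bits per dimension in the approximation (assumed value)

def approximate(v, bits=BITS):
    """Map each coordinate (assumed to lie in [0, 1]) to one of 2**bits equal-width cells."""
    cells = 1 << bits
    return tuple(min(int(x * cells), cells - 1) for x in v)

def lower_bound(q, cell_ids, bits=BITS):
    """Smallest Euclidean distance from q to any point inside the approximated cell."""
    width = 1.0 / (1 << bits)
    s = 0.0
    for x, c in zip(q, cell_ids):
        left, right = c * width, (c + 1) * width
        d = left - x if x < left else (x - right if x > right else 0.0)
        s += d * d
    return math.sqrt(s)

class Partition:
    """One worker's share of the collection: exact vectors plus their approximations."""
    def __init__(self, offset, vectors):
        self.offset = offset  # global id of this partition's first vector
        self.vectors = vectors
        self.approx = [approximate(v) for v in vectors]  # this worker's slice of the VA-File

    def knn(self, q, k):
        """Exact local k-NN (k >= 1 assumed) as a list of (distance, global id)."""
        # Filter: rank entries by a lower bound computed from approximations only.
        bounds = sorted((lower_bound(q, a), i) for i, a in enumerate(self.approx))
        best = []  # max-heap of the k closest so far, stored as (-distance, global id)
        for lb, i in bounds:
            if len(best) == k and lb >= -best[0][0]:
                break  # no unvisited vector can beat the current k-th neighbour
            d = math.dist(q, self.vectors[i])  # refine: exact distance on the full vector
            entry = (-d, self.offset + i)
            if len(best) < k:
                heapq.heappush(best, entry)
            elif d < -best[0][0]:
                heapq.heapreplace(best, entry)
        return [(-nd, gid) for nd, gid in best]

def split(vectors, speeds):
    """Assign contiguous chunks proportional to each worker's (hypothetical) relative speed."""
    total, cum, start, parts = sum(speeds), 0.0, 0, []
    for j, s in enumerate(speeds):
        cum += s
        end = len(vectors) if j == len(speeds) - 1 else round(len(vectors) * cum / total)
        parts.append(Partition(start, vectors[start:end]))
        start = end
    return parts

def parallel_knn(parts, q, k):
    """Query every partition concurrently, then merge local results at the coordinator."""
    with ThreadPoolExecutor(max_workers=len(parts)) as pool:
        local = list(pool.map(lambda p: p.knn(q, k), parts))
    return heapq.nsmallest(k, (hit for res in local for hit in res))

if __name__ == "__main__":
    random.seed(0)
    data = [[random.random() for _ in range(8)] for _ in range(2000)]
    parts = split(data, speeds=[1.0, 1.0, 2.0])  # third worker assumed twice as fast
    print(parallel_knn(parts, [0.5] * 8, k=5))
```

Because every partition returns its exact local k nearest neighbours, the global k nearest are guaranteed to be among the merged candidates, so the coordinator only needs a final `nsmallest`. Splitting by relative speed is one simple, static response to component heterogeneity; it does nothing about load that changes at query time, which is one of the design issues the abstract raises.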

Hans-Jörg Schek et al. Interactive-Time Similarity Search for Large Image Collections Using Parallel VA-Files. Proceedings of the 16th International Conference on Data Engineering (ICDE), 2000.