Merging Results for Distributed Content Based Image Retrieval

Abstract

Searching for information on the Internet often requires users to contact several digital libraries separately, use each library's interface to author the query, analyze the retrieval results, and merge them with the results returned by other libraries. This process could be simplified by a centralized server that acts as a gateway between the user and several distributed repositories: the server receives the user query, forwards it to the federated repositories (possibly translating it into the specific format each repository requires), and fuses the retrieved documents for presentation to the user. To accomplish these tasks efficiently, the centralized server must perform several major operations, such as resource selection, query transformation and data fusion. In this paper we report on some aspects of MIND, a system for managing distributed, heterogeneous multimedia libraries (MIND, 2001, http://www.mind-project.org). In particular, this paper focuses on the issue of fusing results returned by different image repositories. The proposed approach is based on normalizing the matching scores that individual libraries assign to the retrieved images. Experimental results on a prototype system show the potential of the proposed approach compared with traditional solutions.
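The abstract does not say which normalization function MIND applies to the libraries' matching scores. As a minimal sketch, assuming per-library min-max normalization (a common baseline, not necessarily the paper's method) and assuming that an image returned by several libraries keeps its best normalized score, score-based fusion at the gateway could look like this. The names `LibraryResult`, `min_max_normalize`, `merge_results` and the `higher_is_better` flag are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class LibraryResult:
    """Ranked results returned by one image library (hypothetical structure)."""
    library: str
    scores: dict[str, float] = field(default_factory=dict)  # image id -> raw matching score
    higher_is_better: bool = True  # set to False when the library returns distances


def min_max_normalize(result: LibraryResult) -> dict[str, float]:
    """Map one library's raw scores onto [0, 1], where 1.0 is its best match.

    Assumption: plain min-max normalization per library; this is not
    necessarily the normalization used by MIND.
    """
    if not result.scores:
        return {}
    lo = min(result.scores.values())
    hi = max(result.scores.values())
    if hi == lo:
        # All scores tie, so every returned image counts as an equally good match.
        return {img: 1.0 for img in result.scores}
    normalized = {}
    for img, score in result.scores.items():
        x = (score - lo) / (hi - lo)
        # Distances are inverted so that a larger value always means a better match.
        normalized[img] = x if result.higher_is_better else 1.0 - x
    return normalized


def merge_results(results: list[LibraryResult], k: int = 10) -> list[tuple[str, float]]:
    """Fuse the normalized scores of all libraries into one top-k ranked list.

    Assumption: an image returned by several libraries keeps its highest
    normalized score. Ties are broken by image id so the order is deterministic.
    """
    fused: dict[str, float] = {}
    for result in results:
        for img, score in min_max_normalize(result).items():
            fused[img] = max(score, fused.get(img, 0.0))
    ranked = sorted(fused.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:k]


if __name__ == "__main__":
    lib_a = LibraryResult("A", {"a1": 0.75, "a2": 0.5, "a3": 0.25})
    lib_b = LibraryResult("B", {"b1": 10.0, "b2": 30.0, "a2": 50.0}, higher_is_better=False)
    print(merge_results([lib_a, lib_b], k=4))
    # prints [('a1', 1.0), ('b1', 1.0), ('a2', 0.5), ('b2', 0.5)]
```

Min-max normalization only rescales each list onto a common range: every library's best match becomes 1.0 regardless of how good it actually is, so this baseline cannot tell that one repository's top result may be a poorer match than another's. That limitation is the reason to look for a more careful normalization of matching scores than this sketch provides.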
