Processing complex similarity queries in peer-to-peer networks

Similarity search for content-based retrieval (where content can be any combination of text, image, audio/video, etc.) has gained importance in recent years, also because of the advantage of ranking the retrieved results according to their proximity to a query. However, to use similarity search in real world applications, we need to tackle the problem of huge volumes of such mixed multimedia data (e.g., coming from Web sites) and the problem of their distribution on multiple cooperating nodes. The proposed approach is being used in two running projects: SAPIR and NeP4B. In this paper we approach this problem by considering a scenario of a network of autonomous peers maintaining a local collection of metric objects (i.e., mixed mode multimedia content). This network forms a distributed Peer-to-Peer (P2P) search engine for similarity search based on the paradigm of Routing Index. Each peer in the network thus maintains both an index of its local resources and a table for every neighbor, summarizing the objects that are reachable from it. The paper presents techniques that aim to make our P2P similarity-based search system viable, trading approximate results for scalable solutions. Results of simulations that use real collections of images are discussed.