Similarity Search and Indexing for High-Dimensional Data

Similarity search is a critical operation in many systems, and has therefore attracted attention from many areas of Computer Science, including Computational Geometry, Machine Learning, Multimedia Retrieval and, of course, Databases. To perform efficiently, similarity search requires the support of indexing, which suffers from the infamous "curse of dimensionality". In this tutorial we introduce the challenges of indexing and searching high-dimensional data, and present the most recent tools available to "tame the curse". The tutorial takes 3 hours and is divided into three main sections: an introduction, where we define the problem, its applications and its challenges; a presentation of the state of the art, where we present the most important and interesting solutions to the problem; and a case study of the application of high-dimensional indexing to problems in Content-Based Information Retrieval.