Cottontail DB: An Open Source Database System for Multimedia Retrieval and Analysis

Multimedia retrieval and analysis are two important areas in "Big Data" research. Both work with feature vectors as proxies for the media objects themselves. Together with metadata such as textual descriptions or numeric values, these vectors describe a media object in its entirety and must therefore be considered jointly for both storage and retrieval. In this paper, we introduce Cottontail DB, an open source database management system that integrates support for scalar and vector attributes in a unified data and query model, allowing for both Boolean retrieval and nearest neighbour search. We demonstrate that Cottontail DB scales well to large collection sizes and vector dimensions, and we provide insights into how it has proved to be a valuable tool in use cases ranging from the analysis of MRI data to retrieval solutions in the cultural heritage domain.
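
To illustrate what it means to consider scalar and vector attributes jointly, the following is a minimal sketch of a Boolean predicate combined with a brute-force nearest neighbour search over an in-memory collection. It is not Cottontail DB's API or implementation; the function `knn_with_filter`, the field names `type` and `feature`, and the sample records are all hypothetical and chosen only for illustration.

```python
import heapq
import math
from typing import Callable, Dict, List, Sequence, Tuple

Record = Dict[str, object]


def knn_with_filter(
    records: Sequence[Record],
    predicate: Callable[[Record], bool],
    query: Sequence[float],
    k: int,
    vector_field: str = "feature",  # hypothetical attribute name
) -> List[Tuple[float, Record]]:
    """Return the k records closest to `query` among those matching `predicate`.

    Brute-force linear scan with Euclidean distance; a real system would
    typically use index structures instead.
    """
    # Boolean retrieval step: keep only records whose scalar attributes match.
    candidates = (r for r in records if predicate(r))
    # Similarity step: score every remaining record by distance to the query.
    scored = ((math.dist(r[vector_field], query), r) for r in candidates)
    return heapq.nsmallest(k, scored, key=lambda pair: pair[0])


# Hypothetical toy collection: one scalar attribute and one vector attribute.
records = [
    {"id": 1, "type": "image", "feature": [0.0, 0.0]},
    {"id": 2, "type": "video", "feature": [0.1, 0.1]},
    {"id": 3, "type": "image", "feature": [1.0, 1.0]},
    {"id": 4, "type": "image", "feature": [0.2, 0.0]},
]

result = knn_with_filter(
    records,
    predicate=lambda r: r["type"] == "image",
    query=[0.0, 0.1],
    k=2,
)
result_ids = [record["id"] for _, record in result]
print(result_ids)
```

In this sketch the filter is applied before the distance computation, so record 2 is excluded even though it lies close to the query vector. Whether to filter before or after the similarity search is a design choice with performance implications, and the sketch makes no claim about how Cottontail DB handles it.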
