Large data methods for multimedia

This tutorial describes techniques essential for searching the large multimedia databases that are now common on the Internet. There are up to 10 million songs in commercial music catalogues and over 300 million images stored in online photo services such as Flickr. How can we find the music, videos or images we want? How can we organize such large collections: find duplicates, create links between similar documents, and extract and annotate semantic structures from complex audiovisual documents?

Conventional methods for handling large data sets, such as hashing, get us part of the way, but they cannot be applied directly to similarity-based matching and retrieval in audiovisual document collections. Conversely, several elaborate methods from multimedia retrieval are available for semantic document analysis, but they generally do not scale to large data sets. Handling large multimedia databases therefore requires new classes of algorithms that combine the strengths of both worlds: large data methods and semantic analysis. Innovative methods such as locality-sensitive hashing, which are based on randomized probes, are the new workhorses.

This tutorial covers methods for multimedia retrieval on large document collections. Starting with audio retrieval, we describe both the theory (i.e., randomized algorithms for hashing) and the implementation details (how do you store hash values for millions of songs?). A special focus is on how to combine large data methods with semantically meaningful descriptors in order to facilitate efficient similarity-based retrieval. Besides audio, the tutorial also covers image, 3D motion and video retrieval.
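To make the idea of locality-sensitive hashing concrete, the following is a minimal sketch of one common variant, random-hyperplane hashing for cosine similarity. It is not taken from the tutorial itself: the function names (`make_hyperplanes`, `lsh_key`, `build_index`, `query`), the number of hash bits and the use of plain feature vectors as descriptors are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def make_hyperplanes(num_bits, dim, seed=0):
    """Draw num_bits random hyperplanes (Gaussian normal vectors) in dim dimensions."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def lsh_key(vector, hyperplanes):
    """Hash a vector to an integer: one bit per hyperplane, set by the side it falls on."""
    bits = 0
    for plane in hyperplanes:
        dot = sum(p * v for p, v in zip(plane, vector))
        bits = (bits << 1) | (1 if dot >= 0.0 else 0)
    return bits


def build_index(items, hyperplanes):
    """Group item ids into buckets keyed by their hash (hypothetical in-memory index)."""
    index = defaultdict(list)
    for item_id, vector in items.items():
        index[lsh_key(vector, hyperplanes)].append(item_id)
    return index


def query(index, vector, hyperplanes):
    """Return candidate ids that share the query's bucket; candidates still need verification."""
    return index.get(lsh_key(vector, hyperplanes), [])


if __name__ == "__main__":
    planes = make_hyperplanes(num_bits=16, dim=4, seed=1)
    # Toy descriptors; real systems would use audio or image features instead.
    items = {"song_a": [0.9, 0.1, 0.3, 0.5], "song_b": [-0.9, -0.1, -0.3, -0.5]}
    index = build_index(items, planes)
    # A louder copy of song_a (same direction, larger magnitude) lands in the same bucket.
    print(query(index, [1.8, 0.2, 0.6, 1.0], planes))
```

Vectors separated by a small angle agree on most bits, so near-duplicates tend to collide in the same bucket, whereas an ordinary hash would scatter them. A practical system would typically use several such tables and rank the returned candidates with an exact distance computation.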
