Multifeature audio segmentation for browsing and annotation

Indexing and content-based retrieval are necessary to handle the large amounts of audio and multimedia data that are becoming available on the Web and elsewhere. Since manual indexing using existing audio editors is extremely time-consuming, a number of automatic content analysis systems have been proposed. Most of these systems rely on speech recognition techniques to create text indices. In contrast, very few systems have been proposed for automatic indexing of music and general audio. Typically, these systems rely on classification and similarity-retrieval techniques and work in restricted audio domains. A somewhat different, more general approach for fast indexing of arbitrary audio data is segmentation based on multiple temporal features, combined with automatic or semi-automatic annotation. In this paper, a general methodology for audio segmentation is proposed. A number of experiments were performed to evaluate the proposed methodology and to compare different segmentation schemes. Finally, a prototype audio browsing and annotation tool, based on segmentation combined with existing classification techniques, was implemented.
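To make the multifeature segmentation idea concrete, the following is a minimal, self-contained sketch in Python (NumPy only), not the paper's exact algorithm: the specific features (RMS energy, zero-crossing rate, spectral centroid), the Euclidean frame-to-frame distance, and the simple largest-peak boundary picker are illustrative assumptions standing in for whatever feature set and distance the proposed methodology actually uses.

```python
import numpy as np


def frame_features(x, sr, frame_len=1024, hop=512):
    """Compute a small set of temporal features per analysis frame:
    RMS energy, zero-crossing rate, and spectral centroid."""
    feats = []
    for start in range(0, len(x) - frame_len, hop):
        frame = x[start:start + frame_len]
        rms = np.sqrt(np.mean(frame ** 2))
        zcr = np.mean(np.abs(np.diff(np.sign(frame)))) / 2
        mag = np.abs(np.fft.rfft(frame))
        freqs = np.fft.rfftfreq(frame_len, 1.0 / sr)
        centroid = np.sum(freqs * mag) / (np.sum(mag) + 1e-12)
        feats.append([rms, zcr, centroid])
    return np.array(feats)


def segment_boundaries(feats, n_segments=4):
    """Place segment boundaries at the largest frame-to-frame changes
    of the normalized multifeature trajectory (illustrative heuristic)."""
    # Normalize each feature so no single feature dominates the distance.
    z = (feats - feats.mean(axis=0)) / (feats.std(axis=0) + 1e-12)
    # Frame-to-frame Euclidean distance between feature vectors.
    dist = np.linalg.norm(np.diff(z, axis=0), axis=1)
    # Keep the n_segments - 1 largest peaks as boundary frame indices.
    return np.sort(np.argsort(dist)[-(n_segments - 1):] + 1)


if __name__ == "__main__":
    sr = 8000
    t = np.arange(sr) / sr
    # Synthetic signal with three contrasting sections: quiet tone,
    # noise, loud high tone.
    x = np.concatenate([
        0.3 * np.sin(2 * np.pi * 220 * t),
        0.1 * np.random.randn(sr),
        0.8 * np.sin(2 * np.pi * 880 * t),
    ])
    feats = frame_features(x, sr)
    print("boundary frames:", segment_boundaries(feats, n_segments=3))
```

In a browsing or annotation tool, the detected boundary frames would be converted to time positions and presented to the user, who can then label each segment manually or with the help of an automatic classifier, in the spirit of the semi-automatic annotation described above.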