Content-based audio retrieval with MFCC feature extraction, clustering and sort-merge techniques

Content-Based Audio Retrieval (CBAR) has been a growing field of research for the past decade. The ability to classify and access audio files relevant to a user's interest is fundamental to structuring multimedia web search engines. This paper proposes a system that retrieves audio files by acoustic similarity using a Sort-Merge technique. The frequency features of the audio streams are extracted, and Mel Frequency Cepstral Coefficients (MFCC) are used for dimensionality reduction. The mean of the coefficients is computed for the key song and for every song in the database, and the difference between them is measured using Euclidean distance. Retrieval is tested on a corpus of songs sung by both professional and non-professional singers. When a query audio is given, the system first finds the clusters with identical high-energy components and merges them; the audio files in the merged clusters are then sorted according to their distances. With this approach, standard audio files can be classified and retrieved more precisely, using fewer features and less computation time.
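The retrieval steps summarized above can be illustrated with a minimal sketch. It is not the implementation evaluated in this paper: it assumes the per-frame MFCC vectors have already been extracted, it uses toy three-coefficient vectors, and it takes the index of the largest-magnitude mean coefficient as the "high-energy component" that defines a cluster. The function names (`mean_vector`, `dominant_components`, `build_clusters`, `retrieve`), the parameter `k`, and the song names are all hypothetical.

```python
import math
from collections import defaultdict


def mean_vector(frames):
    """Average per-frame coefficient vectors into one vector per song."""
    n = len(frames)
    return [sum(column) / n for column in zip(*frames)]


def dominant_components(vec, k=1):
    """Indices of the k largest-magnitude coefficients.

    Assumption: this stands in for the 'high-energy components'
    mentioned in the abstract; the paper may define them differently.
    """
    order = sorted(range(len(vec)), key=lambda i: abs(vec[i]), reverse=True)
    return order[:k]


def build_clusters(database):
    """Group song names by the index of their single dominant coefficient."""
    clusters = defaultdict(list)
    for name, vec in database.items():
        clusters[dominant_components(vec)[0]].append(name)
    return clusters


def retrieve(query_vec, database, clusters, k=2):
    """Merge the clusters matching the query's top-k components,
    then sort the merged candidates by Euclidean distance to the query."""
    merged = []
    for index in dominant_components(query_vec, k):
        merged.extend(clusters.get(index, []))
    return sorted(merged, key=lambda name: math.dist(query_vec, database[name]))


# Toy data: two frames per song, three coefficients per frame (hypothetical values).
database = {
    "song_a": mean_vector([[9.0, 1.0, 0.5], [11.0, 1.0, 0.5]]),
    "song_b": mean_vector([[8.0, 2.0, 1.0], [8.0, 4.0, 1.0]]),
    "song_c": mean_vector([[1.0, 7.0, 0.0], [1.0, 9.0, 0.0]]),
    "song_d": mean_vector([[0.0, 1.0, 6.0], [0.0, 1.0, 6.0]]),
}
clusters = build_clusters(database)

query = [9.0, 2.0, 0.5]
ranked = retrieve(query, database, clusters, k=2)
print(ranked)  # song_d is never compared, because its cluster is not merged
```

In this sketch, distances are computed only for songs in the merged clusters. That is one plausible reading of how the approach reduces computation time compared with ranking every file in the database.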