Content-based retrieval of MP3 songs based on query by singing

With the growth of multimedia content on the Internet, content analysis plays an important role in human-centered media management. We investigate content-based retrieval of MP3 songs through a query-by-singing interface. MDCT (modified discrete cosine transform) spectral coefficients are used directly to represent the tonal characteristics of a short-term sound, and this spectral profile is used for detailed matching between two audio segments. Perceptual features are also computed from the MDCT coefficients for audio classification. Two pre-stages, based on SVM and k-means classification, remove incorrect (or noisy) segment candidates and speed up the subsequent matching process. In addition, exponential key-scaling schemes and time-warping techniques are developed to overcome key differences and tempo variations between singers. Experiments show that the correct song is retrieved among the top 5 candidates with a probability of up to 76%, for a database of 114 excerpts.
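To make the last matching step concrete, the following is a minimal sketch of how key scaling and time warping can be combined. It is not the system described above: it assumes the query and the reference are already reduced to pitch contours in Hz (the abstract matches MDCT spectral profiles instead), it interprets "exponential key scaling" as multiplying frequencies by 2^(k/12) for a shift of k semitones, and it uses textbook dynamic time warping. The function names `key_scale`, `dtw_distance`, and `best_match`, and the search range of plus or minus six semitones, are hypothetical choices made for illustration.

```python
import math


def key_scale(pitches_hz, semitones):
    """Shift a pitch contour by `semitones`; the factor is exponential in the shift."""
    factor = 2.0 ** (semitones / 12.0)
    return [p * factor for p in pitches_hz]


def dtw_distance(a, b):
    """Textbook dynamic time warping cost between two numeric sequences."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best cumulative cost aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # query frame repeated
                                 cost[i][j - 1],      # reference frame repeated
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]


def best_match(query_hz, ref_hz, shifts=range(-6, 7)):
    """Try each semitone shift of the query (assumed range), compare in
    log-frequency with DTW, and return (smallest distance, best shift)."""
    ref_log = [math.log2(p) for p in ref_hz]
    best = (float("inf"), 0)
    for k in shifts:
        q_log = [math.log2(p) for p in key_scale(query_hz, k)]
        d = dtw_distance(q_log, ref_log)
        if d < best[0]:
            best = (d, k)
    return best


# Illustrative data only: a reference melody, and a query sung three
# semitones higher and at half the tempo (every frame repeated twice).
reference = [220.0, 220.0, 247.0, 262.0, 262.0, 294.0]
query = [p * 2.0 ** (3 / 12.0) for p in reference for _ in range(2)]
distance, shift = best_match(query, reference)
```

In this toy case the search should recover a shift of three semitones downward with a near-zero warped distance, since the key scaling cancels the transposition and the warping path absorbs the slower tempo. Comparing in log-frequency keeps a semitone the same size everywhere on the scale, which is one reason to treat the key change as an exponential factor rather than an additive offset in Hz.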