Noise Robustness Evaluation of Audio Features in Segment Search

This paper evaluates the noise robustness of audio features in segment search. Active Search is a well-known fast segment search algorithm that has been successfully applied to locating music or video segments (intervals) in huge databases. The noise considered here is the distortion introduced by MP3 encoding and decoding. Search accuracy is evaluated using the F-measure, which is calculated from precision and recall. The experimental results show that mel-scaled spectral features give better accuracy and tolerate a broader range of search thresholds than linear-scaled features. With a low analysis order, the mel-scaled audio features yield a search that is about 12 times faster while keeping reasonable search accuracy.
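As an illustration of the accuracy metric, the following minimal sketch computes an F-measure from detection counts. It assumes the balanced form (the harmonic mean of precision and recall), which the abstract does not state explicitly. The function name and its parameters are hypothetical and are not taken from the paper.

```python
def f_measure(num_correct, num_detected, num_relevant):
    """Balanced F-measure from raw detection counts.

    num_correct:  detected segments that are true matches
    num_detected: all segments returned by the search
    num_relevant: all true matches present in the database
    Assumption: equal weighting of precision and recall (F1).
    """
    # Precision: fraction of returned segments that are correct.
    precision = num_correct / num_detected if num_detected else 0.0
    # Recall: fraction of true matches that were returned.
    recall = num_correct / num_relevant if num_relevant else 0.0
    if precision + recall == 0.0:
        return 0.0
    # Harmonic mean of precision and recall.
    return 2.0 * precision * recall / (precision + recall)
```

For example, a search that returns 10 segments, 8 of them correct, against 16 true matches has precision 0.8 and recall 0.5. Sweeping the search threshold and recomputing this value at each setting is one way to see how wide a range of thresholds keeps accuracy acceptable.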
