Short-term sound stream characterization for reliable, real-time occurrence monitoring of given sound-prints

Identifying known sound-prints in real-time streaming data with plain template-matching algorithms is difficult because the frame position in the on-line data is undefined relative to the frame positions at which the templates' features were computed. We investigated the short-time Fourier spectrum, the short-time Walsh spectrum, the short-time signal energy, and the ratio of successive feature values with respect to their sensitivity to frame shift. The last two features are intended to speed up calculation for larger sets of records. To further simplify the comparison stage, the spectrum values were quantized, yielding ternary-logic time-frequency maps. Quantization also helps to eliminate the effects of ordinary spectral-shape distortions, because only the prominent parts of the spectrum are used. An algorithm was developed that selects, for the sound records to be monitored, the combination of segments in which the differences between all segment pairs are largest. The method was applied to identify advertisements in the Real Audio broadcast of Hungarian Radio.
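
The features named above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the frame length and hop, and the quantization thresholds (`high`, `low`, taken relative to each frame's mean magnitude) are all assumptions made for illustration, and a naive DFT stands in for whatever spectrum routine was actually used.

```python
import cmath
import math


def frames(signal, frame_len, hop):
    """Split a sequence into overlapping frames (an incomplete tail is dropped)."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]


def short_time_energy(frame):
    """Sum of squared samples in one frame."""
    return sum(x * x for x in frame)


def energy_ratios(energies, eps=1e-12):
    """Ratio of each frame's energy to that of the preceding frame."""
    return [energies[i + 1] / (energies[i] + eps)
            for i in range(len(energies) - 1)]


def magnitude_spectrum(frame):
    """Naive DFT magnitudes for bins 0..N/2 (O(N^2), adequate for a sketch)."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame)))
            for k in range(n // 2 + 1)]


def ternary_quantize(spectrum, high=1.5, low=0.5):
    """Map each bin to +1, 0 or -1 relative to the frame's mean magnitude.

    The thresholds are hypothetical; the source does not state them.
    """
    mean = sum(spectrum) / len(spectrum)
    return [1 if v > high * mean else -1 if v < low * mean else 0
            for v in spectrum]


def ternary_distance(map_a, map_b):
    """Count the cells in which two equally sized ternary maps disagree."""
    return sum(a != b
               for row_a, row_b in zip(map_a, map_b)
               for a, b in zip(row_a, row_b))


# Usage: a pure tone at bin 2 of a 16-sample frame.
tone = [math.sin(2 * math.pi * 2 * t / 16) for t in range(32)]
ternary_map = [ternary_quantize(magnitude_spectrum(f))
               for f in frames(tone, frame_len=16, hop=8)]
```

Because only the sign-like ternary values are compared, the comparison stage reduces to counting mismatched cells, which is one plausible reading of how quantization simplifies matching.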
