Perceptual Hashing Algorithm for Speech Content Identification Based on Spectrum Entropy in Compressed Domain

Zhang Qiu-yu, Liu Yang-wei, Huang Yi-bo, Xing Peng-fei and Yang Zhong-ping

This paper proposes a new perceptual hashing algorithm for speech content identification in the compressed domain, based on MDCT (Modified Discrete Cosine Transform) spectrum entropy. It aims primarily to address the high computational complexity and poor real-time performance that arise when traditional identification methods are applied to compressed speech. The process begins by extracting the MDCT coefficients, which are the intermediate decoding results of compressed speech in MP3 format. To reduce computational complexity, these coefficients are divided into sub-bands and the MDCT spectrum energy of each sub-band is calculated. The sub-band energies are then mapped to a function analogous to the probability mass function in information entropy theory. This function serves as the perceptual feature from which binary hash values are extracted. Experimental results show that the proposed algorithm retains greater robustness to content-preserving operations while also maintaining efficiency. Because only partial decoding is required, its real-time performance can meet the requirements of applications in real-time communication terminals.
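The pipeline described above (sub-band energies, a probability-mass-like normalization, entropy, binary hash) can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not give the number of sub-bands, the band layout, or the binarization rule, so the equal-width bands, the default of 32 sub-bands, and the "bit is 1 when entropy rises between consecutive frames" rule are all assumptions made for illustration. The function names are hypothetical, and each input frame is assumed to be a list of MDCT coefficients already obtained by partially decoding an MP3 stream (576 coefficients per granule), a step that is not shown here.

```python
import math


def subband_energies(mdct_frame, n_subbands):
    """Sum of squared MDCT coefficients in each equal-width sub-band.

    Equal-width bands are an assumption; trailing coefficients that do not
    fill a whole band are ignored.
    """
    width = len(mdct_frame) // n_subbands
    if width == 0:
        raise ValueError("frame is shorter than the number of sub-bands")
    return [
        sum(c * c for c in mdct_frame[b * width:(b + 1) * width])
        for b in range(n_subbands)
    ]


def spectral_entropy(energies):
    """Normalize energies so they sum to 1, then return the Shannon entropy in bits."""
    total = sum(energies)
    if total == 0:
        return 0.0  # silent frame: define its entropy as zero
    pmf = [e / total for e in energies]
    return -sum(p * math.log2(p) for p in pmf if p > 0)


def perceptual_hash(frames, n_subbands=32):
    """One bit per frame transition: 1 if the entropy increases, else 0.

    This binarization rule is an illustrative assumption, not taken from the paper.
    """
    entropies = [spectral_entropy(subband_energies(f, n_subbands)) for f in frames]
    return [1 if cur > prev else 0 for prev, cur in zip(entropies, entropies[1:])]


def bit_error_rate(hash_a, hash_b):
    """Fraction of differing bits between two equal-length hashes."""
    if len(hash_a) != len(hash_b):
        raise ValueError("hashes must have the same length")
    return sum(a != b for a, b in zip(hash_a, hash_b)) / len(hash_a)
```

Two clips would then be compared by computing `bit_error_rate` on their hashes and accepting a match when it falls below a threshold chosen experimentally. Working on sub-band energies rather than on individual coefficients keeps the per-frame cost low, which matches the efficiency goal stated in the abstract.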
