A user voice reduction algorithm based on binaural signal separation for portable digital imaging devices

This paper proposes a user voice reduction algorithm for portable digital imaging devices, based on binaural signal separation, to improve the naturalness of user-generated video content. The algorithm first estimates interaural time differences (ITDs) from the binaural signals recorded by the device's microphones. The estimated ITDs are then used to derive time-frequency masking patterns that separate the user's voice from the sound of the actual subject of the video. Finally, the masks are applied to the binaural signals to reduce the user's voice in the recorded video. To demonstrate its effectiveness, the algorithm is implemented on a portable digital imaging device with a 600 MHz clock speed. A performance evaluation based on sound pressure level measurements shows that the algorithm reduces the user's voice by around 10 dB.
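The three stages described above (ITD estimation, time-frequency masking, mask application) can be illustrated with a minimal NumPy sketch. It is not the paper's implementation: the abstract does not specify the masking rule, so the function name `reduce_user_voice`, the STFT settings, the assumed user ITD of 0 s (a user roughly equidistant from both microphones), the tolerance `tol`, and the attenuation `floor_gain` are all hypothetical choices made for illustration. The per-bin ITD is derived from the interaural phase difference, which is only unambiguous at frequencies where the phase does not wrap.

```python
import numpy as np

def stft(x, n_fft=512, hop=256):
    """Short-time Fourier transform with a periodic Hann window."""
    win = np.hanning(n_fft + 1)[:-1]
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * win for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)  # shape: (frames, bins)

def istft(spec, n_fft=512, hop=256):
    """Overlap-add inverse; periodic Hann frames at 50% overlap sum to one."""
    frames = np.fft.irfft(spec, n=n_fft, axis=1)
    out = np.zeros(hop * (len(frames) - 1) + n_fft)
    for i, frame in enumerate(frames):
        out[i * hop:i * hop + n_fft] += frame
    return out

def reduce_user_voice(left, right, fs, user_itd=0.0, tol=1e-4,
                      floor_gain=0.1, n_fft=512, hop=256):
    """Attenuate time-frequency bins whose ITD matches the assumed user ITD.

    user_itd, tol (seconds) and floor_gain are illustrative assumptions,
    not values taken from the paper.
    """
    L, R = stft(left, n_fft, hop), stft(right, n_fft, hop)
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)

    # Step 1: per-bin ITD from the interaural phase difference.
    # Positive ITD means the right channel lags the left channel.
    phase = np.angle(L * np.conj(R))
    itd = np.zeros_like(phase)
    itd[:, 1:] = phase[:, 1:] / (2 * np.pi * freqs[1:])

    # Step 2: mask pattern marking bins attributed to the user's voice.
    user_mask = np.abs(itd - user_itd) < tol
    user_mask[:, 0] = False  # DC carries no usable phase information

    # Step 3: apply the mask to both channels and resynthesize.
    gain = np.where(user_mask, floor_gain, 1.0)
    return istft(L * gain, n_fft, hop), istft(R * gain, n_fft, hop)
```

Calling `reduce_user_voice(left, right, fs)` on two equal-length microphone signals returns the processed left and right channels. An attenuation floor is used instead of a hard zero because fully binary masks tend to introduce audible artifacts; the value chosen here is arbitrary.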
