Paper information - Analysis of the robustness of neural network-based target activity detection

Many applications in audio signal processing require precise identification of the time frames in which a predefined target source is active. In previous work, Artificial Neural Networks (ANNs) with cross-correlation features showed considerable potential for this task. In this paper, the performance of ANN-based target activity detection is analyzed in more detail and compared with a well-performing "classical" signal processing method. First, the impact of the angular distance between the target source and the interferers is evaluated for both the neural network-based method and the classical one. Second, the sensitivity of both methods to varying Signal-to-Noise Ratio (SNR) conditions is analyzed with respect to the importance of a proper choice of detection thresholds. In the evaluations, the ANN-based method proves generally superior and also robust to a non-ideal choice of detection thresholds.
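
To make the role of the detection threshold concrete, the following minimal sketch computes a frame-wise normalized cross-correlation between two microphone channels and compares it with a fixed threshold. It is not the method evaluated in the paper: the function names (`frame_signal`, `xcorr_score`, `detect`, `demo`), the fixed target lag, the frame length, the noise level, and the threshold value are all assumptions chosen for illustration, and no neural network is involved.

```python
import numpy as np


def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames (one frame per row)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]


def xcorr_score(x1, x2, lag, frame_len=512, hop=256):
    """Frame-wise normalized cross-correlation of two channels at one lag.

    `lag` is the (assumed known) delay in samples of the target signal
    in channel 2 relative to channel 1.
    """
    # Advance channel 2 by `lag` samples so that target components align.
    # np.roll wraps around; the few wrapped samples are ignored here.
    x2_aligned = np.roll(x2, -lag)
    f1 = frame_signal(x1, frame_len, hop)
    f2 = frame_signal(x2_aligned, frame_len, hop)
    num = np.sum(f1 * f2, axis=1)
    den = np.sqrt(np.sum(f1 ** 2, axis=1) * np.sum(f2 ** 2, axis=1)) + 1e-12
    return num / den


def detect(score, threshold):
    """Declare the target active in every frame whose score exceeds the threshold."""
    return score > threshold


def demo(threshold=0.5, lag=3, seed=0):
    """Synthetic example: target active in the first half of the signal only."""
    rng = np.random.default_rng(seed)
    n = 16000
    s = rng.standard_normal(n)
    s[n // 2:] = 0.0  # target is silent in the second half
    x1 = s + 0.1 * rng.standard_normal(n)                 # microphone 1
    x2 = np.roll(s, lag) + 0.1 * rng.standard_normal(n)   # microphone 2, delayed target
    return detect(xcorr_score(x1, x2, lag), threshold)


decisions = demo()
```

In this toy setting the score is close to one while the target is active and close to zero otherwise, so a wide range of thresholds works. As the noise level grows, the two score distributions move closer together and the choice of threshold becomes more critical, which is the sensitivity the abstract refers to.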

Walter Kellermann | Stefan Meier | Daniel Gerber
