Speech enhancement with ad-hoc microphone array using single source activity

In this paper, we propose a method for synchronizing the asynchronous channels of an ad-hoc microphone array based on single source activity, for the purpose of speech enhancement. An ad-hoc microphone array may consist of multiple independent recording devices that do not communicate with each other, so channel synchronization is a significant issue when applying conventional microphone array techniques. We assume that two or more segments (typically near the beginning and the end of the recording) are known in which only a single sound source is active. Based on these segments, we compensate both for the differences in the recording start and end times across devices and for the sampling frequency mismatch between channels. We also present experimental results for speech enhancement with a maximum-SNR beamformer.
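The synchronization idea described above can be illustrated with a minimal sketch. This is not the authors' implementation; it assumes two known single-source segments, estimates the relative delay at each via a cross-correlation peak, and fits a linear-drift model whose slope approximates the sampling frequency mismatch. The function names (`segment_lag`, `estimate_sync_params`) and the `margin` search parameter are illustrative choices, not from the paper.

```python
import numpy as np

def segment_lag(ref, obs, start, seg_len, margin):
    """Lag (in samples) of obs relative to ref around one single-source segment.

    Cross-correlates ref[start:start+seg_len] against an obs window widened
    by `margin` samples on each side; the correlation peak gives the relative
    delay, which is assumed to lie within +/- margin samples.
    """
    ref_seg = ref[start:start + seg_len]
    obs_win = obs[start - margin:start + seg_len + margin]
    corr = np.correlate(obs_win, ref_seg, mode="valid")
    return int(np.argmax(corr)) - margin

def estimate_sync_params(ref, obs, start1, start2, seg_len, margin):
    """Estimate time offset and sampling-frequency mismatch from two
    single-source segments (e.g. near the beginning and end of a recording).

    Returns (offset, eps): obs lags ref by `offset` samples at the first
    segment, and the sampling frequency ratio is modeled as 1 + eps, so the
    lag drifts linearly by eps samples per sample of elapsed time.
    """
    d1 = segment_lag(ref, obs, start1, seg_len, margin)
    d2 = segment_lag(ref, obs, start2, seg_len, margin)
    eps = (d2 - d1) / float(start2 - start1)  # slope of the linear drift
    return d1, eps

# Synthetic check: a second channel delayed by 7 samples at the first anchor
# and by 10 samples at the second, mimicking a small sampling-rate drift.
rng = np.random.default_rng(0)
ref = rng.standard_normal(20000)
obs = np.zeros_like(ref)
obs[7:10000] = ref[:10000 - 7]          # delay of 7 samples early on
obs[10000:] = ref[10000 - 10:20000 - 10]  # delay of 10 samples later
offset, eps = estimate_sync_params(ref, obs, 1000, 15000, 2000, 50)
print(offset, eps)  # -> 7 and (10 - 7) / 14000
```

In practice, the estimated offset would be removed by shifting one channel, and the mismatch `eps` by resampling it with ratio 1 + eps, after which a standard beamformer (such as the maximum-SNR beamformer used in the paper) can be applied.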
