Double-Cross-Correlation Processing for Blind Sampling-Rate and Time-Offset Estimation

Coherent processing of signals captured by a wireless acoustic sensor network (WASN) requires the estimation of unknown parameters such as the sampling-rate offset (SRO) and sampling-time offset (STO) of asynchronous sensor clocks. Although some sophisticated techniques for blind parameter estimation have become available in this young field of research, new methods are still needed, especially with regard to environmental robustness, online estimation, and computational efficiency. As the main contribution of this work, we therefore introduce a novel approach to blind SRO estimation in the spirit of both a recently proposed time-domain double-cross-correlation processor (DXCP) and the well-known generalized cross-correlation with phase transform (GCC-PhaT). For the proposed approach, called DXCP-PhaT, we introduce secondary cross-quantities that restore the ergodicity of the otherwise drifting cross-correlation function or cross-spectrum of asynchronous input signals, a property that is essential for estimation. Based on this theoretical contribution, we then derive the DXCP-PhaT algorithm in the STFT domain, which offers several improvements over the state of the art: higher accuracy of SRO estimation, environmental robustness to the larger microphone distances found in real sensor networks, real-time applicability owing to an online architecture and reduced complexity, and finally an extension to online STO estimation (up to the inherent ambiguity between digital STO and acoustic time difference of arrival, TDOA) that relies on SRO compensation. These claims are confirmed by comprehensive offline and online experiments on simulated data and on real recordings from a Raspberry-Pi-based WASN.
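The core idea can be illustrated with a small offline sketch. Under an SRO, the per-frame cross-spectrum of two channels has a phase that drifts from frame to frame, so averaging it over time is not meaningful. A secondary cross-quantity, formed by multiplying each frame's PhaT-normalized cross-spectrum with the conjugate of the one d_frames earlier, has a phase that depends only on the delay growth over d_frames * hop samples, which is constant for a constant SRO and can therefore be averaged. The sketch below is written in Python with NumPy and is not the DXCP-PhaT algorithm of the paper: batch averaging replaces online recursive averaging, and the frame length, hop, d_frames, zero-padding factor, the phase-only STFT-domain compensation used for the offset estimate, and the toy signal model x2[n] = s((1 + eps) n + tau) are all assumptions chosen for illustration. All function names are hypothetical. In this model, node 2 samples at fs * (1 - eps) to first order.

```python
import numpy as np


def stft_frames(x, n_fft, hop):
    """One-sided spectra of Hann-windowed frames, shape (n_frames, n_fft // 2 + 1)."""
    win = np.hanning(n_fft)
    n_frames = (len(x) - n_fft) // hop + 1
    frames = np.stack([x[l * hop:l * hop + n_fft] * win for l in range(n_frames)])
    return np.fft.rfft(frames, axis=1)


def interpolated_peak(spectrum, n_fft, upsample=16):
    """Lag (in samples, possibly negative) of the maximum of irfft(spectrum),
    refined by zero-padding in frequency and parabolic interpolation."""
    m = n_fft * upsample
    r = np.fft.irfft(spectrum, n=m)   # zero-padded: lag grid of 1/upsample samples
    i = int(np.argmax(r))
    y0, y1, y2 = r[i - 1], r[i], r[(i + 1) % m]
    denom = y0 - 2.0 * y1 + y2
    frac = 0.5 * (y0 - y2) / denom if denom != 0 else 0.0
    lag = i + frac
    if lag > m / 2:                   # circular index -> negative lag
        lag -= m
    return lag / upsample


def phat(s):
    """Phase transform: keep only the phase of each cross-spectral bin."""
    return s / (np.abs(s) + 1e-12)


def estimate_sro(x1, x2, n_fft=2048, hop=512, d_frames=64):
    """Sketch of DXCP-style SRO estimation (hypothetical parameter values)."""
    cps = phat(stft_frames(x1, n_fft, hop) * np.conj(stft_frames(x2, n_fft, hop)))
    # Secondary cross-quantity: frame l against frame l - d_frames. For a constant
    # SRO its phase reflects only the delay growth over d_frames * hop samples,
    # which is the same for every l, so averaging over frames is meaningful.
    secondary = np.mean(cps[d_frames:] * np.conj(cps[:-d_frames]), axis=0)
    return interpolated_peak(secondary, n_fft) / (d_frames * hop)


def estimate_sto(x1, x2, sro, n_fft=2048, hop=512):
    """GCC-PhaT offset estimate after SRO compensation of every frame."""
    X1, X2 = stft_frames(x1, n_fft, hop), stft_frames(x2, n_fft, hop)
    k = np.arange(n_fft // 2 + 1)
    centers = np.arange(X1.shape[0]) * hop + (n_fft - 1) / 2
    # Assumption of this sketch: phase-only compensation in the STFT domain
    # (instead of resampling) removes the drift sro * center of each frame.
    comp = np.exp(2j * np.pi * np.outer(sro * centers, k) / n_fft)
    return interpolated_peak(np.mean(phat(X1 * np.conj(X2) * comp), axis=0), n_fft)


def toy_signals(eps, tau, n=65536, n_sines=400, seed=0):
    """Hypothetical test data: random sinusoids sampled exactly at t = n
    (reference node) and at t = (1 + eps) * n + tau (asynchronous node)."""
    rng = np.random.default_rng(seed)
    freqs = rng.uniform(0.005, 0.495, n_sines)   # cycles per reference sample
    phases = rng.uniform(0.0, 2 * np.pi, n_sines)
    t1 = np.arange(n, dtype=float)
    t2 = (1.0 + eps) * t1 + tau
    x1 = sum(np.cos(2 * np.pi * f * t1 + ph) for f, ph in zip(freqs, phases))
    x2 = sum(np.cos(2 * np.pi * f * t2 + ph) for f, ph in zip(freqs, phases))
    return x1, x2


if __name__ == "__main__":
    x1, x2 = toy_signals(eps=80e-6, tau=25.0)
    eps_hat = estimate_sro(x1, x2)
    tau_hat = estimate_sto(x1, x2, eps_hat)
    print(f"SRO estimate: {eps_hat * 1e6:.2f} ppm, offset estimate: {tau_hat:.2f} samples")
```

Dividing the secondary-correlation lag by d_frames * hop converts a delay growth of a few samples into an SRO value; a larger d_frames enlarges that lag and reduces the relative influence of interpolation error, at the cost of fewer frame pairs to average. In the toy model, tau is a pure time offset; with real recordings, the same peak would contain the STO plus the acoustic TDOA, which is the ambiguity the abstract refers to.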
