Real-time distributed speech enhancement with two collaborating microphone arrays

In this demonstration, we present our recent implementation results and provide an evaluation testbed with which users can experiment with and compare the outputs of the distributed speech enhancement algorithms in [1–3]. The system allows a user to assess the merits of these algorithms in any acoustic setup. The multi-channel Wiener filter (MWF) is a well-known noise reduction algorithm for multi-microphone speech processing applications. In general, noise reduction improves as the number of available microphones increases, since better spatial sampling, or diversity, can be exploited. Motivated by this, wireless acoustic sensor networks (WASNs), consisting of a multitude of collaborating nodes, each with an embedded signal processing unit and a microphone array, have been proposed to increase the spatial diversity of multi-microphone systems. However, because of the limited per-node computational power and communication bandwidth, reduced-bandwidth distributed processing is preferable to centralized processing in which all the microphone signals are transmitted to a fusion center. In this demo, we evaluate the so-called distributed adaptive node-specific signal estimation (DANSE) algorithm [1], which is essentially a distributed realization of the MWFs of the individual nodes of a WASN. It allows the nodes to cooperate by exchanging pre-filtered and compressed signals, while eventually converging to the same centralized MWF solutions as if each node had access to all the microphone signals in the WASN [1,2]. In the original version of DANSE in [1], the required speech correlation matrices are estimated using a straightforward subtraction-based method. This method, however, has been shown to perform unsatisfactorily in the presence of errors in the estimated second-order statistics (e.g., due to low-SNR conditions, highly non-stationary noise, or erroneous voice activity detection (VAD)) [4].
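As a rough illustration of the subtraction-based estimate mentioned above, the following sketch computes a single-node (centralized) MWF for one frequency bin. It is a minimal example under stated assumptions, not the implementation of [1]: the function names `estimate_correlations` and `subtraction_mwf`, the frame layout `Y` (microphones by frames), and the boolean `vad` mask are all hypothetical, and the inter-node signal exchange of DANSE is not modeled.

```python
import numpy as np

def estimate_correlations(Y, vad):
    """Average outer products over frames (illustrative helper).

    Y   : (M, T) array, M microphone signals over T frames of one frequency bin.
    vad : (T,) boolean mask, True where speech is assumed to be active.
    """
    vad = np.asarray(vad, dtype=bool)
    Ys, Yn = Y[:, vad], Y[:, ~vad]
    Ryy = Ys @ Ys.conj().T / Ys.shape[1]   # speech-plus-noise correlation matrix
    Rnn = Yn @ Yn.conj().T / Yn.shape[1]   # noise-only correlation matrix
    return Ryy, Rnn

def subtraction_mwf(Ryy, Rnn, ref=0):
    """MWF for reference microphone `ref` with a subtraction-based speech estimate."""
    Rss = Ryy - Rnn                         # may be indefinite when Ryy or Rnn is poorly estimated
    w = np.linalg.solve(Ryy, Rss[:, ref])   # w = Ryy^{-1} Rss e_ref
    return w, Rss
```

The enhanced output for a frame `y` would then be `w.conj() @ y`. Because `Rss` is obtained as a plain difference of two estimates, nothing in this sketch keeps it positive semidefinite, which is one way in which estimation errors can degrade the filter.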
An alternative version of DANSE, called generalized eigenvalue decomposition (GEVD)-based DANSE, has been developed in [3], in which each node incorporates a GEVD-based low-rank approximation of the speech correlation matrix in its local MWF. An in-depth theoretical study of the underlying principles of the GEVD-based DANSE algorithm has been presented in [3]. In order to also evaluate the merits of the GEVD-based DANSE algorithm in a realistic practical environment, a real-time experimental setup has been developed, which will be explained in the next section.
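The idea of a GEVD-based low-rank approximation can be sketched for a single node as follows. This is a minimal illustration under stated assumptions, not the algorithm of [3]: the function name `gevd_mwf`, the default `rank=1`, and the clipping of negative speech powers to zero are choices made for this example, and the GEVD is computed with NumPy alone by whitening with a Cholesky factor of the noise correlation matrix.

```python
import numpy as np

def gevd_mwf(Ryy, Rnn, rank=1, ref=0):
    """MWF using a rank-`rank` GEVD-based speech correlation estimate (illustrative).

    Ryy : (M, M) speech-plus-noise correlation matrix.
    Rnn : (M, M) noise-only correlation matrix, assumed positive definite.
    """
    L = np.linalg.cholesky(Rnn)              # Rnn = L L^H
    Linv = np.linalg.inv(L)
    A = Linv @ Ryy @ Linv.conj().T           # whitened speech-plus-noise matrix
    A = (A + A.conj().T) / 2                 # enforce Hermitian symmetry numerically
    lam, U = np.linalg.eigh(A)               # eigenvalues in ascending order
    order = np.argsort(lam)[::-1]            # sort generalized eigenvalues, largest first
    lam, U = lam[order], U[:, order]
    Q = L @ U                                # Ryy = Q diag(lam) Q^H and Rnn = Q Q^H
    gains = np.zeros_like(lam)
    gains[:rank] = np.maximum(lam[:rank] - 1.0, 0.0)  # keep dominant modes; clipping is an assumption
    Rss = (Q * gains) @ Q.conj().T           # low-rank, positive semidefinite speech estimate
    w = np.linalg.solve(Ryy, Rss[:, ref])    # w = Ryy^{-1} Rss e_ref
    return w, Rss
```

In contrast to a plain difference of the two correlation matrices, the speech estimate built here has rank at most `rank` and is positive semidefinite by construction, which is the property the low-rank approximation is meant to provide when the second-order statistics are poorly estimated.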