Blind channel identification for speech dereverberation using l1-norm sparse learning

Speech dereverberation remains an open problem after more than three decades of research. The most challenging step in speech dereverberation is blind channel identification (BCI). Although many BCI approaches have been developed, their performance is still far from satisfactory for practical applications. The main difficulty in BCI lies in finding an appropriate acoustic model: one that both resolves the solution degeneracies caused by the lack of knowledge of the source and robustly models real acoustic environments. This paper proposes a sparse acoustic room impulse response (RIR) model for BCI, in which an acoustic RIR is modeled as a sparse FIR filter. Under this model, we show how to formulate the BCI of a single-input multiple-output (SIMO) system as an l1-norm regularized least squares (LS) problem, which is convex and can be solved efficiently with guaranteed global convergence. The sparseness of the solutions is controlled by the l1-norm regularization parameters. We propose a sparse learning scheme that infers the optimal l1-norm regularization parameters directly from the microphone observations under a Bayesian framework. Our results show that the proposed approach is effective and robust, and that it yields source estimates in real acoustic environments with high fidelity to anechoic chamber measurements.
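
As an illustration of how a two-microphone SIMO identification problem can be cast as an l1-norm regularized LS problem, the following is a minimal sketch and not the paper's implementation. It rests on several assumptions that the abstract does not state: the cross-relation x1 * h2 = x2 * h1 between the two microphone signals is used as the LS residual, the first tap of h1 is fixed to 1 to exclude the trivial all-zero solution, a single hand-set regularization parameter `lam` stands in for the Bayesian-learned parameters, and a plain iterative soft-thresholding loop is used as the solver. The names `conv_matrix`, `soft_threshold`, and `sparse_bci` are hypothetical.

```python
import numpy as np


def conv_matrix(x, n_taps):
    """Matrix C with C @ h == np.convolve(x, h) for any h of length n_taps."""
    C = np.zeros((len(x) + n_taps - 1, n_taps))
    for k in range(n_taps):
        C[k:k + len(x), k] = x
    return C


def soft_threshold(v, t):
    """Proximal operator of t * ||v||_1 (elementwise shrinkage toward zero)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def sparse_bci(x1, x2, n_taps, lam=1.0, n_iter=2000):
    """Sketch: estimate two FIR channels from x1 = s*h1 and x2 = s*h2.

    Minimizes 0.5 * ||X2 h1 - X1 h2||^2 + lam * ||z||_1 over the free taps z,
    with h1[0] fixed to 1 (an assumption made here to rule out h = 0).
    Returns the two channel estimates and the objective value per iteration.
    """
    X1 = conv_matrix(x1, n_taps)
    X2 = conv_matrix(x2, n_taps)
    # With h1[0] = 1 the cross-relation residual becomes A @ z - b,
    # where z stacks h1[1:] and h2.
    A = np.hstack([X2[:, 1:], -X1])
    b = -X2[:, 0]
    # Step size 1/L, where L = ||A||_2^2 is the gradient's Lipschitz constant.
    step = 1.0 / np.linalg.norm(A, 2) ** 2
    z = np.zeros(A.shape[1])
    history = []
    for _ in range(n_iter):
        r = A @ z - b
        history.append(0.5 * r @ r + lam * np.abs(z).sum())
        # Gradient step on the LS term, then shrinkage for the l1 term.
        z = soft_threshold(z - step * (A.T @ r), step * lam)
    h1 = np.concatenate(([1.0], z[:n_taps - 1]))
    h2 = z[n_taps - 1:]
    return h1, h2, np.array(history)


# Synthetic, noise-free usage example with made-up sparse channels.
rng = np.random.default_rng(0)
s = rng.standard_normal(400)                      # unknown source
h1_true = np.array([1.0, 0.0, 0.0, 0.5, 0.0, 0.0, 0.0, -0.3])
h2_true = np.array([0.8, 0.0, -0.4, 0.0, 0.0, 0.0, 0.2, 0.0])
x1 = np.convolve(s, h1_true)                      # microphone 1
x2 = np.convolve(s, h2_true)                      # microphone 2
h1_est, h2_est, objective = sparse_bci(x1, x2, n_taps=8)
print(np.round(h1_est, 2))
print(np.round(h2_est, 2))
```

Because the channels are identifiable only up to a common scale, the estimates in this sketch are normalized so that the first tap of h1 equals 1. Both the LS term and the l1 penalty are convex, so the iteration cannot get trapped in a spurious local minimum; a larger `lam` drives more taps to exactly zero at the cost of more shrinkage bias.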
