Speech Detection Based on Hilbert-Huang Transform

In strongly noisy environments, speech detection often performs poorly. To improve it, the Hilbert-Huang transform (HHT) is used in the algorithm. The speech signal is first decomposed into a finite number of intrinsic mode functions by empirical mode decomposition (EMD); applying the Hilbert transform to these functions then yields the energy-frequency-time distribution of the original signal. EMD is used as a filter to remove unwanted noise, and a feature is then extracted to detect speech frames by examining how the energy is distributed over time. Experiments show that HHT helps extract the characteristic parameters of the signals and can improve the performance of speech detection.
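The final detection step, deciding from the energy distribution over time which frames contain speech, can be illustrated with a minimal sketch. This is not the algorithm evaluated in the paper. It assumes the signal has already been denoised (for example by EMD filtering) and replaces the HHT-derived feature with a plain frame-wise mean-square energy compared against a noise-floor estimate. The function names, the frame length, the threshold ratio and the synthetic test signal are all hypothetical choices made for illustration.

```python
import math
import random


def frame_energies(signal, frame_len):
    """Return the mean squared amplitude of each non-overlapping frame."""
    n_frames = len(signal) // frame_len
    return [
        sum(x * x for x in signal[i * frame_len:(i + 1) * frame_len]) / frame_len
        for i in range(n_frames)
    ]


def detect_speech_frames(signal, frame_len=160, ratio=4.0, noise_frames=5):
    """Flag frames whose energy exceeds `ratio` times a noise-floor estimate.

    The noise floor is the mean energy of the first `noise_frames` frames,
    which are assumed (hypothetically) to contain no speech.
    """
    energies = frame_energies(signal, frame_len)
    if not energies:
        return []
    lead = energies[:noise_frames]
    noise_floor = sum(lead) / len(lead)
    return [e > ratio * noise_floor for e in energies]


# Synthetic stand-in for a denoised recording: one second at 8 kHz of
# low-level Gaussian noise, with a 200 Hz tone playing the role of speech
# in samples 3200..4799 (frames 20..29 for 160-sample frames).
random.seed(0)
fs = 8000
signal = [random.gauss(0.0, 0.05) for _ in range(fs)]
for n in range(3200, 4800):
    signal[n] += 0.5 * math.sin(2 * math.pi * 200 * n / fs)

flags = detect_speech_frames(signal)
speech_frames = [i for i, is_speech in enumerate(flags) if is_speech]
print(speech_frames)
```

In an HHT-based detector, the per-frame energy would instead be computed from the instantaneous amplitudes of the selected intrinsic mode functions, and the fixed ratio would likely need tuning to the noise conditions.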
