Foreground Speech Segmentation using Zero Frequency Filtered Signal

A method for robust segmentation of foreground speech in the presence of background degradation using the zero frequency filtered signal (ZFFS) is proposed. The speech signal from the desired speaker collected over a mobile phone is termed foreground speech, and the acoustic background picked up by the same sensor, which includes both speech and non-speech sources, is termed background degradation. Zero frequency filtering (ZFF) of speech passes only the information around zero frequency. Two features of the resulting ZFFS, namely the normalized first-order autocorrelation coefficient and the strength of excitation, are observed to differ between foreground speech and background degradation. A method for foreground speech segmentation is developed using these two features. Evaluation on utterances containing isolated words of foreground speech and background degradation, collected in a real environment, shows robust foreground speech segmentation.
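
The abstract does not spell out how the ZFFS or its two features are computed. The following is a minimal sketch in plain Python, assuming the formulation commonly used in the epoch-extraction literature: difference the signal, pass it through two cascaded 0 Hz resonators, and remove the resulting trend by repeated local-mean subtraction. The function names, the 10 ms trend-removal window, the number of trend-removal passes, and the use of negative-to-positive zero crossings for the strength of excitation are all assumptions made for illustration, not details taken from the paper.

```python
from itertools import accumulate


def _subtract_local_mean(y, half):
    """Subtract the mean of a (2*half + 1)-sample window centred on each sample."""
    n = len(y)
    out = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        out.append(y[i] - sum(y[lo:hi]) / (hi - lo))
    return out


def zero_frequency_filter(signal, sample_rate, window_ms=10.0, passes=3):
    """Return a zero frequency filtered version of `signal` (a list of floats).

    Assumed parameters: `window_ms` should be roughly one to two average
    pitch periods; `passes` is the number of trend-removal repetitions.
    """
    if not signal:
        return []
    # Difference the signal so that a constant (DC) offset contributes nothing.
    x = [0.0] + [float(b) - float(a) for a, b in zip(signal, signal[1:])]
    # Each 0 Hz resonator has a double pole at z = 1, i.e. two running sums,
    # so a cascade of two resonators amounts to four running sums.
    y = x
    for _ in range(4):
        y = list(accumulate(y))
    # The running sums grow polynomially; remove that trend locally.
    half = max(1, int(round(window_ms * 1e-3 * sample_rate / 2)))
    for _ in range(passes):
        y = _subtract_local_mean(y, half)
    return y


def normalized_first_autocorr(frame):
    """First-order autocorrelation of a frame divided by its energy."""
    energy = sum(v * v for v in frame)
    if energy == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(frame, frame[1:])) / energy


def strength_of_excitation(zffs):
    """Return (index, slope) at each negative-to-positive zero crossing."""
    return [
        (i, zffs[i + 1] - zffs[i])
        for i in range(len(zffs) - 1)
        if zffs[i] < 0.0 <= zffs[i + 1]
    ]
```

In use, one would compute the ZFFS once per utterance, then evaluate `normalized_first_autocorr` on short frames and `strength_of_excitation` on the whole signal, and compare both against thresholds to label regions as foreground or background. Those thresholds, and how the two features are combined, are not given in the abstract and are left open here. For long recordings the four running sums grow very large, so processing in blocks or in higher precision may be needed.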
