Source-filter separation of speech signal in the phase domain

In earlier work we proposed a framework for speech source-filter separation that employs phase-based signal processing. This paper presents a further theoretical investigation of the model, together with optimisations that make the filter and source representations less sensitive to noise and better matched to downstream processing. First, when computing the Hilbert transform, the log function is replaced by the generalised logarithmic function. This introduces a tuning parameter that adjusts both the dynamic range and the distribution of the phase-based representation. Second, when computing the group delay, a more robust estimate of the derivative is formed by applying a regression filter instead of using sample differences. The effectiveness of these modifications is evaluated in clean and noisy conditions by considering the accuracy of the fundamental frequency extracted from the estimated source and the performance of speech recognition features extracted from the estimated filter. In particular, the proposed filter-based front-end reduces Aurora-2 WERs by 6.3% (averaged over 0-20 dB SNR) compared with previously reported results. Furthermore, when tested on an LVCSR task (Aurora-4), the new features yielded a 5.8% absolute WER reduction compared to MFCCs, without performance loss in the clean/matched condition.
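The two modifications described above can be illustrated with a minimal sketch. It rests on assumptions not stated in the abstract: the generalised logarithm is taken in its Box-Cox form, (x^alpha - 1)/alpha, and the regression filter is taken to be the least-squares slope estimator commonly used for delta features. The names `generalised_log`, `regression_derivative`, `alpha`, and `half_width` are hypothetical and chosen only for illustration; the exact formulation used in the paper may differ.

```python
import numpy as np


def generalised_log(x, alpha):
    """Box-Cox style generalised logarithm (assumed form).

    Returns (x**alpha - 1) / alpha, which tends to log(x) as alpha -> 0.
    The parameter alpha plays the role of the tuning parameter that
    changes the dynamic range and distribution of the output.
    """
    x = np.asarray(x, dtype=float)
    if abs(alpha) < 1e-12:
        return np.log(x)
    return (x ** alpha - 1.0) / alpha


def regression_derivative(y, half_width=2):
    """Least-squares slope of y over a window of 2*half_width + 1 samples.

    Assumed form: d[i] = sum_k k * (y[i+k] - y[i-k]) / (2 * sum_k k**2),
    with the sequence edge-padded so the output has the same length as y.
    """
    y = np.asarray(y, dtype=float)
    k_values = np.arange(1, half_width + 1)
    denom = 2.0 * np.sum(k_values ** 2)
    padded = np.pad(y, half_width, mode="edge")
    out = np.zeros_like(y)
    for k in k_values:
        ahead = padded[half_width + k : half_width + k + len(y)]
        behind = padded[half_width - k : half_width - k + len(y)]
        out += k * (ahead - behind)
    return out / denom


# Toy check: a pure delay of 5 samples has phase -5 * omega,
# so its group delay (negative phase derivative) should be 5.
omega = np.linspace(0.0, np.pi, 64)
phase = -5.0 * omega
d_omega = omega[1] - omega[0]
group_delay = -regression_derivative(phase, half_width=2) / d_omega
```

Away from the edges, `group_delay` recovers the delay of the toy example. On noisy phase estimates, the wider regression window averages several differences, which is the intuition behind preferring it to a single sample difference.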
