Our recent human perception experiments indicate that the short-time phase spectrum can contribute significantly to speech intelligibility at small window durations (20-40 ms). This motivates us to investigate the short-time phase spectrum as a source of features for automatic speech recognition, which generally uses window durations in the same 20-40 ms range for spectral analysis. In this paper, we specifically investigate feature extraction from the frequency derivative of the short-time phase spectrum, i.e., the group delay function (GDF). Using some simple examples, we demonstrate the sensitivity of the GDF to noise, pitch epochs and windowing effects. We summarise the work by Yegnanarayana and Murthy on the modified GDF (MGDF), which is intended to remedy these problems. We then implement Murthy and Gadde's MGDF-based features (MODGDF) to determine whether they improve on the popular MFCC representation, either by themselves or in combination with MFCCs, on an isolated word recognition task.
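As an illustration of the quantity discussed above, the following is a minimal sketch of computing the GDF of a single frame. It is not taken from the paper: the function names `dft` and `group_delay` and the `eps` guard are assumptions made for this example. It relies on the standard identity that the group delay, conventionally defined as the negative frequency derivative of the phase, can be obtained without phase unwrapping as (X_R Y_R + X_I Y_I) / |X|^2, where X is the Fourier transform of x[n] and Y is the Fourier transform of n x[n].

```python
import cmath
import math


def dft(x, n_fft):
    """Naive O(N^2) DFT of x, implicitly zero-padded to n_fft points."""
    return [
        sum(v * cmath.exp(-2j * math.pi * k * n / n_fft) for n, v in enumerate(x))
        for k in range(n_fft)
    ]


def group_delay(frame, n_fft=None, eps=1e-12):
    """Group delay (in samples) of one frame, without phase unwrapping.

    Uses tau[k] = (X_R*Y_R + X_I*Y_I) / |X|^2, where X is the DFT of
    x[n] and Y is the DFT of n*x[n]. `eps` is an illustrative guard
    against division by zero at exact spectral nulls.
    """
    n_fft = n_fft or len(frame)
    X = dft(frame, n_fft)
    Y = dft([n * v for n, v in enumerate(frame)], n_fft)
    return [
        (a.real * b.real + a.imag * b.imag) / max(abs(a) ** 2, eps)
        for a, b in zip(X, Y)
    ]


# A unit impulse delayed by 3 samples has a flat group delay of 3 samples.
impulse = [0.0] * 16
impulse[3] = 1.0
flat = group_delay(impulse)

# A zero close to the unit circle (here at z = 0.99) produces a large spike,
# because |X|^2 in the denominator becomes very small near that frequency.
spiky = group_delay([1.0, -0.99], n_fft=8)
```

The second case hints at why the raw GDF is fragile: any zero of the frame's z-transform lying near the unit circle drives the denominator towards zero, so the values near that frequency are dominated by the spike rather than by the underlying spectral structure.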
[1] Alan V. Oppenheim, et al. Digital Signal Processing. IEEE Transactions on Systems, Man, and Cybernetics, 1978.
[2] A.V. Oppenheim, et al. The importance of phase in signals. Proceedings of the IEEE, 1980.
[3] Bayya Yegnanarayana, et al. Significance of group delay functions in spectrum estimation. IEEE Trans. Signal Process., 1992.
[4] Hema A. Murthy, et al. The modified group delay function and its application to phoneme recognition. 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '03), 2003.
[5] Kuldip K. Paliwal, et al. On the usefulness of STFT phase spectrum in human listening tests. Speech Commun., 2005.