ALGORITHMS FOR PROCESSING FOURIER TRANSFORM PHASE OF SIGNALS

The studies presented in this thesis represent an attempt to process the Fourier transform (FT) phase of signals for feature extraction. Although the FT magnitude and phase spectra of a signal are independent functions of frequency, most techniques for feature extraction from a signal are based upon manipulating the FT magnitude only. The phase spectrum of a signal corresponds to the time delay of each of its sinusoidal components. In the context of additive noise, the time delay may not be significantly corrupted, and the phase spectrum might therefore be considered a more reliable source for estimating the features of a noisy signal. Although the importance of phase in signals is recognised by researchers, very few attempts have been made to process the FT phase of signals for the extraction of features. Features of a signal, such as resonance information, are completely masked by the inevitable wrapping of the phase spectrum. An alternative to processing the phase spectrum is processing the group delay function. The group delay function is the negative derivative of the (unwrapped) FT phase spectrum, and it can be computed directly from the time domain signal. The group delay function possesses additive and high resolution properties, in that it shows a squared magnitude behaviour in the vicinity of a resonance. However, the group delay function is not, in general, well behaved for all classes of signals. Zeros in the z-transform of a signal that are close to the unit circle cause large amplitude spikes to appear in the group delay function. The polarity of a spike depends on the location of the zero with respect to the unit circle. These large amplitude spikes mask the information about resonances.
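The statement that the group delay function can be computed directly from the time domain signal can be illustrated with a short sketch. It relies on the standard identity tau(w) = Re{Y(w)/X(w)}, where X is the FT of x(n) and Y is the FT of n*x(n), so no phase unwrapping is needed. The function name group_delay, the FFT length, the small floor on |X|^2 and the two-pole test resonator are assumptions made for illustration only; this is not the modified group delay processing developed in the thesis.

```python
import numpy as np

def group_delay(x, n_fft=1024):
    """Group delay of x via tau(w) = Re{Y(w) / X(w)}, with no phase unwrapping.

    X is the DFT of x[n] and Y is the DFT of n * x[n].
    (Hypothetical helper name; illustrative only.)
    """
    x = np.asarray(x, dtype=float)
    n = np.arange(len(x))
    X = np.fft.rfft(x, n_fft)
    Y = np.fft.rfft(n * x, n_fft)
    # Floor |X|^2 so that bins where the spectrum is (almost) zero do not
    # divide by zero. Zeros close to the unit circle still give large spikes.
    power = np.maximum(np.abs(X) ** 2, 1e-12)
    return (X.real * Y.real + X.imag * Y.imag) / power

# A pure delay of 5 samples has a group delay of 5 at every frequency.
delta = np.zeros(16)
delta[5] = 1.0
tau_delay = group_delay(delta)

# Impulse response of a two-pole resonator (assumed example):
# pole radius r, pole angle theta0 in radians per sample.
r, theta0 = 0.95, np.pi / 4
m = np.arange(256)
h = r ** m * np.sin((m + 1) * theta0) / np.sin(theta0)
tau_res = group_delay(h)

# Frequency (radians per sample) at which the group delay peaks.
peak_freq = 2 * np.pi * np.argmax(tau_res) / 1024
print(tau_delay[:3], peak_freq, theta0)
```

In this sketch the group delay of the resonator peaks close to the pole angle, which is the resonance behaviour described above. Replacing h with a signal whose z-transform has zeros near the unit circle would show the large spikes that the thesis sets out to suppress.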
The research effort in this thesis focuses on the development of algorithms for manipulating the group delay function to suppress the information corresponding to the zeros of the signal that are close to the unit circle in the z-domain and to emphasise the features of the signal. To demonstrate the usefulness of the algorithms developed, they are used to estimate (a) formant and pitch data from speech signals and (b) spectra of auto-regressive processes and sinusoids in noise. The research effort in this thesis shows that the phase spectrum (or rather the group delay function) of a signal can be usefully processed to reliably extract features of a signal.

ACKNOWLEDGEMENT

I express my appreciation to Prof. B. Yegnanarayana for his constant help, excellent guidance and constructive criticisms throughout the course of this work. I thank Prof. R. Nagarajan, Head, Department of Computer Science and Engineering, for making the various facilities in the department available to me. I owe my special thanks to Madhu Murthy and C. P. Mariadassou for some fruitful discussions. I thank G. V. Ramana Rao and R. Ramaseshan for reading my thesis and making useful suggestions. I would like to thank all my colleagues of the Speech and Vision Lab who have helped me in one way or the other. I thank Vatsala for providing me a shoulder whenever I was depressed. Finally, I thank my husband M. V. N. Murthy for his support and perseverance throughout the course of this work.
