Nontactile estimation of glottal excitation characteristics of voiced speech

A record of voiced speech is analysed by linear predictive coding (LPC). It is also partitioned into a sequence of short signals, each containing a single glottal pulse. Each short signal is taken to be the convolution of two components, one varying from one short signal to the next and one invariant, corrupted by a significant but not overwhelming contamination (noise plus all other imperfections). The invariant component, which is initially estimated by shift-and-add processing, is the multiple convolution of the invariant responses of the recording apparatus, the speaker's lips and vocal tract (plus nasal tract and soft palate) and the speaker's average glottal excitation. This initial estimate, which is characteristic of the glottal excitation, is iteratively refined by a computational procedure that makes use of the LPC coefficients. The procedure, which checks its own numerical convergence, is illustrated with results for six different speakers and for a single speaker under varying conditions.
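
The shift-and-add step can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes that each short signal is aligned on its largest-magnitude sample (the usual choice in speckle imaging) and that the aligned signals are then averaged. The function name `shift_and_add` and the parameter `out_len` are hypothetical, introduced only for this example.

```python
import numpy as np


def shift_and_add(segments, out_len):
    """Align each segment on its largest-magnitude sample and average.

    segments: iterable of 1-D arrays, each assumed to hold one glottal pulse.
    out_len:  length of the output estimate; the peaks land at out_len // 2.
    Alignment on the absolute peak is an assumption of this sketch.
    """
    segments = [np.asarray(s, dtype=float) for s in segments]
    centre = out_len // 2
    acc = np.zeros(out_len)
    for seg in segments:
        peak = int(np.argmax(np.abs(seg)))
        offset = centre - peak  # index in acc where seg[0] would land
        # Clip the copied range so that samples shifted off either end are dropped.
        src_start = max(0, -offset)
        src_end = min(len(seg), out_len - offset)
        if src_end > src_start:
            acc[src_start + offset:src_end + offset] += seg[src_start:src_end]
    return acc / max(len(segments), 1)


# Usage: three noise-free copies of one pulse at different positions.
pulse = np.array([0.0, 1.0, 4.0, 2.0, 1.0, 0.0])
segments = []
for start in (2, 5, 9):
    seg = np.zeros(20)
    seg[start:start + len(pulse)] = pulse
    segments.append(seg)

estimate = shift_and_add(segments, out_len=21)
```

Averaging after alignment reinforces whatever is common to every short signal, while the components that vary from one short signal to the next, and the contamination, tend to average out. This is why the result can serve only as an initial estimate of the invariant component.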
