Pitch-scaled estimation of simultaneous voiced and turbulence-noise components in speech

Almost all speech contains simultaneous contributions from more than one acoustic source within the speaker's vocal tract. In this paper, we propose a method, the pitch-scaled harmonic filter (PSHF), which aims to separate the voiced and turbulence-noise components of the speech signal during phonation, based on a maximum-likelihood approach. The PSHF outputs periodic and aperiodic components that are estimates of the respective contributions of the different types of acoustic source. It produces four reconstructed time-series signals by decomposing the original speech signal, first according to the amplitude and then according to the power of the Fourier coefficients. Thus, one pair of periodic and aperiodic signals is optimized for subsequent time-series analysis, and another pair for spectral analysis. The performance of the PSHF algorithm was tested on synthetic signals, using three forms of disturbance (jitter, shimmer and additive noise), and the results were used to predict its performance on real speech. Processing examples of recorded speech elicited latent features from the signals, demonstrating the PSHF's potential for the analysis of mixed-source speech.
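The abstract does not spell out the filter itself, so the following is only a minimal sketch of the general idea behind a pitch-scaled harmonic decomposition, not the authors' implementation. It assumes that the analysis frame spans exactly a whole number of pitch periods (here four, an assumed value), so that the harmonics of the fundamental fall on every fourth DFT bin. The function name `harmonic_split`, the parameter `periods_per_frame`, the rectangular window and the synthetic test signal are all hypothetical choices made for illustration. The amplitude-based and power-based reconstructions described above are not modelled here.

```python
import numpy as np


def harmonic_split(frame, periods_per_frame=4):
    """Split a frame into periodic and aperiodic estimates (illustrative only).

    Assumes the frame covers exactly `periods_per_frame` pitch periods, so
    the harmonics sit on DFT bins 0, periods_per_frame, 2*periods_per_frame, ...
    """
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    if n % periods_per_frame != 0:
        raise ValueError("frame length must be a multiple of periods_per_frame")

    spectrum = np.fft.rfft(frame)

    # Keep only the bins that coincide with harmonics of the fundamental.
    periodic_spectrum = np.zeros_like(spectrum)
    periodic_spectrum[::periods_per_frame] = spectrum[::periods_per_frame]

    periodic = np.fft.irfft(periodic_spectrum, n=n)
    aperiodic = frame - periodic  # everything left between the harmonics
    return periodic, aperiodic


# Synthetic example: two harmonics plus additive white noise (assumed values).
rng = np.random.default_rng(0)
period = 80                       # pitch period in samples (assumption)
k = np.arange(4 * period)         # frame of exactly four periods
voiced = np.sin(2 * np.pi * k / period) + 0.5 * np.sin(4 * np.pi * k / period)
noise = 0.1 * rng.standard_normal(k.size)

periodic, aperiodic = harmonic_split(voiced + noise)
```

In this sketch the periodic estimate is exactly periodic over the frame, and the part of the noise that happens to fall on the harmonic bins stays in it. The aperiodic estimate therefore underestimates the noise, which is one reason a practical method would need further compensation.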
