Nonlinear modeling and processing of speech with applications to speech coding

In recent years there has been increasing interest in nonlinear speech modeling. In our approach, a speech signal is modeled as a sum of jointly amplitude (AM) and frequency (FM) modulated cosines with slowly-varying center frequencies. The key problem is to extract the center frequency and the amplitude and frequency modulations for each formant in the model from the measured speech signals. In this study, we describe the speech signal in terms of statistical models and apply statistical nonlinear filtering techniques (Extended [...] computationally tractable manner. Using Cramér-Rao bound techniques, we can compare the performance of our computationally feasible estimators relative to the performance of the computationally intractable optimal estimator. Recombination of the amplitude and frequency signals generated by our approach results in faithful reconstruction of speech in both the time and frequency domains. We consider two applications. The first application, which is formant tracking, is a direct application of our nonlinear filters since the formant frequencies are a part of our nonlinear model. The application of our entire framework to speech coding is also discussed.
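
The signal model named in the abstract, a sum of jointly AM and FM modulated cosines, can be illustrated with a short synthesis sketch. This is not the authors' estimator, only the forward model under stated assumptions: the function name `am_fm_sum`, the dictionary keys, the sinusoidal shape of the modulations, and the example frequencies are all hypothetical, and each center frequency is held constant here, whereas the abstract allows it to vary slowly.

```python
import math

def am_fm_sum(components, sample_rate, n_samples):
    """Synthesize s[n] = sum_k a_k[n] * cos(phi_k[n]).

    Each component is a dict with hypothetical keys (all assumptions):
      fc                 center frequency in Hz (constant in this sketch)
      am_depth, am_rate  amplitude modulation depth (0..1) and rate in Hz
      fm_dev, fm_rate    peak frequency deviation in Hz and FM rate in Hz
    """
    dt = 1.0 / sample_rate
    phases = [0.0] * len(components)  # running phase of each cosine
    signal = []
    for n in range(n_samples):
        t = n * dt
        sample = 0.0
        for k, c in enumerate(components):
            # Amplitude modulation: envelope around a unit amplitude.
            amp = 1.0 + c["am_depth"] * math.cos(2 * math.pi * c["am_rate"] * t)
            # Frequency modulation: deviation around the center frequency.
            inst_freq = c["fc"] + c["fm_dev"] * math.cos(2 * math.pi * c["fm_rate"] * t)
            sample += amp * math.cos(phases[k])
            # Phase is the running integral of the instantaneous frequency.
            phases[k] += 2 * math.pi * inst_freq * dt
        signal.append(sample)
    return signal

# Two formant-like components; the numbers are illustrative only.
formants = [
    {"fc": 500.0, "am_depth": 0.2, "am_rate": 50.0, "fm_dev": 30.0, "fm_rate": 80.0},
    {"fc": 1500.0, "am_depth": 0.2, "am_rate": 60.0, "fm_dev": 40.0, "fm_rate": 90.0},
]
speech_like = am_fm_sum(formants, sample_rate=8000, n_samples=800)
```

In the framework the abstract describes, the task runs in the opposite direction: given the measured signal, recover the center frequency and the amplitude and frequency modulations for each formant.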
