Time-domain isolated phoneme classification using reconstructed phase spaces

This paper introduces a novel time-domain approach to modeling and classifying speech phoneme waveforms. The approach is based on statistical models of reconstructed phase spaces, which are theoretically attractive because such representations are known to be topologically equivalent to the state dynamics of the underlying production system. The lag and dimension parameters of the reconstruction process for speech are examined in detail, comparing common estimation heuristics for these parameters against the corresponding maximum likelihood recognition accuracy on the TIMIT data set. Overall accuracies are compared with a Mel-frequency cepstral baseline system across five phonetic classes within TIMIT, and a composite classifier using both cepstral and phase space features is developed. Results indicate that although the accuracy of the phase space approach alone is currently below that of baseline cepstral methods, the combined approach can increase speaker-independent phoneme accuracy.
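
To make the roles of the lag and dimension parameters concrete, the following is a minimal sketch of a time-delay embedding, the usual way a phase space is reconstructed from a scalar waveform. It assumes the standard delay-coordinate construction; the abstract does not state the exact form used in the paper. The function name `delay_embed`, the toy sine input, and the parameter values are hypothetical and chosen only for illustration.

```python
import math


def delay_embed(signal, dimension, lag):
    """Return the delay-coordinate embedding of a 1-D sequence.

    Each point is (x[n], x[n - lag], ..., x[n - (dimension - 1) * lag]).
    This is an illustrative assumption, not the paper's exact procedure.
    """
    if dimension < 1 or lag < 1:
        raise ValueError("dimension and lag must be positive integers")
    # First index for which all delayed samples exist.
    start = (dimension - 1) * lag
    return [
        tuple(signal[n - k * lag] for k in range(dimension))
        for n in range(start, len(signal))
    ]


# Toy input (hypothetical): a sine wave standing in for a phoneme waveform.
samples = [math.sin(2 * math.pi * n / 16) for n in range(64)]
points = delay_embed(samples, dimension=3, lag=4)
print(len(points))  # number of reconstructed phase space points
```

A statistical model (for example, a density estimate over the embedded points) could then be fit per phoneme class. Under this sketch, the lag controls how far apart the coordinates of each point are in time, and the dimension controls how many delayed samples form each point.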
