A new pitch-synchronous time-domain phoneme recognizer using component analysis and pitch clustering

A new framework for time-domain recognition of voiced phonemes is presented. Each speech frame used for training and recognition is bounded by consecutive glottal closures. A preprocessing stage was designed and implemented to model these pitch-synchronous frames with Gaussian mixture models. Component analysis of the data shows that optimal performance is reached with a very small number of components, so the recognizer requires little computational power. We also designed a new clustering technique that uses the pitch period and gives better results than well-known clustering algorithms such as k-means.
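To make the pitch-synchronous framing and the component analysis concrete, here is a minimal NumPy sketch. It assumes glottal closure instants (GCIs) have already been located by some detector and are given as strictly increasing sample indices. The fixed-length resampling (`frame_len`), the use of plain PCA computed via SVD, and the function names `pitch_synchronous_frames` and `pca_project` are illustrative assumptions, not the paper's method; the Gaussian mixture modeling and the pitch-based clustering are not shown.

```python
import numpy as np


def pitch_synchronous_frames(signal, gci, frame_len=64):
    """Cut `signal` into frames bounded by consecutive glottal closure
    instants (GCIs) and resample each frame to `frame_len` samples.

    Assumptions (hypothetical, not from the paper): GCIs are strictly
    increasing sample indices, and frames are length-normalised by
    linear interpolation. Returns the frames and each pitch period in samples.
    """
    gci = np.asarray(gci)
    if not np.all(np.diff(gci) > 0):
        raise ValueError("GCIs must be strictly increasing")
    frames, periods = [], []
    for start, end in zip(gci[:-1], gci[1:]):
        seg = signal[start:end]  # one pitch period, closure to closure
        period = end - start
        # Map the variable-length period onto a fixed grid of frame_len points.
        x_old = np.linspace(0.0, 1.0, period)
        x_new = np.linspace(0.0, 1.0, frame_len)
        frames.append(np.interp(x_new, x_old, seg))
        periods.append(period)
    return np.array(frames), np.array(periods)


def pca_project(frames, n_components=3):
    """Project frames onto their first `n_components` principal components."""
    centered = frames - frames.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T
```

Keeping the pitch period of each frame next to its low-dimensional projection leaves it available to a later pitch-aware clustering step; how the paper actually combines the two is not specified in this abstract.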
