A Segmental HMM for Speech Waveforms

We present a purely time-domain approach to speech processing which identifies waveform samples at the boundaries between glottal pulse periods (in voiced speech) or at the boundaries between unvoiced segments. An efficient algorithm for inferring these boundaries is derived from a simple probabilistic generative model of speech, and state-of-the-art results are presented on pitch tracking, voiced/unvoiced detection, and timescale modification.
