Paper information - A segment based probabilistic generative model of speech

A segment based probabilistic generative model of speech

We present a purely time-domain approach to speech processing which identifies waveform samples at the boundaries between glottal pulse periods (in voiced speech) or at the boundaries of unvoiced segments. An efficient algorithm for inferring these boundaries and estimating the average spectra of voiced and unvoiced regions is derived from a simple probabilistic generative model. Competitive results are presented on pitch tracking, voiced/unvoiced detection and timescale modification; all of these tasks, and several others, can be performed using the single segmentation provided by inference in the model.
