Automatic alignment of speech with phonetic transcriptions in real time

This paper describes a system that aligns speech waveforms with their corresponding phonetic transcriptions. The alignment relies mainly on labeling speech frames, spaced one centisecond apart, with phonetic classes. The labeling is performed by a novel method based on neural network principles. Spectral stationarity is another major source of information. The alignment proceeds in two main stages. First, a list of phonetic events with stationary properties is constructed, and the phonetic transcription is roughly aligned with this list. Second, a more detailed boundary refinement is carried out using heuristic, speech-specific knowledge. The system runs in real time on a standard IBM PC/AT. In addition to building a database for speech recognition research, it is used for on-line speaker enrollment and syntactic correction analysis.
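
The two-stage procedure above can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's implementation. Frame labels are supplied as input, because the paper's neural-network labeler is not reproduced. Spectral stationarity is approximated by a Euclidean-distance threshold between consecutive frame spectra. The rough alignment is assumed to be a Levenshtein-style dynamic-programming match, since the abstract does not name the algorithm. A simple midpoint rule stands in for the heuristic boundary refinement. All names (`Event`, `stationary_events`, `align`, `rough_boundaries`) and parameters (`threshold`, `min_len`, `sub_cost`, `gap_cost`) are hypothetical.

```python
# Hypothetical sketch of the two-stage alignment described above.
# Frame labels are taken as input; the paper's neural-network labeler
# is NOT reproduced here.
import math
from dataclasses import dataclass


@dataclass
class Event:
    """A stationary phonetic event: a run of frames sharing one label."""
    label: str
    start: int  # first frame (inclusive); one frame = one centisecond
    end: int    # last frame (exclusive)


def spectral_change(a, b):
    # Assumption: Euclidean distance between consecutive frame spectra.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def stationary_events(spectra, labels, threshold, min_len=2):
    """Stage 1: split the frame sequence wherever the label changes or the
    spectrum changes by more than `threshold`; keep runs of >= min_len frames."""
    events, start = [], 0
    for i in range(1, len(labels) + 1):
        if (i == len(labels) or labels[i] != labels[start]
                or spectral_change(spectra[i - 1], spectra[i]) > threshold):
            if i - start >= min_len:
                events.append(Event(labels[start], start, i))
            start = i
    return events


def align(transcription, events, sub_cost=1, gap_cost=1):
    """Stage 1 (rough alignment), assumed here to be a Levenshtein-style DP.
    Returns (phoneme index, event index) pairs on the optimal path."""
    n, m = len(transcription), len(events)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * gap_cost
    for j in range(1, m + 1):
        d[0][j] = j * gap_cost

    def diag(i, j):
        # Cost of pairing phoneme i-1 with event j-1 (0 if labels agree).
        return 0 if transcription[i - 1] == events[j - 1].label else sub_cost

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(d[i - 1][j - 1] + diag(i, j),
                          d[i - 1][j] + gap_cost,   # phoneme with no event
                          d[i][j - 1] + gap_cost)   # event with no phoneme
    pairs, i, j = [], n, m
    while i > 0 and j > 0:  # backtrace the optimal path
        if d[i][j] == d[i - 1][j - 1] + diag(i, j):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif d[i][j] == d[i - 1][j] + gap_cost:
            i -= 1
        else:
            j -= 1
    return pairs[::-1]


def rough_boundaries(pairs, events):
    # Stage 2 placeholder for the paper's heuristic refinement: put each
    # boundary midway between consecutive aligned events.
    return [(events[a].end + events[b].start) // 2
            for (_, a), (_, b) in zip(pairs, pairs[1:])]


if __name__ == "__main__":
    labels = ["s", "s", "s", "a", "a", "a", "a", "x", "t", "t"]
    spectra = [[0.0], [0.1], [0.1], [5.0], [5.1], [5.0], [5.2], [2.0], [9.0], [9.1]]
    events = stationary_events(spectra, labels, threshold=1.0)
    pairs = align(["s", "a", "t"], events)
    print(rough_boundaries(pairs, events))  # → [3, 7]
```

In this toy run, the single noisy frame labeled "x" is too short to form an event and is dropped. Treating unmatched phonemes and spurious events as gaps in the DP lets an imperfect frame labeling still yield a usable rough alignment. In the sketch, a phoneme missing from the event list simply stays unpaired; a real system would need the refinement stage to place its boundaries.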

Kari Torkkola