Automatic alignment of speech with phonetic transcriptions in real time

A system that aligns speech waveforms with their corresponding phonetic transcriptions is described. The alignment is based mainly on assigning speech frames, spaced one centisecond apart, to phonetic classes. A novel method based on neural network principles is used to perform this labeling. The other major source of information is spectral stationarity. The alignment proceeds in two main stages. First, a list of phonetic events with stationary properties is constructed, and the phonetic transcription is roughly aligned with this list. Second, the boundaries are refined in more detail using heuristic, speech-specific knowledge. The system runs in real time on a standard IBM PC/AT. It is used for on-line speaker enrollment and syntactic correction analysis, as well as for establishing a database for speech recognition research.
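The first stage can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the event-detection rule or the alignment procedure, so the code below assumes that per-frame phonetic class labels are already available (the neural-network labeler is out of scope), treats a run of identical labels lasting at least `min_frames` frames as a stationary event, and uses a plain edit-distance alignment as a stand-in for the rough alignment. The names `stationary_events`, `rough_align`, and `min_frames`, and the class labels in the usage note, are hypothetical.

```python
from itertools import groupby

FRAME_SEC = 0.01  # one frame per centisecond, as stated in the abstract


def stationary_events(frame_labels, min_frames=3):
    """Collapse runs of identical frame labels into (label, start, end) events.

    start and end are frame indices (end exclusive). Runs shorter than
    min_frames are dropped as non-stationary; this threshold is an assumption.
    """
    events, pos = [], 0
    for label, run in groupby(frame_labels):
        n = len(list(run))
        if n >= min_frames:
            events.append((label, pos, pos + n))
        pos += n
    return events


def rough_align(transcription, events):
    """Align transcription symbols to events with an edit-distance table.

    Returns (symbol_index, event_index) pairs for every symbol that was
    paired with an event (matched or substituted), in order.
    """
    n, m = len(transcription), len(events)
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i  # symbols with no event to pair with
    for j in range(1, m + 1):
        cost[0][j] = j  # events with no symbol to pair with
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            mismatch = transcription[i - 1] != events[j - 1][0]
            cost[i][j] = min(
                cost[i - 1][j - 1] + mismatch,  # pair symbol with event
                cost[i - 1][j] + 1,             # skip a symbol
                cost[i][j - 1] + 1,             # skip an event
            )
    # Trace back from the bottom-right corner to recover the pairing.
    pairs, i, j = [], n, m
    while i > 0 and j > 0:
        mismatch = transcription[i - 1] != events[j - 1][0]
        if cost[i][j] == cost[i - 1][j - 1] + mismatch:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif cost[i][j] == cost[i - 1][j] + 1:
            i -= 1
        else:
            j -= 1
    return pairs[::-1]
```

As a usage example, the frame sequence `["sil"] * 5 + ["vowel"] * 8 + ["x"] + ["fric"] * 6 + ["sil"] * 4` yields four events, with the single stray frame discarded, and the transcription `["sil", "vowel", "fric", "sil"]` pairs with them one to one. Multiplying an event's frame indices by `FRAME_SEC` gives its rough boundaries in seconds, which a second, knowledge-based stage would then refine.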