Semantic-based Hidden Markov / Context Free Grammar Language Modeling

This thesis presents a hybrid language model, both statistical and rule-based, that has the structure of a Hidden Markov Model in which some nodes are modeled with n-grams and others with context-free rules. An EM algorithm is used to train the model parameters from a manually annotated corpus of sentences. Performance is evaluated on the decoding output of a speech recognition engine integrated with the language model and compared to a baseline 3-gram model. The designed model achieves a word error rate similar to the baseline's but outperforms it in understanding accuracy by 15.3%.
