Backpropagation training for multilayer conditional random field based phone recognition

Conditional random fields (CRFs) have recently found increased popularity in automatic speech recognition (ASR) applications. CRFs have previously been shown to be effective combiners of posterior estimates from multilayer perceptrons (MLPs) in phone and word recognition tasks. In this paper, we describe a novel hybrid Multilayer-CRF structure (ML-CRF), where a MLP-like hidden layer serves as input to the CRF; moreover, we propose a technique for directly training the ML-CRF to optimize a conditional log-likelihood based criterion, based on error backpropagation. The proposed technique thus allows for the implicit learning of suitable feature functions for the CRF. We present results for initial phone recognition experiments on the TIMIT database that indicate that our proposed method is a promising approach for training CRFs.

[1]  Frantisek Grézl,et al.  Optimizing bottle-neck features for lvcsr , 2008, 2008 IEEE International Conference on Acoustics, Speech and Signal Processing.

[2]  Fernando Pereira,et al.  Shallow Parsing with Conditional Random Fields , 2003, NAACL.

[3]  Jonathan G. Fiscus,et al.  Darpa Timit Acoustic-Phonetic Continuous Speech Corpus CD-ROM {TIMIT} | NIST , 1993 .

[4]  Andrew McCallum,et al.  Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data , 2001, ICML.

[5]  Brian Kingsbury,et al.  Lattice-based optimization of sequence classification criteria for neural-network acoustic modeling , 2009, 2009 IEEE International Conference on Acoustics, Speech and Signal Processing.

[6]  Alex Acero,et al.  Hidden conditional random fields for phone classification , 2005, INTERSPEECH.

[7]  Hsiao-Wuen Hon,et al.  Speaker-independent phone recognition using hidden Markov models , 1989, IEEE Trans. Acoust. Speech Signal Process..

[8]  James R. Glass,et al.  HETEROGENEOUS ACOUSTIC MEASUREMENTS FOR PHONETIC CLASSIFICATION , 1997 .

[9]  James R. Glass,et al.  Heterogeneous acoustic measurements for phonetic classification 1 , 1997, EUROSPEECH.

[10]  Heekuck Oh,et al.  Neural Networks for Pattern Recognition , 1993, Adv. Comput..

[11]  Eric Fosler-Lussier,et al.  Conditional Random Fields for Integrating Local Discriminative Classifiers , 2008, IEEE Transactions on Audio, Speech, and Language Processing.

[12]  Steve Renals,et al.  Speech Recognition Using Augmented Conditional Random Fields , 2009, IEEE Transactions on Audio, Speech, and Language Processing.

[13]  Chris Brew,et al.  Discriminative Input Stream Combination for Conditional Random Field Phone Recognition , 2009, IEEE Transactions on Audio, Speech, and Language Processing.