Deep belief network based CRF for spoken language understanding

The key task in spoken language understanding research is the semantic tagging of word sequences. Deep belief networks (DBNs) have recently shown strong performance on word-labeling tasks, while conditional random fields (CRFs) have been a successful approach to modeling sequence probabilities in a globally normalized fashion. In contrast to CRFs, DBNs are optimized with a locally normalized, tag-by-tag likelihood and may therefore suffer from the label bias problem. In this paper, we combine the two by placing a CRF layer on top of the DBN's top hidden layer. The resulting DBN-CRF architecture explicitly models dependencies between output labels through transition features and can be trained with a global, sequence-level objective function. Experiments on the ATIS corpus show that the new model outperforms CRFs and DBNs by 4.9% and 3.8%, respectively. After effective pre-training with additional unlabeled data, its results are state-of-the-art, comparable to the recent RNN-CRF model.
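
As a rough illustration of the architecture, the PyTorch sketch below stacks a feed-forward network (standing in for the pre-trained DBN; the RBM-by-RBM pre-training itself is omitted) under a linear-chain CRF layer that contributes a transition matrix and a sequence-level negative log-likelihood. All class names, layer sizes, and activation choices here are illustrative assumptions, not details taken from the paper.

    # Minimal DBN-CRF sketch, assuming PyTorch. The "DBN" body is an ordinary
    # feed-forward stack whose weights would, in the paper's setting, come from
    # unsupervised RBM pre-training; the CRF on top adds label-transition scores
    # and a globally normalized, sequence-level loss.
    import torch
    import torch.nn as nn

    class DBNCRF(nn.Module):
        def __init__(self, input_dim, hidden_dims, num_tags):
            super().__init__()
            layers, prev = [], input_dim
            for h in hidden_dims:  # DBN body (weights assumed pre-trained)
                layers += [nn.Linear(prev, h), nn.Sigmoid()]
                prev = h
            self.dbn = nn.Sequential(*layers)
            self.emission = nn.Linear(prev, num_tags)       # per-token tag scores
            self.transitions = nn.Parameter(torch.zeros(num_tags, num_tags))

        def _scores(self, x):          # x: (T, input_dim) features for T tokens
            return self.emission(self.dbn(x))               # (T, num_tags)

        def neg_log_likelihood(self, x, tags):
            """Sequence-level loss: -log p(tags | x) under the linear-chain CRF."""
            emit = self._scores(x)                          # (T, K)
            T, K = emit.shape
            # Score of the gold tag sequence (emissions + transitions).
            gold = emit[0, tags[0]]
            for t in range(1, T):
                gold = gold + self.transitions[tags[t - 1], tags[t]] + emit[t, tags[t]]
            # Log partition function via the forward algorithm.
            alpha = emit[0]                                 # (K,)
            for t in range(1, T):
                alpha = emit[t] + torch.logsumexp(
                    alpha.unsqueeze(1) + self.transitions, dim=0)
            return torch.logsumexp(alpha, dim=0) - gold

        def decode(self, x):
            """Viterbi decoding of the highest-scoring tag sequence."""
            emit = self._scores(x)
            T, K = emit.shape
            score, back = emit[0], []
            for t in range(1, T):
                cand = score.unsqueeze(1) + self.transitions  # cand[j, k]
                best, idx = cand.max(dim=0)                   # best predecessor per tag
                back.append(idx)
                score = best + emit[t]
            tags = [int(score.argmax())]
            for idx in reversed(back):                        # backtrack
                tags.append(int(idx[tags[-1]]))
            return tags[::-1]

    # Hypothetical usage: model = DBNCRF(input_dim=300, hidden_dims=[200, 100], num_tags=64)

Because the loss normalizes over all possible tag sequences via the forward algorithm rather than token by token, training this objective is globally normalized and so sidesteps the label bias problem that the abstract attributes to purely local, tag-by-tag training.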
