Semi-supervised Learning for Information Extraction from Dialogue

In this work we present a method for semi-supervised learning from transcripts of dialogue between humans. We consider the scenario in which a large number of transcripts is available and we would like to extract semantic information from them, but only a small number of the transcripts have been labeled with this information. Our method leverages the unlabeled data to learn a better model than could be learned from the labeled data alone. First, a recurrent neural network (RNN) encoder-decoder is trained on the full dialogue corpus on the unsupervised task of predicting nearby turns; next, the trained RNN encoder is reused as a feature representation for the supervised learning problem. While previous work has explored pre-training on non-dialogue corpora, our method is geared specifically toward the dialogue use case. We demonstrate an improvement on a clinical documentation task, particularly in the regime of small amounts of labeled data. We compare several types of encoders, both on the classification task and in a human evaluation of their learned representations, and show that our method significantly improves classification performance when only a small amount of labeled data is available.
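The two-stage recipe above lends itself to a compact illustration. Below is a minimal PyTorch sketch of the idea, assuming a single-layer LSTM encoder and decoder and a simple linear classifier; all names, dimensions, and the random stand-in batches are illustrative assumptions rather than the paper's actual implementation.

```python
# Minimal sketch: pre-train an encoder-decoder to predict a nearby turn,
# then reuse the encoder as a fixed feature extractor for classification.
# Vocabulary size, dimensions, and data are illustrative placeholders.
import torch
import torch.nn as nn

VOCAB_SIZE, EMB_DIM, HID_DIM, NUM_LABELS = 10000, 128, 256, 5

class TurnEncoder(nn.Module):
    """Encodes one dialogue turn (a sequence of token ids) into a vector."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, EMB_DIM)
        self.rnn = nn.LSTM(EMB_DIM, HID_DIM, batch_first=True)

    def forward(self, tokens):            # tokens: (batch, seq_len)
        _, (h, _) = self.rnn(self.embed(tokens))
        return h[-1]                      # final hidden state: (batch, HID_DIM)

class NextTurnDecoder(nn.Module):
    """Generates a nearby turn, conditioned on the encoded context turn."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, EMB_DIM)
        self.rnn = nn.LSTM(EMB_DIM, HID_DIM, batch_first=True)
        self.out = nn.Linear(HID_DIM, VOCAB_SIZE)

    def forward(self, context, target_tokens):
        h0 = context.unsqueeze(0)         # encoder vector as initial hidden state
        c0 = torch.zeros_like(h0)
        out, _ = self.rnn(self.embed(target_tokens), (h0, c0))
        return self.out(out)              # logits: (batch, seq_len, VOCAB_SIZE)

encoder, decoder = TurnEncoder(), NextTurnDecoder()
loss_fn = nn.CrossEntropyLoss()

# --- Stage 1: unsupervised pre-training on the full (unlabeled) corpus ---
pretrain_opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()))
turn = torch.randint(0, VOCAB_SIZE, (32, 20))         # stand-in for a real batch
nearby_turn = torch.randint(0, VOCAB_SIZE, (32, 20))
logits = decoder(encoder(turn), nearby_turn[:, :-1])  # teacher forcing
loss = loss_fn(logits.reshape(-1, VOCAB_SIZE), nearby_turn[:, 1:].reshape(-1))
loss.backward()
pretrain_opt.step()

# --- Stage 2: reuse the encoder as features for the small labeled set ---
classifier = nn.Linear(HID_DIM, NUM_LABELS)
clf_opt = torch.optim.Adam(classifier.parameters())   # encoder kept frozen here
labels = torch.randint(0, NUM_LABELS, (32,))
with torch.no_grad():                                 # fixed feature representation
    features = encoder(turn)
clf_loss = loss_fn(classifier(features), labels)
clf_loss.backward()
clf_opt.step()
```

In practice each stage would iterate over real minibatches, and the encoder could be fine-tuned on the labeled data rather than frozen; freezing it here is an assumption for the sketch, not a detail the abstract specifies.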
