Interview: Large-scale Modeling of Media Dialog with Discourse Patterns and Knowledge Grounding

In this work, we perform the first large-scale analysis of discourse in media dialog and its impact on generative modeling of dialog turns, with a focus on interrogative patterns and use of external knowledge. Discourse analysis can help us understand modes of persuasion, entertainment, and information elicitation in such settings, but has been limited to manual review of small corpora. We introduce **Interview**, a large-scale media dialog dataset of 105K conversations collected from news interview transcripts, which allows us to investigate such patterns at scale. We present a dialog model that leverages external knowledge as well as dialog acts via auxiliary losses, and demonstrate that it quantitatively and qualitatively outperforms strong discourse-agnostic baselines for dialog modeling, generating more specific and topical responses in interview-style conversations.
