
More Diverse Dialogue Datasets via Diversity-Informed Data Collection

Automated generation of conversational dialogue using modern neural architectures has made notable advances. However, these models are known to have a drawback of often producing uninteresting, predictable responses; this is known as the diversity problem. We introduce a new strategy to address this problem, called Diversity-Informed Data Collection. Unlike prior approaches, which modify model architectures to solve the problem, this method uses dynamically computed corpus-level statistics to determine which conversational participants to collect data from. Diversity-Informed Data Collection produces significantly more diverse data than baseline data collection methods, and better results on two downstream tasks: emotion classification and dialogue generation. This method is generalizable and can be used with other corpus-level metrics.
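The selection idea in the abstract can be illustrated with a small sketch. This is not the paper's implementation: the choice of distinct-2 (unique bigrams divided by total bigrams) as the corpus-level statistic, the whitespace tokenization, and the names `distinct_2`, `diversity_gain`, and `select_participants` are all assumptions made for illustration. The sketch recomputes the statistic each time it scores a candidate and prefers participants whose utterances raise it the most.

```python
from collections import Counter


def bigrams(utterance):
    """Lowercased, whitespace-tokenized bigrams (a simplifying assumption)."""
    tokens = utterance.lower().split()
    return list(zip(tokens, tokens[1:]))


def distinct_2(utterances):
    """Corpus-level statistic: unique bigrams divided by total bigrams."""
    counts = Counter(bg for u in utterances for bg in bigrams(u))
    total = sum(counts.values())
    return len(counts) / total if total else 0.0


def diversity_gain(corpus, new_utterances):
    """Change in the corpus-level statistic if new_utterances were added."""
    return distinct_2(corpus + new_utterances) - distinct_2(corpus)


def select_participants(corpus, candidates, keep):
    """Return the `keep` participant names with the largest diversity gain.

    `candidates` maps a (hypothetical) participant name to the list of
    utterances that participant has produced so far.
    """
    ranked = sorted(
        candidates,
        key=lambda name: diversity_gain(corpus, candidates[name]),
        reverse=True,
    )
    return ranked[:keep]


corpus = ["i am fine", "i am fine"]
candidates = {
    "a": ["i am fine"],
    "b": ["the weather turned stormy today"],
}
print(select_participants(corpus, candidates, keep=1))
```

Because the statistic is recomputed against the current corpus, the ranking changes as data accumulates, and any other corpus-level metric could be substituted for `distinct_2`.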

Grace Hui Yang | Marti A. Hearst | Katherine Stasaski
