ConVEx: Data-Efficient and Few-Shot Slot Labeling

We propose ConVEx (Conversational Value Extractor), an efficient pretraining and fine-tuning neural approach for slot-labeling dialog tasks. Instead of relying on the more general pretraining objectives of prior work (e.g., language modeling, response selection), ConVEx’s pretraining objective, a novel pairwise cloze task over Reddit data, is well aligned with its intended use for sequence labeling. This makes it possible to learn domain-specific slot labelers by fine-tuning only the decoding layers of the pretrained general-purpose sequence-labeling model, while the majority of the pretrained parameters are kept frozen. We report state-of-the-art performance of ConVEx across a range of diverse domains and datasets for dialog slot labeling, with the largest gains in the most challenging few-shot setups. We believe that ConVEx’s reduced pretraining time (only 18 hours on 12 GPUs) and cost, along with its efficient fine-tuning and strong performance, promise wider portability and scalability for data-efficient sequence-labeling tasks in general.
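
To make the pairwise cloze idea concrete, the sketch below shows one way such a training example could be assembled from two sentences that share a keyphrase: the keyphrase is blanked out in a "template" sentence and must be located as a span in the paired "input" sentence. The helper names (make_example, PairwiseClozeExample), the [BLANK] placeholder, and the whitespace tokenization are illustrative assumptions, not the paper's exact pipeline, which operates on Reddit comment pairs with subword tokenization.

```python
# Minimal sketch of pairwise cloze example construction (hypothetical names).
from dataclasses import dataclass
from typing import List


@dataclass
class PairwiseClozeExample:
    template_tokens: List[str]   # keyphrase replaced by a BLANK placeholder
    input_tokens: List[str]      # keyphrase left in place
    span_tags: List[str]         # BIO tags marking the keyphrase in the input


def make_example(template_sent: str, input_sent: str,
                 keyphrase: str) -> PairwiseClozeExample:
    """Build one pairwise cloze example from two sentences sharing `keyphrase`."""
    key_tokens = keyphrase.split()

    # Blank out the keyphrase in the template sentence.
    template_tokens = template_sent.replace(keyphrase, "[BLANK]").split()

    # Mark the keyphrase span in the input sentence with BIO labels.
    input_tokens = input_sent.split()
    tags = ["O"] * len(input_tokens)
    for start in range(len(input_tokens) - len(key_tokens) + 1):
        if input_tokens[start:start + len(key_tokens)] == key_tokens:
            tags[start] = "B"
            for i in range(start + 1, start + len(key_tokens)):
                tags[i] = "I"
            break

    return PairwiseClozeExample(template_tokens, input_tokens, tags)


if __name__ == "__main__":
    ex = make_example(
        template_sent="I booked a table at 7 pm for us",
        input_sent="see you there at 7 pm then",
        keyphrase="7 pm",
    )
    print(ex.template_tokens)
    # ['I', 'booked', 'a', 'table', 'at', '[BLANK]', 'for', 'us']
    print(list(zip(ex.input_tokens, ex.span_tags)))
    # [('see', 'O'), ('you', 'O'), ('there', 'O'), ('at', 'O'),
    #  ('7', 'B'), ('pm', 'I'), ('then', 'O')]
```

During fine-tuning, a slot's annotated value plays the role of the shared keyphrase, so only the span-decoding layers on top of the frozen pretrained encoder need to be updated for each new slot.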
