Paper information - Active Learning for Domain Classification in a Commercial Spoken Personal Assistant

We describe a method for selecting relevant new training data for the LSTM-based domain selection component of our personal assistant system. Adding more annotated training data for any ML system typically improves accuracy, but only if it provides examples not already adequately covered in the existing data. However, obtaining, selecting, and labeling relevant data is expensive. This work presents a simple technique that automatically identifies new helpful examples suitable for human annotation. Our experimental results show that the proposed method, compared with random-selection and entropy-based methods, leads to higher accuracy improvements given a fixed annotation budget. Although developed and tested in the setting of a commercial intelligent assistant, the technique is of wider applicability.
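The abstract names two baselines, random selection and entropy-based selection, without describing the proposed method itself. The following minimal sketch illustrates only those two baselines under a fixed annotation budget; it is not the authors' technique. The function names, the `budget` parameter, and the example probability vectors are hypothetical and stand in for the domain classifier's softmax outputs on unlabeled utterances.

```python
import math
import random


def entropy(probs):
    """Shannon entropy (in nats) of one predicted domain distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_by_entropy(predictions, budget):
    """Return indices of the `budget` utterances with the highest entropy."""
    ranked = sorted(
        range(len(predictions)),
        key=lambda i: entropy(predictions[i]),
        reverse=True,
    )
    return ranked[:budget]


def select_at_random(predictions, budget, seed=0):
    """Return `budget` indices sampled uniformly without replacement."""
    rng = random.Random(seed)
    size = min(budget, len(predictions))
    return sorted(rng.sample(range(len(predictions)), size))


# Hypothetical classifier outputs over three domains for four utterances.
predictions = [
    [0.98, 0.01, 0.01],  # confident prediction, low entropy
    [0.40, 0.35, 0.25],  # uncertain
    [0.70, 0.20, 0.10],  # moderately confident
    [0.34, 0.33, 0.33],  # nearly uniform, highest entropy
]

chosen = select_by_entropy(predictions, budget=2)
print(chosen)  # indices of the two most uncertain utterances
```

With these example inputs, the entropy baseline sends the two least confident utterances to annotation, whereas the random baseline ignores the model's confidence entirely.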
