Paraphrase Label Alignment for Voice Application Retrieval in Spoken Language Understanding

Smart assistants built on spoken language understanding (SLU), such as Amazon Alexa, host hundreds of thousands of voice applications (skills) that delight end-users and fulfill their utterance requests. Sometimes an utterance fails to be claimed by the assistant due to system issues such as model limitations or routing errors. Such failures can frustrate customers, terminate the dialog, and ultimately cause customer churn. To avoid this, we design a skill retrieval system as a downstream service that suggests fallback skills for unclaimed utterances. If the suggested skill satisfies the customer's intent, the conversation with the assistant is recovered. To keep the customer experience smooth, we present only the single most relevant skill, which creates a partial-observation problem that constrains retrieval-model training. To address this, we propose a two-step approach that automatically aligns the labels of claimed utterances to unclaimed utterances. Extensive experiments on two real-world datasets demonstrate that our proposed model significantly outperforms a number of strong alternatives.
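
The abstract does not spell out the two alignment steps, so the sketch below is only one plausible reading, not the paper's actual pipeline: for each unclaimed utterance, retrieve its nearest claimed paraphrase with a Sentence-BERT-style encoder, then transfer that utterance's skill label only when the similarity clears a confidence threshold. The sentence-transformers package, the all-MiniLM-L6-v2 model, the toy utterances, and the 0.8 threshold are all assumptions of this example.

```python
# Illustrative sketch of paraphrase-based label alignment (not the paper's
# exact method). Assumes the `sentence-transformers` package is installed;
# the model name, example data, and threshold are arbitrary choices.
from sentence_transformers import SentenceTransformer, util

# Claimed utterances come with observed skill labels; unclaimed ones do not.
claimed = [
    ("play relaxing rain sounds", "SleepSoundsSkill"),
    ("tell me a bedtime story", "StorytellerSkill"),
]
unclaimed = ["put on some rain noise", "read my kid a story"]

model = SentenceTransformer("all-MiniLM-L6-v2")
claimed_emb = model.encode([utt for utt, _ in claimed], convert_to_tensor=True)
unclaimed_emb = model.encode(unclaimed, convert_to_tensor=True)

# Step 1: retrieve the most similar claimed utterance for each unclaimed one.
# Step 2: transfer its skill label only if similarity clears the threshold,
# leaving low-confidence utterances unlabeled rather than mislabeled.
sims = util.cos_sim(unclaimed_emb, claimed_emb)  # shape: (n_unclaimed, n_claimed)
for i, utt in enumerate(unclaimed):
    j = int(sims[i].argmax())
    score = float(sims[i][j])
    label = claimed[j][1] if score >= 0.8 else None
    print(f"{utt!r} -> {label} (sim={score:.2f})")
```

At production scale, the nearest-neighbor step would presumably run over millions of claimed utterances with an approximate-nearest-neighbor index such as FAISS rather than a dense similarity matrix, but the alignment logic would be the same.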
