An Evaluation of Chinese Human-Computer Dialogue Technology

The human-computer dialogue has recently attracted extensive attention from both academia and industry as an important branch in the field of artificial intelligence (AI). However, there are few studies on the evaluation of large-scale Chinese human-computer dialogue systems. In this paper, we introduce the Second Evaluation of Chinese Human-Computer Dialogue Technology, which focuses on the identification of a user's intents and intelligent processing of intent words. The Evaluation consists of user intent classification (Task 1) and online testing of task-oriented dialogues (Task 2), the data sets of which are provided by iFLYTEK Corporation. The evaluation tasks and data sets are introduced in detail, and meanwhile, the evaluation results and the existing problems in the evaluation are discussed.

[1]  Wanxiang Che,et al.  The First Evaluation of Chinese Human-Computer Dialogue Technology , 2017, ArXiv.

[2]  Milica Gasic,et al.  Real User Evaluation of Spoken Dialogue Systems Using Amazon Mechanical Turk , 2011, INTERSPEECH.

[3]  Joelle Pineau,et al.  A Deep Reinforcement Learning Chatbot , 2017, ArXiv.

[4]  Michalis Vazirgiannis,et al.  Text Categorization as a Graph Classification Problem , 2015, ACL.

[5]  Dilek Z. Hakkani-Tür,et al.  Dialogue Learning with Human Teaching and Feedback in End-to-End Trainable Task-Oriented Dialogue Systems , 2018, NAACL.

[6]  Dongyan Zhao,et al.  Coupled Context Modeling for Deep Chit-Chat: Towards Conversations between Human and Computer , 2018, KDD.

[7]  Jiliang Tang,et al.  A Survey on Dialogue Systems: Recent Advances and New Frontiers , 2017, SKDD.

[8]  Maxine Eskénazi,et al.  Spoken Dialog Challenge 2010: Comparison of Live and Control Test Results , 2011, SIGDIAL Conference.

[9]  Alexander I. Rudnicky,et al.  User Intent Classification using Memory Networks: A Comparative Analysis for a Limited Data Scenario , 2017, ArXiv.

[10]  Bo Tang,et al.  Toward Optimal Feature Selection in Naive Bayes for Text Categorization , 2016, IEEE Transactions on Knowledge and Data Engineering.

[11]  Zhenchang Xing,et al.  Ensemble application of convolutional and recurrent neural networks for multi-label text categorization , 2017, 2017 International Joint Conference on Neural Networks (IJCNN).

[12]  Geoffrey Zweig,et al.  Using Recurrent Neural Networks for Slot Filling in Spoken Language Understanding , 2015, IEEE/ACM Transactions on Audio, Speech, and Language Processing.

[13]  Caixia Yuan,et al.  Recent Advances on Human-Computer Dialogue , 2016, CAAI Trans. Intell. Technol..

[14]  Ming Zhou,et al.  SuperAgent: A Customer Service Chatbot for E-commerce Websites , 2017, ACL.

[15]  Joelle Pineau,et al.  Building End-To-End Dialogue Systems Using Generative Hierarchical Neural Network Models , 2015, AAAI.

[16]  Maxine Eskénazi,et al.  Spoken Dialog Challenge 2010 , 2010, 2010 IEEE Spoken Language Technology Workshop.

[17]  Takaaki Hori,et al.  End-to-end Conversation Modeling Track in DSTC6 , 2017, ArXiv.

[18]  David Vandyke,et al.  On-line Active Reward Learning for Policy Optimisation in Spoken Dialogue Systems , 2016, ACL.