Building chatbots from large scale domain-specific knowledge bases: challenges and opportunities

Popular conversational agents frameworks such as Alexa Skills Kit (ASK) and Google Actions (gActions) offer unprecedented opportunities for facilitating the development and deployment of voice-enabled AI solutions in various verticals. Nevertheless, understanding user utterances with high accuracy remains a challenging task with these frameworks. Particularly, when building chatbots with large volume of domain-specific entities. In this paper, we describe the challenges and lessons learned from building a large scale virtual assistant for understanding and responding to equipment-related complaints. In the process, we describe an alternative scalable framework for: 1) extracting the knowledge about equipment components and their associated problem entities from short texts, and 2) learning to identify such entities in user utterances. We show through evaluation on a real dataset that the proposed framework, compared to off-the-shelf popular ones, scales better with large volume of entities being up to 30% more accurate, and is more effective in understanding user utterances with domain-specific entities.

[1]  Jay Pujara,et al.  Mining Knowledge Graphs From Text , 2018, WSDM.

[2]  Serjik G. Dikaleh,et al.  Refine, restructure and make sense of data visually, using IBM Watson Studio , 2018, CASCON.

[3]  Jeffrey Dean,et al.  Distributed Representations of Words and Phrases and their Compositionality , 2013, NIPS.

[4]  Yoshua Bengio,et al.  Neural Machine Translation by Jointly Learning to Align and Translate , 2014, ICLR.

[5]  Christopher Ré,et al.  Fonduer: Knowledge Base Construction from Richly Formatted Data , 2017, SIGMOD Conference.

[6]  Daniel Whyatt,et al.  A Novel Approach to Part Name Discovery in Noisy Text , 2018, NAACL-HLT.

[7]  Jürgen Schmidhuber,et al.  Learning to Forget: Continual Prediction with LSTM , 2000, Neural Computation.

[8]  A. Chandramouli,et al.  Unsupervised Extraction of Part Names from Service Logs , 2022 .

[9]  Mihai Surdeanu,et al.  The Stanford CoreNLP Natural Language Processing Toolkit , 2014, ACL.

[10]  Wei Zhang,et al.  Knowledge vault: a web-scale approach to probabilistic knowledge fusion , 2014, KDD.

[11]  Mitchell P. Marcus,et al.  Text Chunking using Transformation-Based Learning , 1995, VLC@ACL.

[12]  Oren Etzioni,et al.  Open Information Extraction from the Web , 2007, CACM.

[13]  Luis A. Guerrero,et al.  Alexa vs. Siri vs. Cortana vs. Google Assistant: A Comparison of Speech-Based Natural User Interfaces , 2017 .

[14]  Catherine Havasi,et al.  ConceptNet 5: A Large Semantic Network for Relational Knowledge , 2013, The People's Web Meets NLP.

[15]  Praveen Paritosh,et al.  Freebase: a collaboratively created graph database for structuring human knowledge , 2008, SIGMOD Conference.

[16]  Hua Wu,et al.  An End-to-End Model for Question Answering over Knowledge Base with Cross-Attention Combining Global Knowledge , 2017, ACL.

[17]  Kasam Shaikh Creating the FAQ Bot Backend from Scratch , 2019 .

[18]  Estevam R. Hruschka,et al.  Toward an Architecture for Never-Ending Language Learning , 2010, AAAI.

[19]  Gerhard Weikum,et al.  YAGO2: A Spatially and Temporally Enhanced Knowledge Base from Wikipedia: Extended Abstract , 2013, IJCAI.

[20]  Bing Liu,et al.  Mining and summarizing customer reviews , 2004, KDD.

[21]  Hongxia Jin,et al.  Beyond word embeddings: learning entity and concept representations from large scale knowledge bases , 2018, Information Retrieval Journal.

[22]  Jens Lehmann,et al.  DBpedia: A Nucleus for a Web of Open Data , 2007, ISWC/ASWC.

[23]  Christopher D. Manning,et al.  Effective Approaches to Attention-based Neural Machine Translation , 2015, EMNLP.

[24]  Andrew McCallum,et al.  Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data , 2001, ICML.

[25]  Yang Song,et al.  An Overview of Microsoft Academic Service (MAS) and Applications , 2015, WWW.

[26]  Nicholas Jing Yuan,et al.  Collaborative Knowledge Base Embedding for Recommender Systems , 2016, KDD.

[27]  Christopher Ré,et al.  DeepDive: Web-scale Knowledge-base Construction using Statistical Learning and Inference , 2012, VLDS.

[28]  Björn Hoffmeister,et al.  Just ASK: Building an Architecture for Extensible Self-Service Spoken Language Understanding , 2017, ArXiv.