Information Retrieval in Restricted Domain for ChatterBots

This paper presents a restricted-domain approach to handling information in the context of regulation search. Information Retrieval (IR) with a chatterbot as front end is especially complex because the interaction takes place in natural language. The problem becomes harder when the IR targets a restricted domain, where there is no statistical error compensation. The terminology used in the documentation usually does not match common language usage, and dialogs tend to be informal, so people often ask the system incomplete questions. PTAH is a chatterbot developed to interact with students, administrative employees, professors and other members of the university community. Cultural diversity and large age differences pose an interesting challenge for Natural Language Processing, since many approaches address only one of these aspects. This paper presents the chatterbot PTAH and describes in particular an IR approach intended to improve the quality of its answers. Results are evaluated with the traditional precision and recall metrics, plus an additional metric that assesses the quality of the heuristics involved in the retrieval process. The statistics indicate that a specific combination of simple, traditional heuristics can yield good results.
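As a reminder of how the two traditional metrics mentioned above are computed, the following is a minimal sketch for a single query. The function name and the document identifiers are hypothetical and are not taken from PTAH. The additional heuristic-quality metric is not defined in this abstract, so it is not sketched here.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: ids of the documents the system returned (hypothetical ids).
    relevant:  ids of the documents judged relevant for the query.
    """
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    hits = retrieved_set & relevant_set  # relevant documents that were returned

    # Precision: fraction of returned documents that are relevant.
    precision = len(hits) / len(retrieved_set) if retrieved_set else 0.0
    # Recall: fraction of relevant documents that were returned.
    recall = len(hits) / len(relevant_set) if relevant_set else 0.0
    return precision, recall


if __name__ == "__main__":
    # Illustrative data only: four returned regulations, three relevant ones.
    p, r = precision_recall(["reg1", "reg2", "reg3", "reg4"],
                            ["reg1", "reg3", "reg5"])
    print(f"precision={p:.2f} recall={r:.2f}")
```

Empty result sets are mapped to 0.0 here to avoid division by zero; that convention is a choice of this sketch, not something stated in the paper.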
