Detecting Missing Content Queries in an SMS-Based HIV/AIDS FAQ Retrieval System

Automated Frequently Asked Question (FAQ) answering systems use pre-stored sets of question-answer pairs as an information source to answer natural language questions posed by users. The main problem with this kind of information source is that there is no guarantee that a relevant question-answer pair exists for every user query. In this paper, we propose to deploy a binary classifier in an existing SMS-based HIV/AIDS FAQ retrieval system to detect user queries that have no relevant question-answer pair in the FAQ document collection. Before deploying such a classifier, we first evaluate different feature sets for training, in order to determine which sets of features build the model that yields the best classification accuracy. We carry out our evaluation using seven different feature sets generated from a query log before and after retrieval by the FAQ retrieval system. Our results suggest that combining different feature sets markedly improves the classification accuracy.
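
As a rough illustration of the idea of combining feature sets for missing content detection, the following minimal sketch concatenates a pre-retrieval and a post-retrieval feature vector for each query and trains a binary classifier on the result. Everything in it is an assumption made for illustration: the feature names, the toy values, the class labels, and the nearest-centroid classifier, which is swapped in here only because it fits in a few lines and is not the classifier evaluated in the paper.

```python
from statistics import mean


def combine(*feature_sets):
    """Concatenate several per-query feature vectors into one vector."""
    combined = []
    for features in feature_sets:
        combined.extend(features)
    return combined


def train_centroids(vectors, labels):
    """Return the mean feature vector (centroid) for each class label."""
    centroids = {}
    for label in set(labels):
        members = [v for v, y in zip(vectors, labels) if y == label]
        centroids[label] = [mean(column) for column in zip(*members)]
    return centroids


def predict(centroids, vector):
    """Assign the label whose centroid is closest (squared Euclidean distance)."""
    def distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(vector, centroid))
    return min(centroids, key=lambda label: distance(centroids[label]))


# Hypothetical toy data: (pre-retrieval features, post-retrieval features, label).
# pre  = [query length in tokens, fraction of query terms found in the collection]
# post = [top retrieval score, score gap between rank 1 and rank 2]
training = [
    ([5, 0.9], [12.0, 4.0], "has_answer"),
    ([6, 1.0], [10.5, 3.5], "has_answer"),
    ([4, 0.3], [2.0, 0.2], "missing_content"),
    ([7, 0.2], [1.5, 0.1], "missing_content"),
]
X = [combine(pre, post) for pre, post, _ in training]
y = [label for _, _, label in training]
model = train_centroids(X, y)

# Classify a new query whose terms are mostly absent from the collection
# and whose retrieval scores are low.
print(predict(model, combine([5, 0.25], [1.8, 0.3])))
```

In practice the features would need to be scaled before distances are compared, and accuracy would be estimated on held-out queries from the query log rather than on a single example.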
