Learning regular expressions to template-based FAQ retrieval systems

Template-based approaches have proven to be among the most efficient and robust ways of addressing Question Answering problems. Templates embody the expert's knowledge of the domain and their ability to understand and answer questions, but designing these templates can become a complex task because it is usually carried out manually. Although such methods are not automatic, companies may still prefer them in order to offer a better service. In this article, we propose a semiautomatic method that reduces the problem of creating templates to that of validating, and possibly modifying, a list of proposed templates. In this way, a better trade-off is achieved between reliability (the system is still monitored by an expert) and cost. In addition, updating templates after domain changes becomes easier, human mistakes are reduced, and portability is increased. Our proposal is based on inferring regular expressions that induce the language conveyed by a set of previously collected query reformulations. The main contribution of this work is the definition of a suitable optimisation measure that reflects important aspects of the problem, together with the theoretical soundness that supports it.
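To make the validation step concrete, the following minimal Python sketch checks a proposed regular-expression template against a set of query reformulations. It is an illustration only: the queries, the candidate patterns, and the `coverage` helper are hypothetical, and the sketch implements neither the inference procedure nor the optimisation measure proposed in the article.

```python
import re

# Hypothetical query reformulations collected for a single FAQ entry.
reformulations = [
    "how can i reset my password",
    "how do i reset my password",
    "how can i change my password",
]

# Hypothetical queries that belong to other FAQ entries.
unrelated = [
    "what are your opening hours",
    "how do i delete my account",
]

# Two candidate templates an expert might be asked to validate:
# a specific one and a deliberately over-general one.
specific = re.compile(r"how (can|do) i (reset|change) my password")
over_general = re.compile(r".*")


def coverage(pattern, queries):
    """Return the fraction of queries that the pattern matches in full."""
    matched = sum(1 for q in queries if pattern.fullmatch(q))
    return matched / len(queries)


if __name__ == "__main__":
    for name, pattern in [("specific", specific), ("over_general", over_general)]:
        print(
            name,
            "covers reformulations:", coverage(pattern, reformulations),
            "accepts unrelated:", coverage(pattern, unrelated),
        )
```

Both candidates match every collected reformulation, but the over-general pattern also accepts the unrelated queries. This is the kind of tension between covering the reformulations and not accepting too much that any measure for ranking candidate templates has to balance.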
