High-performance FAQ retrieval using an automatic clustering method of query logs

To resolve some of lexical disagreement problems between queries and FAQs, we propose a reliable FAQ retrieval system using query log clustering. On indexing time, the proposed system clusters the logs of users' queries into predefined FAQ categories. To increase the precision and the recall rate of clustering, the proposed system adopts a new similarity measure using a machine readable dictionary. On searching time, the proposed system calculates the similarities between users' queries and each cluster in order to smooth FAQs. By virtue of the cluster-based retrieval technique, the proposed system could partially bridge lexical chasms between queries and FAQs. In addition, the proposed system outperforms the traditional information retrieval systems in FAQ retrieval.

[1]  John D. Lafferty,et al.  A study of smoothing methods for language models applied to Ad Hoc information retrieval , 2001, SIGIR '01.

[2]  Hideki Kozima,et al.  Similarity between Words Computed by Spreading Activation on an English Dictionary , 1993, EACL.

[3]  Eriks Sneiders,et al.  Automated FAQ Answering: Continued Experience with Shallow Language Understanding , 1999 .

[4]  Peter Willett,et al.  Comparison of Hierarchie Agglomerative Clustering Methods for Document Retrieval , 1989, Comput. J..

[5]  Kristian J. Hammond,et al.  Question Answering from Frequently Asked Question Files: Experiences with the FAQ FINDER System , 1997, AI Mag..

[6]  Jörg P. Müller,et al.  Doing business in the information marketplace: a case study , 1999, AGENTS '99.

[7]  Kristian J. Hammond,et al.  FAQ finder: a case-based approach to knowledge navigation , 1995, Proceedings the 11th Conference on Artificial Intelligence for Applications.

[8]  W. Bruce Croft,et al.  Document clustering: An evaluation of some experiments with the cranfield 1400 collection , 1975, Inf. Process. Manag..

[9]  C. J. van Rijsbergen,et al.  The use of hierarchic clustering in information retrieval , 1971, Inf. Storage Retr..

[10]  Marek Świdziński,et al.  The Design of a Universal Basic Dictionary of Contemporary Polish , 1990 .

[11]  Michael McGill,et al.  Introduction to Modern Information Retrieval , 1983 .

[12]  W. Bruce Croft,et al.  Cluster-based retrieval using language models , 2004, SIGIR '04.

[13]  George A. Miller,et al.  Introduction to WordNet: An On-line Lexical Database , 1990 .

[14]  Robert Villa,et al.  The effectiveness of query-specific hierarchic clustering in information retrieval , 2002, Inf. Process. Manag..

[15]  Stephen E. Robertson,et al.  Okapi at TREC-3 , 1994, TREC.

[16]  Stephen E. Robertson,et al.  GatfordCentre for Interactive Systems ResearchDepartment of Information , 1996 .

[17]  Peter Willett,et al.  Recent trends in hierarchic document clustering: A critical review , 1988, Inf. Process. Manag..

[18]  Ellen M. Vdorhees,et al.  The cluster hypothesis revisited , 1985, SIGIR '85.

[19]  Ellen M. Voorhees,et al.  The TREC-8 Question Answering Track Evaluation , 2000, TREC.

[20]  Steven D. Whitehead Auto-FAQ: An Experiment in Cyberspace Leveraging , 1995, Comput. Networks ISDN Syst..

[21]  Michael A. West,et al.  A general service list of English words, with semantic frequencies and a supplementary word-list for the writing of popular science and technology , 1953 .

[22]  Ellen M. Vdorhees The cluster hypothesis revisited , 1985, SIGIR 1985.

[23]  Marti A. Hearst,et al.  Reexamining the cluster hypothesis: scatter/gather on retrieval results , 1996, SIGIR '96.