A reliable FAQ retrieval system using a query log classification technique based on latent semantic analysis

To achieve high performance, previous work on FAQ retrieval has relied on high-level knowledge bases or handcrafted rules. However, constructing these knowledge bases and rules is time-consuming and labor-intensive, and the work must be repeated whenever the application domain changes. To overcome this problem, we propose a high-performance FAQ retrieval system that uses only users' query logs as its knowledge source. At indexing time, the proposed system efficiently clusters users' query logs using a classification technique based on latent semantic analysis. At retrieval time, the proposed system smooths FAQs using the query log clusters. In our experiments, the proposed system outperformed conventional information retrieval systems on FAQ retrieval. Based on various experiments, we found that the proposed system could alleviate the critical lexical disagreement problem in short document retrieval. In addition, we believe that the proposed system is more practical and reliable than previous FAQ retrieval systems because it uses only data-driven methods without high-level knowledge sources.
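The two stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the classification or smoothing formulas, so the sketch assumes that each FAQ acts as one category, that a query log is assigned to the FAQ nearest to it in an LSA space built by truncated SVD, and that smoothing is a linear interpolation of cosine scores. All function names, the parameters `k`, `min_sim`, and `lam`, and the sample data are hypothetical.

```python
import numpy as np

def tokenize(text):
    return text.lower().split()

def term_vector(text, vocab):
    # Raw term-frequency vector; tokens outside the vocabulary are ignored.
    vec = np.zeros(len(vocab))
    for tok in tokenize(text):
        if tok in vocab:
            vec[vocab[tok]] += 1.0
    return vec

def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def fit_lsa(faqs, k):
    # Indexing, step 1: term-by-FAQ matrix and its rank-k SVD basis.
    terms = sorted({t for f in faqs for t in tokenize(f)})
    vocab = {t: i for i, t in enumerate(terms)}
    A = np.column_stack([term_vector(f, vocab) for f in faqs])
    U, _, _ = np.linalg.svd(A, full_matrices=False)
    U_k = U[:, :k]
    faq_latent = A.T @ U_k  # one latent row per FAQ (equals V_k * S_k)
    return vocab, U_k, faq_latent

def cluster_logs(logs, vocab, U_k, faq_latent, min_sim=1e-9):
    # Indexing, step 2: assign each query log to its most similar FAQ.
    clusters = {i: [] for i in range(len(faq_latent))}
    for log in logs:
        q = term_vector(log, vocab) @ U_k  # fold the log into the LSA space
        sims = [cosine(q, f) for f in faq_latent]
        best = int(np.argmax(sims))
        if sims[best] > min_sim:  # drop logs that match no FAQ at all
            clusters[best].append(log)
    return clusters

def retrieve(query, faqs, clusters, lam=0.5):
    # Retrieval: interpolate the FAQ score with its query-log cluster score
    # (assumed smoothing scheme, chosen only for illustration).
    cluster_texts = [" ".join(clusters[i]) for i in range(len(faqs))]
    terms = sorted({t for d in faqs + cluster_texts for t in tokenize(d)})
    vocab = {t: i for i, t in enumerate(terms)}
    q = term_vector(query, vocab)
    scores = [
        lam * cosine(q, term_vector(f, vocab))
        + (1 - lam) * cosine(q, term_vector(c, vocab))
        for f, c in zip(faqs, cluster_texts)
    ]
    return int(np.argmax(scores))

# Hypothetical sample data.
faqs = [
    "reset password account",
    "track order shipment",
    "cancel order refund",
]
logs = ["forgot password login", "where is my shipment parcel", "refund money back"]
vocab, U_k, faq_latent = fit_lsa(faqs, k=2)
clusters = cluster_logs(logs, vocab, U_k, faq_latent)
print(clusters)
print(retrieve("forgot login", faqs, clusters))
```

In this sketch, words that appear only in query logs (for example "forgot" or "parcel") become matchable at retrieval time because they are attached to an FAQ through its cluster, which is one way a system of this kind could reduce lexical disagreement between short queries and short FAQs.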
