Detecting and supporting known item queries in online public access catalogs

When users seek specific resources in a digital library, they often use the library catalog to locate them. Such catalog queries are defined as known item queries. Because known item queries target specific resources, it is important to handle them differently from other search types, such as area searches. We study how to identify known item queries in the context of a large academic institution's online public access catalog (OPAC), in which queries are issued via a simple keyword interface. We also examine how to recognize when a known item query has retrieved the item in question. Our approach combines techniques from machine learning, language modeling, and machine translation evaluation metrics to build a classifier for both tasks. The classifier distinguishes known item queries from other queries and judges whether a retrieved title is the known item sought, achieving 80% and 95% correlation with human performance on the two tasks, respectively. To our knowledge, this is the first report of such work, which has the potential to streamline the user interface of both OPACs and digital libraries in support of known item searches.
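To make the role of a machine translation evaluation metric more concrete, the following is a minimal sketch of how n-gram overlap between a query and a retrieved title could serve as one signal that the title is the known item. It is an illustration only, not the authors' implementation: the function names, the whitespace tokenization, and the choice to average unigram and bigram precision arithmetically (rather than use BLEU's geometric mean and brevity penalty) are all assumptions.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_precision(query, title, n):
    """Fraction of the query's n-grams that also occur in the title.

    Counts are clipped, so a repeated query n-gram is only credited as
    many times as it appears in the title. Tokenization is a plain
    lowercase whitespace split (an assumption for this sketch).
    """
    query_grams = ngrams(query.lower().split(), n)
    if not query_grams:
        return 0.0
    title_counts = Counter(ngrams(title.lower().split(), n))
    query_counts = Counter(query_grams)
    matched = sum(min(count, title_counts[gram])
                  for gram, count in query_counts.items())
    return matched / len(query_grams)


def overlap_score(query, title, max_n=2):
    """Arithmetic mean of n-gram precisions for n = 1..max_n.

    This is a simplified, BLEU-inspired score, not BLEU itself.
    """
    precisions = [ngram_precision(query, title, n)
                  for n in range(1, max_n + 1)]
    return sum(precisions) / len(precisions)


if __name__ == "__main__":
    query = "statistical methods speech recognition"
    title = "Statistical methods for speech recognition"
    print(round(overlap_score(query, title), 3))
```

In a setup like the one the abstract describes, a score of this kind would be one feature among several passed to a learned classifier, rather than a decision rule on its own.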
