A day in the life of PubMed: analysis of a typical day's query log.

OBJECTIVE To characterize PubMed usage over a typical day and compare it with previous studies of user behavior on Web search engines.

DESIGN We performed a lexical and semantic analysis of 2,689,166 queries issued on PubMed over 24 consecutive hours on a typical day.

MEASUREMENTS We measured the number of queries, the number of distinct users, queries per user, terms per query, common terms, Boolean operator use, common phrases, result set size, and MeSH categories. We also used semantic measurements to group queries into sessions, and we studied the addition and removal of terms between consecutive queries to gauge search strategies.

RESULTS The sizes of the result sets for a sample of queries showed a bimodal distribution, with peaks at approximately 3 and 100 results, suggesting that one large group of queries was tightly focused and another was broad. As with Web search engines, most PubMed sessions consisted of a single query. However, PubMed queries contained more terms than Web search queries.

CONCLUSION PubMed's usage profile should be considered when educating users, building user interfaces, and developing future biomedical information retrieval systems.
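Several of the lexical measurements listed above reduce to simple counting over a query log. The sketch below is illustrative only and is not the authors' pipeline. It assumes a hypothetical log format of `(user_id, timestamp, query)` tuples, uses plain whitespace tokenization, treats upper-case `AND`, `OR` and `NOT` as Boolean operators (as in PubMed query syntax), and, as an assumption, excludes those operators from the terms-per-query count. The helper names `summarize` and `term_changes` are hypothetical. The semantic session grouping described in the abstract is not reproduced here; term changes are simply computed between each user's consecutive queries.

```python
from collections import Counter, defaultdict

# Hypothetical log format (an assumption): (user_id, timestamp, query_string).
# PubMed recognises the Boolean operators AND, OR and NOT written in upper case.
BOOLEAN_OPS = {"AND", "OR", "NOT"}


def tokenize(query):
    """Whitespace tokenization; a simplification of real PubMed query parsing."""
    return query.split()


def content_terms(query):
    """Lower-cased query terms with Boolean operators removed."""
    return [t.lower() for t in tokenize(query) if t not in BOOLEAN_OPS]


def group_by_user(log):
    """Map each user to a time-ordered list of (timestamp, query) pairs."""
    per_user = defaultdict(list)
    for user, ts, query in log:
        per_user[user].append((ts, query))
    for items in per_user.values():
        items.sort()
    return per_user


def summarize(log):
    """Query count, distinct users, queries per user, mean terms, Boolean operator use."""
    per_user = group_by_user(log)
    ops = Counter()
    term_counts = []
    for _, _, query in log:
        ops.update(t for t in tokenize(query) if t in BOOLEAN_OPS)
        term_counts.append(len(content_terms(query)))
    n_queries = len(log)
    return {
        "queries": n_queries,
        "users": len(per_user),
        "queries_per_user": n_queries / len(per_user) if per_user else 0.0,
        "mean_terms_per_query": sum(term_counts) / n_queries if n_queries else 0.0,
        "boolean_ops": dict(ops),
    }


def term_changes(log):
    """For each user's consecutive query pair, return (user, added terms, removed terms)."""
    changes = []
    for user, items in group_by_user(log).items():
        for (_, prev), (_, curr) in zip(items, items[1:]):
            before, after = set(content_terms(prev)), set(content_terms(curr))
            changes.append((user, sorted(after - before), sorted(before - after)))
    return changes
```

For example, a user who moves from `asthma` to `asthma AND children` yields one added term, `children`, and no removed terms; a refinement that both narrows and replaces terms shows up as non-empty added and removed lists in the same pair.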