Investigating Queries and Search Failures in Academic Search

Information Processing and Management

Academic search concerns the retrieval and profiling of information objects in the domain of academic research. In this paper we report key observations about academic search queries and provide an algorithmic solution to one type of failure during search sessions: null queries. We start with a general characterization of academic search queries, based on an analysis of a large-scale transaction log of a leading academic search engine. Unlike previous small-scale analyses of academic search queries, we find important differences from query characteristics known from web search. For example, in academic search there is a substantially larger proportion of entity queries and a heavier tail in the query length distribution. We then focus on search failures and, in particular, on null queries that lead to an empty search engine result page, on null sessions that contain such null queries, and on users who are prone to issuing null queries. In academic search approximately 1 in 10 queries is a null query, and 25% of the sessions contain a null query. Null queries appear in different types of search sessions and prevent users from achieving their search goal. To address the high rate of null queries in academic search, we consider the task of providing query suggestions. Specifically, we focus on a highly frequent query type: non-boolean informational queries. To this end we need to overcome query sparsity and make effective use of session information. We find that using entities helps to surface more relevant query suggestions in the face of query sparsity. We also find that query suggestions are more effective when they are conditioned on the type of session in which they are offered. After casting session classification as a multi-label classification problem, we generate session-conditional query suggestions based on the predicted session type. We find that this session-conditional method leads to significant improvements over a generic query suggestion method. Personalization yields very little further improvement over session-conditional query suggestions.
