Characterizing and predicting downloads in academic search

Abstract

Numerous studies have examined the information interaction behavior of search engine users, but few have considered information interactions in the domain of academic search. We focus on conversion behavior in this domain. Conversions have been widely studied in e-commerce, e.g., for online shopping and hotel booking, but little is known about conversions in academic search. We start by describing a unique dataset of a particular type of conversion in academic search, viz. users' downloads of scientific papers. We then present an observational analysis of users' download actions. We first characterize user actions and report their statistics within sessions. We then turn to behavioral and topical aspects of downloads, revealing behavioral correlations across download sessions. We discover properties of downloads that differ from those of other conversion settings such as online shopping. Using insights gained from these observations, we consider the task of predicting the next download. In particular, we focus on predicting the time until the next download session and the number of downloads. We cast these as time series prediction problems and model them using LSTMs. We develop a specialized model built on user segmentations that achieves significant improvements over the state of the art.
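The abstract does not spell out how download sessions become time series. The following is a minimal sketch of one possible framing, not the paper's pipeline. It assumes each user's history is a list of sessions, each with a start time in hours and a download count. The `Session` type, the `to_series` and `make_windows` helpers, the hour-based timestamps and the fixed window length are all hypothetical choices made for illustration. Each step pairs the gap since the previous download session with the number of downloads in the current one. The resulting windows are the kind of input an LSTM would consume. The LSTM itself is not shown.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical record type: one search session of one user.
@dataclass
class Session:
    start_hours: float  # session start, in hours since an arbitrary reference point
    downloads: int      # number of papers downloaded in this session

Step = Tuple[float, int]  # (hours since previous download session, downloads)

def to_series(sessions: List[Session]) -> List[Step]:
    """Keep only download sessions, sort them by time, and emit one step per
    session after the first (the first has no preceding gap, so it is dropped)."""
    ordered = sorted((s for s in sessions if s.downloads > 0),
                     key=lambda s: s.start_hours)
    return [(cur.start_hours - prev.start_hours, cur.downloads)
            for prev, cur in zip(ordered, ordered[1:])]

def make_windows(steps: List[Step], window: int) -> List[Tuple[List[Step], float, int]]:
    """Sliding windows: `window` past steps as input; the next step's gap and
    download count are the two prediction targets."""
    examples = []
    for i in range(len(steps) - window):
        next_gap, next_count = steps[i + window]
        examples.append((steps[i:i + window], next_gap, next_count))
    return examples

history = [Session(0.0, 2), Session(5.0, 0), Session(24.0, 1),
           Session(30.0, 3), Session(72.0, 1)]
steps = to_series(history)               # the session at hour 5 has no downloads
examples = make_windows(steps, window=2)
```

Here the session without downloads is skipped, so the series has three steps and a window of two yields a single training example whose targets are a 42-hour gap and one download. A segmentation-based model, as the abstract describes, would build such windows separately for each user segment.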
