Metadata categorization for identifying search patterns in a digital library

For digital libraries, it is useful to understand how users search in a collection. Investigating search patterns can help libraries improve the user interface, collection management and search algorithms. However, search patterns may vary widely across different parts of a collection. The purpose of this paper is to demonstrate how to identify these search patterns within a well-curated historical newspaper collection using the existing metadata. The authors analyzed search logs combined with metadata records describing the content of the collection, using this metadata to create subsets in the logs corresponding to different parts of the collection. The study shows that faceted search is more prevalent than non-faceted search in terms of number of unique queries, time spent, clicks and downloads. Distinct search patterns are observed in different parts of the collection, corresponding to historical periods, geographical regions or subject matter. First, this study provides deeper insights into search behavior at a fine granularity in a historical newspaper collection, by the inclusion of the metadata in the analysis. Second, it demonstrates how to use metadata categorization as a way to analyze distinct search patterns in a collection.
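The idea of using metadata to split a search log into subsets can be illustrated with a minimal sketch. It is not the authors' implementation: the field names (`decade`, `region`, `faceted`, `clicked`), the sample records, and the helper functions `subset_by` and `faceted_share` are all hypothetical, chosen only to show how log entries might be grouped by a metadata value of the document they led to, and how a simple measure could then be compared across the resulting subsets.

```python
from collections import defaultdict

# Hypothetical metadata records: document id -> descriptive fields.
metadata = {
    "doc1": {"decade": "1940s", "region": "national"},
    "doc2": {"decade": "1940s", "region": "regional"},
    "doc3": {"decade": "1970s", "region": "national"},
}

# Hypothetical log entries: one row per query that led to a clicked document.
log = [
    {"session": "s1", "query": "liberation", "faceted": True, "clicked": "doc1"},
    {"session": "s1", "query": "liberation 1945", "faceted": True, "clicked": "doc2"},
    {"session": "s2", "query": "oil crisis", "faceted": False, "clicked": "doc3"},
    {"session": "s3", "query": "rationing", "faceted": False, "clicked": "doc1"},
    {"session": "s4", "query": "unknown", "faceted": True, "clicked": "doc9"},
]


def subset_by(log, metadata, field):
    """Group log entries by a metadata field of the clicked document."""
    subsets = defaultdict(list)
    for entry in log:
        record = metadata.get(entry["clicked"])
        if record is None:
            # No metadata for this document: leave the entry out of all subsets.
            continue
        subsets[record[field]].append(entry)
    return dict(subsets)


def faceted_share(entries):
    """Fraction of entries in a subset that used a facet."""
    return sum(e["faceted"] for e in entries) / len(entries)


by_decade = subset_by(log, metadata, "decade")
shares = {decade: faceted_share(entries) for decade, entries in by_decade.items()}

for decade in sorted(shares):
    print(decade, len(by_decade[decade]), round(shares[decade], 2))
```

Grouping on the clicked document is only one possible way to link a log entry to a part of the collection; facet values selected in the query itself would be another, and the choice affects which entries end up in each subset.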
