How Writers Search: Analyzing the Search and Writing Logs of Non-fictional Essays

Many writers of non-fiction texts engage intensively in exploratory web search during background research on their essay topic. Although understanding this search behavior is necessary for developing search engines that specifically support writing tasks, it has neither been systematically recorded nor analyzed. This paper contributes part of the missing research: we report on the outcomes of a large-scale corpus construction initiative that acquired detailed interaction logs of writers who were given writing tasks on 150 pre-defined TREC topics. The corpus is freely available to foster research on exploratory search. Each essay is at least 5,000 words long and comes with a chronological log of search queries, result clicks, web browsing trails, and fine-grained writing revisions that reflect the task completion status. To ensure reproducibility, a fully-fledged, static web search environment was created on top of the ClueWeb09 corpus as part of our initiative. In this paper, we present initial analyses of the recorded search interaction logs and give an overview of the insights gained from them: (1) essay writing behavior corresponds to search patterns that are rather stable for the same writer, (2) fact-checking queries often conclude a writing task, (3) recurring anchor queries are often submitted so as not to lose track of the main themes or to explore new directions, (4) query terms can be learned while searching and reading, and (5) the number of submitted queries is not a good indicator of task completion.
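To illustrate how such chronological interaction logs might be analyzed, the following Python sketch relates each writer's query count to the length of their final essay revision, in the spirit of finding (5). The JSON-lines layout, the field names (type, writer, word_count), and the file name interaction_log.jsonl are assumptions made for illustration only; the released corpus defines its own schema.

    # Minimal sketch (hypothetical log format): relate per-writer query counts
    # to final essay length, probing claim (5) that query volume is a weak
    # proxy for task completion. Field names are assumptions, not the
    # corpus's actual schema.
    import json
    from collections import defaultdict

    def pearson(xs, ys):
        # Plain Pearson correlation to avoid extra dependencies.
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        vx = sum((x - mx) ** 2 for x in xs) ** 0.5
        vy = sum((y - my) ** 2 for y in ys) ** 0.5
        return cov / (vx * vy) if vx and vy else 0.0

    queries = defaultdict(int)   # writer id -> number of submitted queries
    final_words = {}             # writer id -> word count of the last revision

    with open("interaction_log.jsonl") as log:       # hypothetical file
        for line in log:
            event = json.loads(line)                 # one logged event per line
            if event["type"] == "query":
                queries[event["writer"]] += 1
            elif event["type"] == "revision":
                # later revisions overwrite earlier ones; the last one remains
                final_words[event["writer"]] = event["word_count"]

    writers = sorted(set(queries) & set(final_words))
    r = pearson([queries[w] for w in writers], [final_words[w] for w in writers])
    print(f"Pearson r(query count, final essay length) = {r:.2f}")

A weak or near-zero correlation under this kind of analysis would be consistent with the paper's observation that query counts alone do not signal how close a writer is to finishing.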
