Query for Architecture, Click through Military: Comparing the Roles of Search and Navigation on Wikipedia

As one of the richest sources of encyclopedic information on the Web, Wikipedia generates an enormous amount of traffic. In this paper, we study large-scale article access data of the English Wikipedia in order to compare articles with respect to the two main paradigms of information seeking: search, i.e., formulating a query, and navigation, i.e., following hyperlinks. To this end, we propose and employ two main metrics to characterize articles: (i) searchshare, the share of an article's views that arrive via search, and (ii) resistance, the ability of an article to relay traffic to other Wikipedia articles. We demonstrate that articles in distinct topical categories differ substantially in terms of these properties. For example, architecture-related articles are often accessed through search and are simultaneously a "dead end" for traffic, whereas historical articles about military events are mainly reached by navigation. We further link these traffic differences to network, content, and editing activity features. Lastly, we measure the impact of these article properties by modeling access behavior on articles with a gradient boosting approach. The results of this paper constitute a step towards understanding human information seeking behavior on the Web.
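The two metrics can be illustrated with a minimal sketch. The abstract gives no formulas, so everything below is an assumption made for illustration: the `ArticleTraffic` record and its field names are hypothetical, searchshare is taken as search views divided by total views, and resistance is taken as one minus the fraction of views that continue to another Wikipedia article, so that a "dead end" article scores high. The paper's actual definitions may differ in form or direction.

```python
from dataclasses import dataclass


@dataclass
class ArticleTraffic:
    """Hypothetical per-article traffic counts (field names are assumptions)."""
    search_views: int  # views arriving from a search engine
    link_views: int    # views arriving via links from other Wikipedia articles
    other_views: int   # all remaining views (direct entry, external sites, ...)
    out_clicks: int    # clicks from this article to other Wikipedia articles

    @property
    def total_views(self) -> int:
        return self.search_views + self.link_views + self.other_views


def searchshare(t: ArticleTraffic) -> float:
    """Assumed definition: fraction of all views that arrived via search."""
    if t.total_views == 0:
        return 0.0
    return t.search_views / t.total_views


def resistance(t: ArticleTraffic) -> float:
    """Assumed definition: 1 minus the fraction of views relayed onward.

    The relayed fraction is capped at 1 because one view can produce
    several outgoing clicks.
    """
    if t.total_views == 0:
        return 0.0
    relayed = min(t.out_clicks / t.total_views, 1.0)
    return 1.0 - relayed


# Made-up counts for a search-heavy article that relays little traffic.
example = ArticleTraffic(search_views=800, link_views=150, other_views=50, out_clicks=100)
print(searchshare(example), resistance(example))
```

Under these assumptions, an article that is mostly found through search and rarely clicked through has both a high searchshare and a high resistance, which is the pattern the abstract reports for architecture-related articles.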
