A large scale study of reader interactions with images on Wikipedia

Wikipedia is the largest source of free encyclopedic knowledge and one of the most visited sites on the Web. To improve readers’ understanding of an article, Wikipedia editors add images within the article’s body text. However, despite the widespread use of images on web platforms and the huge volume of visual content on Wikipedia, little is known about the importance of images in free knowledge environments. To bridge this gap, we collect one month of data on how English Wikipedia readers interact with images and perform the first large-scale analysis of these interactions. First, we quantify overall engagement with images, finding that one in 29 pageviews results in a click on at least one image, a rate one order of magnitude higher than for interactions with other types of article content. Second, we study which factors are associated with image engagement and observe that clicks on images occur more often in shorter articles, in articles about visual arts or transportation, and in biographies of less well-known people. Third, we look at interactions with Wikipedia article previews and find that images help support readers’ information needs when navigating through the site, especially for more popular pages. The findings of this study deepen our understanding of the role of images in free knowledge and provide a guide for Wikipedia editors and web user communities seeking to enrich the world’s largest source of encyclopedic knowledge.
