A Taxonomy of Knowledge Gaps for Wikimedia Projects (First Draft)

In January 2019, prompted by the Wikimedia Movement's 2030 strategic direction, the Research team at the Wikimedia Foundation identified the need to develop a knowledge gaps index. This is a composite index meant to support decision makers across the Wikimedia Movement in two ways. First, it provides a framework for structured and targeted brainstorming discussions. Second, it provides data on the state of knowledge gaps across the Wikimedia projects, which can inform decision making and help measure the long-term impact of large-scale initiatives in the Movement. Since July 2019, as the first step toward building the knowledge gaps index, the Research team has developed the first complete draft of a taxonomy of knowledge gaps for the Wikimedia projects. We studied more than 200 references by scholars, researchers, practitioners, community members, and affiliates that provide evidence of knowledge gaps in the readership, contributorship, and content of Wikimedia projects. We synthesized these findings into the taxonomy of knowledge gaps presented in this paper, in which we describe, group, and classify knowledge gaps within a structured framework. The rest of this work describes this taxonomy, which will serve as a basis for operationalizing and quantifying knowledge equity, one of the two 2030 strategic directions, through the knowledge gaps index.
