Relating Wikipedia article quality to edit behavior and link structure

The relation between edit behavior, link structure, and article quality on Wikipedia is not yet well understood, even though a better understanding of it could support editing processes and improve content quality. To shed light on this relation, we classify article edits and perform an in-depth analysis of editing sequences for 4941 articles. Additionally, we build a network of internal Wikipedia hyperlinks between articles. Using these data, we compute parsimonious metrics to quantify editing and linking behavior. Our analysis reveals that conflicted articles differ substantially from other articles in almost all metrics, while high-quality articles show only slight trends. Our network analysis provides evidence that controversial and edit-war articles frequently span structural holes in the Wikipedia network. Finally, in a prediction experiment, we demonstrate that edit behavior patterns and network properties are useful for predicting conflict and article quality. Our sequence-based edit metrics and network-based article metrics offer online collaboration communities, especially Wikipedia, insights into the interplay of article quality, controversies and edit wars, editing behavior, and network properties, which can support the long-term improvement of content quality.
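
To make the two kinds of metrics more concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes that edits have already been labeled with categories (the labels "content" and "revert" are hypothetical), summarizes an editing sequence as first-order transition probabilities, and uses effective size (Borgatti's simplification of Burt's measure, n - 2t/n) as one possible indicator of structural holes on a link network that is treated as undirected here for simplicity. All function names and the toy data are assumptions.

```python
from collections import Counter, defaultdict


def transition_probabilities(edit_labels):
    """Row-normalized counts of transitions between consecutive edit labels."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(edit_labels, edit_labels[1:]):
        counts[prev][nxt] += 1
    return {
        prev: {nxt: c / sum(row.values()) for nxt, c in row.items()}
        for prev, row in counts.items()
    }


def effective_size(adjacency, node):
    """Effective size n - 2t/n for an undirected, unweighted graph.

    n is the number of neighbors of `node`; t is the number of links
    among those neighbors. Larger values mean less redundant neighbors,
    i.e. the node spans more structural holes.
    """
    alters = adjacency[node] - {node}
    n = len(alters)
    if n == 0:
        return 0.0
    # Count each link among the neighbors once (a < b avoids double counting).
    t = sum(1 for a in alters for b in adjacency[a] if b in alters and a < b)
    return n - 2 * t / n


# Hypothetical edit sequence for one article.
edits = ["content", "revert", "revert", "content", "revert"]
probs = transition_probabilities(edits)

# Hypothetical link network: A links to B, C, D; B and C link to each other.
links = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},
}
size_a = effective_size(links, "A")
```

In this toy network, article A connects D to the otherwise separate pair B and C, so its effective size is higher than that of B, whose two neighbors are already linked to each other.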
