In this work, we present a novel approach to detecting the authorship of words in Wikipedia that outperforms the baseline method in terms of accuracy. It achieves this by reducing the number of word-based text-to-text comparisons required, which are the most error-prone steps in the process. To provide an aggregated measure of how concentrated authorship is, we calculate a Gini coefficient for each revision of an article based on our word-author assignments. We motivate this measure by arguing that the concentration of an article's words among just a few authors can indicate a lack of quality and neutrality. The development of the coefficient over an article's revision history is visualized and provided online as an easily accessible tool for investigating how the content of an article evolved. We present examples in which the Gini curve gives useful insights into differences between articles and may help to spot crucial events in an article's past evolution.
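To make the per-revision measure concrete, the following is a minimal sketch of how a Gini coefficient could be computed from word-author assignments. It uses the standard sample formula over sorted values and is not necessarily the exact estimator used in this work. The function names `gini` and `gini_for_revision`, and the input format (one author label per word), are assumptions made for illustration. The sketch also assumes that only authors who own at least one word in the revision are counted.

```python
from collections import Counter


def gini(counts):
    """Gini coefficient of non-negative counts (0 = perfectly equal)."""
    values = sorted(counts)
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        return 0.0
    # Standard formula over ascending values x_1..x_n:
    # G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n
    weighted = sum(i * x for i, x in enumerate(values, start=1))
    return 2.0 * weighted / (n * total) - (n + 1) / n


def gini_for_revision(word_authors):
    """Hypothetical helper: word_authors holds one author label per word."""
    # Assumption: authors with zero surviving words are not counted.
    return gini(Counter(word_authors).values())


# Example: three words by author "a", one word by author "b".
revision = ["a", "a", "a", "b"]
print(gini_for_revision(revision))
```

Applying `gini_for_revision` to each revision in turn would yield the sequence of values behind a curve like the one described above. Whether inactive authors are included in the population changes the resulting values, so that choice should be fixed before comparing articles.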