Whose article is it anyway? – Detecting authorship distribution in Wikipedia articles over time with WIKIGINI

In this work, we present a novel approach to detecting the authorship of words in Wikipedia articles, which outperforms the baseline method in terms of accuracy. This is achieved by reducing the number of word-based text-to-text comparisons required, which are the most error-prone steps in the process. To provide an aggregated measure of how concentrated authorship is, we calculate a Gini coefficient for each revision of an article based on our word-author assignments. As motivation for this measure, we argue that the concentration of words among just a few authors can indicate a lack of quality and neutrality in an article. The development of the coefficient over the revision history of an article is visualized and provided online as an easily accessible tool for investigating how the content of the article evolved. We present examples where the Gini curve gives useful insights into differences between articles and may help to spot crucial events in an article's past evolution.
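
To make the aggregated measure concrete, the following is a minimal sketch of how a Gini coefficient could be computed for a single revision once every word has been assigned to an author. It is an illustration under stated assumptions, not the WIKIGINI implementation: the function names `gini` and `revision_gini`, the input format (one author label per word), and the choice to count only authors who own at least one word in the revision are all hypothetical. The standard discrete Gini formula over sorted counts is used.

```python
from collections import Counter


def gini(counts):
    """Gini coefficient of a list of non-negative counts.

    Uses the standard discrete formula over the ascending-sorted values:
    G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n, with i = 1..n.
    Returns 0.0 for an empty list or an all-zero list (assumption).
    """
    values = sorted(counts)
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum(rank * value for rank, value in enumerate(values, start=1))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n


def revision_gini(word_authors):
    """Gini coefficient of word ownership in one revision.

    `word_authors` is a hypothetical input: one author label per word,
    in document order. Only authors owning at least one word are counted.
    """
    words_per_author = Counter(word_authors)
    return gini(list(words_per_author.values()))


if __name__ == "__main__":
    # Toy revision: four words, three attributed to one author.
    print(revision_gini(["alice", "alice", "bob", "alice"]))
```

A value of 0 means all contributing authors own the same number of words; values approaching 1 mean that a few authors own almost all of the text. Calling `revision_gini` once per revision and plotting the results in revision order would give a curve of the kind described above.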