Growing Wikipedia Across Languages via Recommendation

Wikipedia language editions vary dramatically in how comprehensive they are. As a result, most language editions contain only a small fraction of the sum of information that exists across all Wikipedias. In this paper, we present an approach to filling gaps in article coverage across different Wikipedia editions. Our main contribution is an end-to-end system that recommends, for creation, articles that exist in one language but are missing in another. The system involves identifying missing articles, ranking the missing articles according to their importance, and recommending important missing articles to editors based on their interests. We empirically validate our models in a controlled experiment involving 12,000 French Wikipedia editors. We find that personalizing recommendations increases editor engagement by a factor of two. Moreover, recommending articles increases their chance of being created by a factor of 3.2. Finally, articles created as a result of our recommendations are of comparable quality to organically created articles. Overall, our system leads to more engaged editors and faster growth of Wikipedia with no effect on its quality.
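The three stages named above (identify, rank, recommend) can be sketched as a small pipeline. This is a minimal illustration under stated assumptions, not the system described in this paper. It assumes that cross-language article links are available as a mapping from a language-independent concept ID to the set of editions that have an article. It also uses a hypothetical precomputed importance score in place of a learned ranking model, and it represents articles and editor interests as hypothetical topic vectors compared by cosine similarity. All IDs, scores, and function names are illustrative.

```python
from math import sqrt

# Hypothetical toy data: concept ID -> set of language editions with an article.
SITELINKS = {
    "Q1": {"en", "fr"},
    "Q2": {"en"},
    "Q3": {"en", "de"},
    "Q4": {"en"},
}

# Hypothetical importance scores (assumption: a stand-in for a learned ranking model).
IMPORTANCE = {"Q1": 900.0, "Q2": 500.0, "Q3": 1200.0, "Q4": 50.0}

# Hypothetical topic vectors for candidate articles (assumption: e.g. from a topic model).
ARTICLE_TOPICS = {
    "Q2": [0.9, 0.1],
    "Q3": [0.2, 0.8],
    "Q4": [0.5, 0.5],
}


def find_missing(sitelinks, source, target):
    """Stage 1: concepts with an article in `source` but none in `target`."""
    return {q for q, langs in sitelinks.items() if source in langs and target not in langs}


def rank_by_importance(missing, importance, top_k):
    """Stage 2: keep the top_k missing concepts, highest importance first."""
    return sorted(missing, key=lambda q: importance.get(q, 0.0), reverse=True)[:top_k]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def recommend(editor_vec, candidates, topics, n):
    """Stage 3: order candidates by similarity to one editor's interest vector."""
    scored = [(cosine(editor_vec, topics[q]), q) for q in candidates if q in topics]
    scored.sort(reverse=True)
    return [q for _, q in scored[:n]]


if __name__ == "__main__":
    missing = find_missing(SITELINKS, "en", "fr")
    ranked = rank_by_importance(missing, IMPORTANCE, top_k=2)
    print(ranked)  # ['Q3', 'Q2']
    print(recommend([1.0, 0.0], ranked, ARTICLE_TOPICS, n=1))  # ['Q2']
```

Filtering by importance before personalizing keeps the per-editor step cheap, since only a short list of candidates is scored against each editor's interests. A real system would also need to decide how to spread recommendations across many editors, which this sketch does not attempt.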
