The Tower of Babel Meets Web 2.0: User-Generated Content and Its Applications in a Multilingual Context

This study explores language's fragmenting effect on user-generated content by examining the diversity of knowledge representations across 25 Wikipedia language editions. This diversity is measured at two levels: the concepts that are included in each edition and the ways in which these concepts are described. We demonstrate that this diversity is greater than the literature has presumed and that it has a significant influence on applications that use Wikipedia as a source of world knowledge. We close by explaining how knowledge diversity can be leveraged to create "culturally-aware applications" and "hyperlingual applications".
