Estimating Linguistic Diversity on the Internet: A Taxonomy to Avoid Pitfalls and Paradoxes

Both UNESCO and OECD have recognized the public policy benefit of publicizing information on linguistic diversity on the Internet. However, the published methodologies for estimating “linguistic diversity” or “Internet statistics (by language)” interpret these key terms differently. This article creates a new taxonomy, defining and contrasting user activity, user profile, web presence, and diversity index, to distinguish among the various indicators used to estimate language usage on the Internet. The taxonomy facilitates comparison of the available methodologies, whose limitations are then critiqued. It also helps to resolve the apparent paradox as to whether the use of English on the Internet has declined rapidly or has remained fairly stable. The study concludes that the best estimates of web presence can be achieved by direct measurement: randomly addressing and analyzing a representative sample of all public websites. However, this approach will suffice only if the language detection software used is progressively extended to recognize all the world’s written languages.
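
The direct-measurement idea (random addressing, then tallying languages over the sample) can be illustrated with a minimal sketch. It is not the study's procedure: the abstract does not specify one. The function names are hypothetical, the language labels are assumed to come from some external detector that is not shown, and the diversity index used here is assumed to be Greenberg's form, 1 minus the sum of squared language shares, which the abstract does not confirm.

```python
import random
from collections import Counter
from ipaddress import IPv4Address


def sample_public_ipv4(n, seed=0):
    """Draw n random IPv4 addresses, keeping only globally routable ones.

    Assumption: uniform sampling over the 32-bit address space stands in
    for "randomly addressing" public websites; no network access happens here.
    """
    rng = random.Random(seed)
    picked = []
    while len(picked) < n:
        addr = IPv4Address(rng.getrandbits(32))
        if addr.is_global:  # skip private, loopback, reserved ranges
            picked.append(addr)
    return picked


def language_shares(labels):
    """Turn a list of detected language labels into proportions per language."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {lang: count / total for lang, count in counts.items()}


def diversity_index(shares):
    """Assumed index: 1 - sum(p_i^2). 0 means one language; higher means more diverse."""
    return 1.0 - sum(p * p for p in shares.values())


if __name__ == "__main__":
    addresses = sample_public_ipv4(5, seed=1)
    print(addresses)

    # Hypothetical detector output for four sampled sites (illustrative only).
    labels = ["en", "en", "es", "zh"]
    shares = language_shares(labels)
    print(shares, diversity_index(shares))
```

A sketch like this also makes the abstract's closing caveat concrete: any written language the detector cannot label never enters `labels`, so its share is silently counted as zero.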
