Exploring and Visualizing Differences in Geographic and Linguistic Web Coverage

This article reports on a study of the geographic and linguistic coverage of web resources, using tourism-related themes in Switzerland as an example. Search engine queries over web documents were used to gather counts for phrases in four different languages. The study focused on selected populated places and tourist attractions in Switzerland drawn from three gazetteer datasets: topographic gazetteer data from the Swiss national mapping agency (SwissTopo), POI data from a commercial data provider (Tele Atlas), and user-generated geographic content (geonames.org). The web counts illustrate the geographic extent and trends of web coverage of tourism for the different languages. Results show that coverage in the local languages, i.e. German, French and Italian, is more strongly related to the region where the language is spoken. Correlations of the web counts with typical tourism indicators, e.g. population and number of hotel nights rented per year, are also computed and compared.
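As an illustration of the last step, the following minimal sketch computes a Pearson correlation between per-place web counts and a tourism indicator. The abstract does not say which correlation measure the study used, so Pearson is an assumption here; the function name `pearson` and all numbers are hypothetical placeholders, not data from the study.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical placeholder values, one entry per place (NOT study data).
hotel_nights = [120_000, 45_000, 300_000, 8_000]
web_counts_by_language = {
    "de": [5_200, 1_900, 9_800, 400],
    "fr": [800, 2_500, 3_100, 150],
}

# One coefficient per language, so the languages can be compared.
correlations = {
    lang: pearson(counts, hotel_nights)
    for lang, counts in web_counts_by_language.items()
}
```

Computing one coefficient per language keeps the comparison across languages explicit. A rank-based measure could be swapped in if the counts are heavily skewed.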
