Analyzing Relatedness by Toponym Co‐Occurrences on Web Pages

This research proposes a method for capturing “relatedness between geographical entities” based on the co-occurrences of their names on web pages. The basic assumption is that the more often the names of two geographical places co-occur, the more strongly the places are related. The spatial structure of China at the provincial level is explored using counts of documents in which the names of two provincial units co-occur, retrieved with a web information retrieval engine. Analysis of the co-occurrences and topological distances between all pairs of provinces indicates that: (1) spatially close provinces generally have similar co-occurrence patterns; (2) the frequency of co-occurrences exhibits a power-law distance decay effect with an exponent of 0.2; and (3) the co-occurrence matrix can be used to capture the similarity/linkage between neighboring provinces and can be fed into a regionalization method to examine the spatial organization of China. The proposed method provides a promising approach to extracting valuable geographical information from massive collections of web pages.
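As an illustration of finding (2), a power-law distance decay of the form C = k · d^(−β) can be estimated by ordinary least squares on log-transformed values. The sketch below is only one plausible way to do this and is not necessarily the estimation procedure used in the study; the function name `fit_power_law_decay`, the constant 1000, and the synthetic hop-count data are assumptions made up for the example.

```python
import math


def fit_power_law_decay(distances, cooccurrences):
    """Estimate (beta, k) in C = k * d**(-beta) by least squares in log-log space.

    Hypothetical helper for illustration; pairs with non-positive values are
    skipped because their logarithm is undefined.
    """
    pairs = [
        (math.log(d), math.log(c))
        for d, c in zip(distances, cooccurrences)
        if d > 0 and c > 0
    ]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two valid (distance, count) pairs")

    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    if sxx == 0:
        raise ValueError("distances must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)

    slope = sxy / sxx                      # slope of log C against log d
    intercept = mean_y - slope * mean_x    # log k
    return -slope, math.exp(intercept)


# Synthetic data (not from the study): topological distances of 1..7 hops and
# co-occurrence counts generated exactly from C = 1000 * d**(-0.2).
hops = [1, 2, 3, 4, 5, 6, 7]
counts = [1000.0 * h ** -0.2 for h in hops]

beta, k = fit_power_law_decay(hops, counts)
print(f"estimated exponent beta = {beta:.3f}, scale k = {k:.1f}")
```

With real co-occurrence counts the points would scatter around the fitted line, so the goodness of fit should be checked before the exponent is interpreted.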
