From ITDL to Place2Vec: Reasoning About Place Type Similarity and Relatedness by Learning Embeddings From Augmented Spatial Contexts

Understanding, representing, and reasoning about Points of Interest (POI) types, such as Auto Repair, Body Shop, Gas Station, or Planetarium, is a key aspect of geographic information retrieval, recommender systems, and geographic knowledge graphs, as well as of studying urban spaces in general, e.g., for extracting functional or vague cognitive regions from user-generated content. One prerequisite for these tasks is the ability to capture the similarity and relatedness between POI types. Intuitively, a spatial search that returns body shops or even gas stations in the absence of auto repair places is still likely to satisfy some user needs, while returning planetariums will not. Place hierarchies are frequently used for query expansion, but most existing hierarchies are relatively shallow and structured from a single perspective, thereby placing POI types that may be closely related with respect to some characteristics far apart from one another. This raises the question of how to learn POI type representations from data. Models such as Word2Vec, which produce word embeddings from linguistic contexts, are a novel and promising approach, as they come with an intuitive notion of similarity. However, the structure of geographic space, e.g., the interactions between POI types, differs substantially from that of language. In this work, we present a novel method to augment the spatial contexts of POI types using a distance-binned, information-theoretic approach to generate embeddings. We demonstrate that our method outperforms Word2Vec and other models on three different evaluation tasks and strongly correlates with human assessments of POI type similarity. We have published the resulting embeddings for 570 place types, as well as a collection of human similarity assessments, online for others to use.
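To make the idea of distance-binned spatial contexts more concrete, the following is a minimal Python sketch under explicit assumptions. The toy POIs, the bin edges in `BIN_EDGES`, the `1 / (bin + 1)` weighting, and the helper names (`distance_bin`, `context_vectors`, `cosine`, `most_similar`) are all hypothetical. The sketch builds sparse, count-based context vectors and compares them with cosine similarity. It is not the paper's ITDL augmentation or Place2Vec training, which learn dense embeddings in the style of Word2Vec and use an information-theoretic weighting rather than the stand-in used here.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy POIs: (place type, x, y) on a planar grid in metres.
POIS = [
    ("Auto Repair", 0, 0), ("Body Shop", 80, 0), ("Gas Station", 0, 200),
    ("Auto Repair", 3000, 0), ("Body Shop", 3000, 90), ("Gas Station", 3400, 0),
    ("Planetarium", 0, 3000), ("Museum", 150, 3000), ("Cafe", 0, 3300),
]

# Hypothetical upper edges (metres) of the distance bins; pairs farther
# apart than the last edge are not treated as spatial context.
BIN_EDGES = (100.0, 250.0, 500.0)


def distance_bin(d, edges=BIN_EDGES):
    """Index of the first bin whose upper edge is >= d, or None if d exceeds all edges."""
    for i, edge in enumerate(edges):
        if d <= edge:
            return i
    return None


def context_vectors(pois, edges=BIN_EDGES):
    """Map each place type to a Counter of weighted neighbouring types.

    Each neighbour inside the largest bin contributes 1 / (bin + 1), so closer
    bins weigh more. This weighting is an illustrative stand-in only, not the
    information-theoretic weighting described in the paper.
    """
    vecs = defaultdict(Counter)
    for i, (t1, x1, y1) in enumerate(pois):
        for j, (t2, x2, y2) in enumerate(pois):
            if i == j:
                continue  # a POI is not its own context
            b = distance_bin(math.hypot(x1 - x2, y1 - y2), edges)
            if b is not None:
                vecs[t1][t2] += 1.0 / (b + 1)
    return vecs


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as Counters."""
    dot = sum(w * v[k] for k, w in u.items() if k in v)
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def most_similar(target, vecs):
    """Rank all other place types by cosine similarity to target (e.g., for query expansion)."""
    others = [t for t in vecs if t != target]
    return sorted(others, key=lambda t: cosine(vecs[target], vecs[t]), reverse=True)


if __name__ == "__main__":
    vecs = context_vectors(POIS)
    for other in most_similar("Auto Repair", vecs):
        print(f"{other:12s} {cosine(vecs['Auto Repair'], vecs[other]):.3f}")
```

On this toy data, Gas Station and Body Shop rank above Planetarium for Auto Repair, because they share spatial context with it while Planetarium never co-occurs with it inside the largest bin. A learned embedding would replace the sparse counts with dense vectors, which are typically still compared with cosine similarity.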
