Sensing spatial distribution of urban land use by integrating points-of-interest and Google Word2Vec model

ABSTRACT Urban land use information plays an essential role in a wide variety of urban planning and environmental monitoring processes. Over the past few decades, rapid advances in remote sensing (RS), geographic information systems (GIS) and geospatial big data have produced numerous methods for identifying urban land use at a fine scale. Points-of-interest (POIs) have been widely used to extract information about urban land use types and functional zones. However, the relationship between the spatial distribution of POIs and regional land use types is difficult to quantify because reliable models are lacking, and previous methods may ignore the abundant spatial features that can be extracted from POIs. In this study, we establish an innovative framework that detects urban land use distributions at the scale of traffic analysis zones (TAZs) by integrating Baidu POIs and Word2Vec, a deep-learning language model open-sourced by Google in 2013. First, data for the Pearl River Delta (PRD) are transformed into a TAZ-POI corpus using a greedy algorithm that considers the spatial distributions of the TAZs and the POIs within them. Then, high-dimensional characteristic vectors of POIs and TAZs are extracted using the Word2Vec model. Finally, to validate the reliability of the POI/TAZ vectors, we apply a K-Means-based clustering model to analyze correlations between the POI and TAZ vectors, and use the TAZ vectors to identify urban land use types with a random forest algorithm (RFA) model. Compared with several state-of-the-art probabilistic topic models (PTMs), the proposed method efficiently achieves the highest accuracy (overall accuracy, OA = 0.8728; kappa = 0.8399). Moreover, the results can help urban planners monitor dynamic urban land use and evaluate the impact of urban planning schemes.
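
The corpus-construction step can be illustrated with a minimal sketch. The abstract does not spell out the greedy algorithm, so the version below is an assumption: within each TAZ, POIs are ordered by repeatedly moving to the nearest unvisited POI, and the resulting sequence of POI categories is treated as one "document". The function names, the `(category, x, y)` tuple layout, and the choice of the westernmost POI as the starting point are all hypothetical.

```python
import math


def greedy_poi_sequence(pois):
    """Order the POIs of one TAZ with a greedy nearest-neighbour walk.

    pois: list of (category, x, y) tuples in a projected coordinate system
          (hypothetical layout, not taken from the paper).
    Returns the POI categories in visiting order, i.e. one "document".
    """
    if not pois:
        return []
    remaining = list(pois)
    # Assumed starting rule: begin at the westernmost POI (ties broken by y).
    current = min(remaining, key=lambda p: (p[1], p[2]))
    remaining.remove(current)
    sequence = [current[0]]
    while remaining:
        # Greedy step: hop to the closest POI that has not been visited yet.
        nearest = min(
            remaining,
            key=lambda p: math.hypot(p[1] - current[1], p[2] - current[2]),
        )
        remaining.remove(nearest)
        sequence.append(nearest[0])
        current = nearest
    return sequence


def build_corpus(taz_pois):
    """Map each TAZ id to its ordered sequence of POI categories."""
    return {taz_id: greedy_poi_sequence(pois) for taz_id, pois in taz_pois.items()}


# Toy input: one TAZ with four POIs lying on a line.
example = {
    "taz_1": [
        ("restaurant", 0.0, 0.0),
        ("school", 5.0, 0.0),
        ("bank", 1.0, 0.0),
        ("hotel", 2.5, 0.0),
    ]
}
corpus = build_corpus(example)
```

Each sequence in `corpus` could then be passed as a sentence to a Word2Vec implementation to obtain POI vectors. That step, and how the TAZ vectors are derived from them, is not shown here because the abstract does not describe it in detail.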
