Using an Optimized Chinese Address Matching Method to Develop a Geocoding Service: A Case Study of Shenzhen, China

With the advent of the big data era and the rapid development and widespread application of Geographical Information Systems (GISs), geocoding technology plays an increasingly important role in linking non-spatial data resources to spatial data in various fields. However, Chinese geocoding faces two major challenges: the complexity of the Chinese address string format, which contains no delimiters between words, and poor address management, which results from multiple address authorities being spread among different governmental agencies. This paper presents a geocoding service based on an optimized Chinese address matching method that comprises address modeling, address standardization and address matching. The address model focuses on the spatial semantics of each address element, and the address standardization process is based on an address tree model. The geocoding service is implemented in practice using a large quantity of data from Shenzhen Municipality. More than 1,460,000 data records were used to test the service, which achieved good matching rates and showed good adaptability and intelligence.
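To make the idea of tree-based standardization concrete, the following is a minimal sketch and not the authors' implementation. It assumes an address tree in which each node is one address element (city, district, road, house number) and splits an undelimited address string by repeatedly consuming the longest child name that prefixes the remaining text. The class `AddressNode`, the function `standardize`, the level labels, and the sample address are all hypothetical and chosen only for illustration; the method in the paper also relies on the spatial semantics of address elements, which this sketch does not model.

```python
from typing import Dict, List, Tuple


class AddressNode:
    """One address element in a hypothetical address tree."""

    def __init__(self, name: str, level: str) -> None:
        self.name = name
        self.level = level
        self.children: Dict[str, "AddressNode"] = {}

    def add_child(self, name: str, level: str) -> "AddressNode":
        # Reuse the existing child if this element is already in the tree.
        if name not in self.children:
            self.children[name] = AddressNode(name, level)
        return self.children[name]


def standardize(address: str, root: AddressNode) -> Tuple[List[Tuple[str, str]], str]:
    """Split an undelimited address into (level, name) elements.

    At each tree level, the longest child name that prefixes the
    remaining string is consumed. Returns the matched elements and
    whatever text could not be matched.
    """
    elements: List[Tuple[str, str]] = []
    node = root
    rest = address
    while rest and node.children:
        best = None
        for name, child in node.children.items():
            if rest.startswith(name) and (best is None or len(name) > len(best.name)):
                best = child
        if best is None:
            break  # no child matches; leave the remainder unmatched
        elements.append((best.level, best.name))
        rest = rest[len(best.name):]
        node = best
    return elements, rest


# Illustrative sample tree (assumed data, not taken from the paper).
root = AddressNode("", "root")
city = root.add_child("深圳市", "city")
district = city.add_child("福田区", "district")
road = district.add_child("深南大道", "road")
road.add_child("100号", "number")

matched, unmatched = standardize("深圳市福田区深南大道100号", root)
```

In this sketch, `matched` holds the ordered address elements and `unmatched` holds any trailing text that the tree could not resolve, which a real service would pass to a fuzzy matching step rather than discard.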
