A Coarse-to-Fine Model for Geolocating Chinese Addresses

Address geolocation aims to associate address texts with geographic locations. In China, growing demand for location-based service (LBS) applications such as take-out services and express delivery makes automatically geolocating unstructured address text a key problem that must be solved first. Recently, a few approaches have been proposed to automate address geolocation by directly predicting geographic coordinates. However, such point-based methods ignore the hierarchical information in addresses, which may lead to poor geolocation performance. In this paper, we propose a hierarchical region-based approach for geolocating Chinese addresses. We model address geolocation as a Sequence-to-Sequence (Seq2Seq) learning task: the input sequence is a textual address, and the output sequence is a GeoSOT grid code that exactly represents the multi-level regions covered by the address. A novel coarse-to-fine model, which combines BERT and LSTM, is designed to learn this task. Experimental results demonstrate that our model correctly understands Chinese addresses and achieves higher geolocation accuracy than all the baselines.
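
To illustrate why a grid code suits a coarse-to-fine output sequence, the following is a minimal sketch of a hierarchical grid encoding. It is a plain quadtree over latitude and longitude, not the actual GeoSOT scheme, and the function names `quadtree_code` and `decode_cell` are hypothetical. Each digit selects one quadrant of the current cell, so every prefix of the code names a coarser enclosing region.

```python
def quadtree_code(lat, lon, levels):
    """Encode a point as base-4 digits, one per level, coarse to fine.

    Simplified stand-in for a hierarchical grid code; NOT the GeoSOT scheme.
    """
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    digits = []
    for _ in range(levels):
        lat_mid = (lat_lo + lat_hi) / 2
        lon_mid = (lon_lo + lon_hi) / 2
        north = lat >= lat_mid
        east = lon >= lon_mid
        # Digit layout (an assumption): 0 = SW, 1 = SE, 2 = NW, 3 = NE.
        digits.append(str(2 * int(north) + int(east)))
        # Shrink the cell to the chosen quadrant.
        if north:
            lat_lo = lat_mid
        else:
            lat_hi = lat_mid
        if east:
            lon_lo = lon_mid
        else:
            lon_hi = lon_mid
    return "".join(digits)


def decode_cell(code):
    """Return (lat_lo, lat_hi, lon_lo, lon_hi) of the cell named by a code."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    for ch in code:
        d = int(ch)
        lat_mid = (lat_lo + lat_hi) / 2
        lon_mid = (lon_lo + lon_hi) / 2
        if d >= 2:
            lat_lo = lat_mid
        else:
            lat_hi = lat_mid
        if d % 2 == 1:
            lon_lo = lon_mid
        else:
            lon_hi = lon_mid
    return lat_lo, lat_hi, lon_lo, lon_hi


# Illustrative coordinates only (roughly central Beijing).
coarse = quadtree_code(39.9, 116.4, 8)
fine = quadtree_code(39.9, 116.4, 16)
print(coarse, fine, fine.startswith(coarse))
```

Under this view, a decoder that emits the code digit by digit first commits to a large region and then refines it, which is the sense in which a region-based output preserves the hierarchy that a single coordinate pair discards.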
