DeepAM: Deep Semantic Address Representation for Address Matching

Address matching, which aims to identify addresses in address databases that refer to the same location, is a crucial task in many location-based businesses such as take-out services and express delivery. It is challenging because the address of a location can be expressed in many different ways, especially in Chinese. Traditional address matching approaches rely on string similarity and learned matching rules to identify addresses referring to the same location. These approaches can hardly handle addresses with redundant, incomplete or unusual expressions. In this paper, we propose to map every address into a fixed-size vector in a shared vector space using state-of-the-art deep sentence representation techniques, and then measure the semantic similarity between addresses in this space. We also apply an attention mechanism to the model to highlight the important features of addresses in their semantic representations. Last but not least, we propose a novel way of obtaining rich contexts for addresses from the web through web search engines, which can substantially enrich the semantic information that can be learned about each address. Our empirical study on two real-world address datasets shows that our approach improves both precision (by up to 5%) and recall (by up to 8%) over existing state-of-the-art methods.
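The core matching step described above can be sketched as follows: each address is encoded as a fixed-size vector with attention, and the vectors are compared in a shared space. This is a minimal sketch under stated assumptions, not the DeepAM implementation. The per-token vectors are assumed to come from some upstream sentence encoder (not shown). The attention is a simple dot-product scoring against a query vector that stands in for a learned parameter. Cosine similarity is assumed as the similarity measure. The names `attention_pool` and `is_match` and the 0.9 threshold are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def softmax(scores: Sequence[float]) -> Vector:
    """Numerically stable softmax: subtract the max score before exponentiating."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attention_pool(states: Sequence[Sequence[float]], query: Sequence[float]) -> Vector:
    """Collapse a variable-length sequence of token vectors into one fixed-size vector.

    Each token is scored by its dot product with a query vector (hypothetical,
    standing in for a learned parameter). The softmax of the scores gives the
    weights of a weighted sum, so high-scoring tokens dominate the result.
    """
    weights = softmax([dot(h, query) for h in states])
    dim = len(states[0])
    return [sum(w * h[i] for w, h in zip(weights, states)) for i in range(dim)]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    norm_a = math.sqrt(dot(a, a))
    norm_b = math.sqrt(dot(b, b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot(a, b) / (norm_a * norm_b)


def is_match(states_a, states_b, query, threshold: float = 0.9) -> bool:
    """Decide whether two addresses refer to the same location.

    The 0.9 threshold is an illustrative assumption, not a value from the paper.
    """
    vec_a = attention_pool(states_a, query)
    vec_b = attention_pool(states_b, query)
    return cosine_similarity(vec_a, vec_b) >= threshold


if __name__ == "__main__":
    # Toy 3-dimensional token states standing in for encoder outputs.
    addr_a = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    # Same tokens plus one extra token that receives a low attention weight.
    addr_b = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.1]]
    query = [1.0, 1.0, 0.0]
    print(is_match(addr_a, addr_b, query))  # prints True
```

The pooled vector has the encoder's dimensionality regardless of how many tokens an address contains. As a result, addresses of different lengths become directly comparable. In a trained model, the query vector and the decision threshold would presumably be learned or tuned on labeled address pairs rather than fixed by hand.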