Place Deduplication with Embeddings

Thanks to advances in mobile location services, people can now post about places and share their visiting experiences on the go. A large place graph not only helps users explore interesting destinations but also provides opportunities for understanding and modeling the real world. To improve the coverage and flexibility of the place graph, many platforms import place data from multiple sources, which unfortunately introduces numerous duplicate places that severely hinder downstream location-related services. In this work, we take the anonymous place graph from Facebook as an example to systematically study the problem of place deduplication. We carefully formulate the problem, study its connections to various related tasks, which lead to several promising basic models, and arrive at a systematic two-step, data-driven pipeline that is based on place embeddings, incorporates multiple novel techniques, and performs significantly better than the state of the art.
