Fuzzy String Matching Using Sentence Embedding Algorithms

Fuzzy string matching has many applications. Traditional approaches rely mainly on the surface form of characters or words and do not use their semantic meaning. We postulate that semantic information may also be important for this task. To validate this hypothesis, we build a pipeline in which approximate string matching pre-selects a set of candidates and a sentence embedding algorithm selects the final result from these candidates. The aim of sentence embedding is to represent the semantic meaning of the words. Two sentence embedding algorithms are tested: a convolutional neural network (CNN) and averaged word2vec vectors. Experiments show that the proposed pipeline significantly improves accuracy and that averaging word2vec works slightly better than the CNN.
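The two-stage idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the surface-level stage uses Python's `difflib.SequenceMatcher` as a stand-in for whatever approximate string matching the paper uses, the word vectors in `TOY_VECTORS` are invented three-dimensional values rather than trained word2vec vectors, and the names `preselect`, `embed`, `cosine`, and `match` are hypothetical. The CNN variant is not shown.

```python
import difflib
import math

# Toy 3-dimensional word vectors, invented purely for illustration.
# A real system would load trained word2vec vectors instead.
TOY_VECTORS = {
    "cheap": [1.0, 0.0, 0.0],
    "inexpensive": [0.9, 0.1, 0.0],
    "expensive": [-0.9, 0.1, 0.0],
    "hotel": [0.0, 1.0, 0.0],
    "garden": [0.0, 0.0, 1.0],
    "tools": [0.0, 0.1, 0.9],
}
DIM = 3


def preselect(query, corpus, k=2):
    """Stage 1: keep the k strings closest to the query by surface form."""
    def surface_score(candidate):
        return difflib.SequenceMatcher(None, query, candidate).ratio()
    return sorted(corpus, key=surface_score, reverse=True)[:k]


def embed(sentence):
    """Average the vectors of the known words; unknown words are skipped."""
    vectors = [TOY_VECTORS[w] for w in sentence.lower().split() if w in TOY_VECTORS]
    if not vectors:
        return [0.0] * DIM
    return [sum(component) / len(vectors) for component in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def match(query, corpus, k=2):
    """Stage 2: among the pre-selected candidates, pick the closest embedding."""
    candidates = preselect(query, corpus, k)
    query_vec = embed(query)
    return max(candidates, key=lambda c: cosine(query_vec, embed(c)))


CORPUS = ["expensive hotel", "cheap hotel", "garden tools"]
best = match("inexpensive hotel", CORPUS)
```

With these toy values, the surface-level stage ranks "expensive hotel" first for the query "inexpensive hotel", while the embedding stage prefers "cheap hotel" because the invented vectors place "inexpensive" close to "cheap". This shows the kind of case where semantic information could change the result; it says nothing about the accuracy figures reported in the paper.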

Xiaolin Hu | Yu Rong
