Balancing Speed and Accuracy in Neural-Enhanced Phonetic Name Matching

Automatic co-text free name matching has a variety of important real-world applications, ranging from fiscal compliance to border control. Name matching systems use a variety of engines to compare two names for similarity, with one of the most critical being phonetic name similarity. In this work, we re-frame existing work on neural sequence-to-sequence transliteration such that it can be applied to name matching. For performance reasons, we then build upon this work to utilize an alternative, non-recurrent neural encoder module. This ultimately yields a model which is 63% faster while still maintaining a 16% improvement in averaged precision over our baseline model.
