Balancing Speed and Accuracy in Neural-Enhanced Phonetic Name Matching

Automatic co-text free name matching has a variety of important real-world applications, ranging from fiscal compliance to border control. Name matching systems use a variety of engines to compare two names for similarity, and phonetic name similarity is one of the most critical. In this work, we re-frame existing work on neural sequence-to-sequence transliteration so that it can be applied to name matching. For performance reasons, we then build on this work by using an alternative, non-recurrent neural encoder module. This ultimately yields a model that is 63% faster while still maintaining a 16% improvement in averaged precision over our baseline model.
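As a rough illustration of what re-framing transliteration as matching could look like, the sketch below scores a candidate name pair by the length-normalized log-likelihood that a character-level transliteration model assigns to the target name given the source name. This is one plausible reading, not necessarily the scoring used in this work. The `match_score` function, the `next_char_probs` interface, the `</s>` end marker, and the `toy_model` stand-in are all hypothetical names introduced here; a trained sequence-to-sequence model would take the place of `toy_model`.

```python
import math
from typing import Callable, Dict

# Hypothetical interface: given the source name and the target prefix emitted
# so far, return a probability for each candidate next target symbol.
NextCharProbs = Callable[[str, str], Dict[str, float]]

EOS = "</s>"  # assumed end-of-sequence marker


def match_score(source: str, target: str, next_char_probs: NextCharProbs,
                floor: float = 1e-9) -> float:
    """Length-normalized log-likelihood of `target` given `source`.

    Higher (closer to 0) means the model finds the pair more plausible.
    Probabilities below `floor` are clamped to avoid log(0).
    """
    symbols = list(target) + [EOS]
    total = 0.0
    for i, symbol in enumerate(symbols):
        probs = next_char_probs(source, target[:i])
        total += math.log(max(probs.get(symbol, 0.0), floor))
    return total / len(symbols)


def toy_model(source: str, prefix: str) -> Dict[str, float]:
    """Stand-in for a trained model (assumption): it simply expects the
    target to copy the source character at the same position."""
    i = len(prefix)
    expected = source[i] if i < len(source) else EOS
    return {expected: 0.9}  # all other symbols fall back to the floor


if __name__ == "__main__":
    print(match_score("anna", "anna", toy_model))
    print(match_score("anna", "anne", toy_model))
```

Normalizing by length keeps long names from being penalized simply for having more characters. A matching decision would then compare the score against a threshold tuned on held-out pairs.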
