Paper information - A probabilistic model for approximate identity matching

Identity management is critical to governmental practices ranging from providing citizen services to enforcing homeland security. Searching for a specific identity is difficult because multiple representations of the same identity may exist as a result of unintentional errors and intentional deception. We propose a probabilistic Naïve Bayes model that improves on the effectiveness of existing identity matching techniques. Experiments show that the proposed model performs significantly better than both the exact-match-based technique and the approximate-match-based record comparison algorithm. In addition, the model greatly reduces the effort of manually labeling training instances by employing a semi-supervised learning approach. This training method outperforms both fully supervised and unsupervised learning. With a training dataset in which only 10% of the instances are labeled, the model achieves performance comparable to that of fully supervised learning.

Hsinchun Chen | G. Alan Wang | Homa Atabakhsh
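
The semi-supervised Naïve Bayes training summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes, purely for illustration, that each pair of identity records is reduced to a tuple of binary agreement indicators (for example, whether name, date of birth, and address agree), and that unlabeled pairs are incorporated through expectation-maximization (EM). The function names, the feature encoding, and the smoothing parameter `alpha` are all hypothetical.

```python
import math


def posterior(x, prior, cond):
    """Return [P(non-match | x), P(match | x)] under a Bernoulli Naive Bayes model."""
    log_p = []
    for c in (0, 1):
        lp = math.log(prior[c])
        for j, xj in enumerate(x):
            p = cond[c][j]  # P(feature j agrees | class c)
            lp += math.log(p if xj else 1.0 - p)
        log_p.append(lp)
    # Normalize in log space for numerical stability.
    m = max(log_p)
    exp = [math.exp(lp - m) for lp in log_p]
    z = sum(exp)
    return [e / z for e in exp]


def train_semi_supervised_nb(labeled, unlabeled, n_iter=20, alpha=1.0):
    """Fit Naive Bayes from labeled and unlabeled record pairs using EM.

    labeled:   list of (features, label); features is a tuple of 0/1,
               label is 1 for a matching pair and 0 otherwise (assumed encoding).
    unlabeled: list of feature tuples.
    Returns (prior, cond) with prior[c] = P(c) and cond[c][j] = P(x_j = 1 | c).
    """
    n_feat = len(labeled[0][0])

    def m_step(weighted):
        # weighted: list of (features, [w_nonmatch, w_match]) soft memberships.
        total = sum(w[0] + w[1] for _, w in weighted)
        prior, cond = [], []
        for c in (0, 1):
            mass = sum(w[c] for _, w in weighted)
            # Laplace smoothing (alpha) keeps every probability strictly in (0, 1).
            prior.append((mass + alpha) / (total + 2 * alpha))
            cond.append([
                (sum(w[c] * x[j] for x, w in weighted) + alpha) / (mass + 2 * alpha)
                for j in range(n_feat)
            ])
        return prior, cond

    # Labeled pairs keep fixed, hard class memberships throughout.
    hard = [(x, [1.0 - y, float(y)]) for x, y in labeled]
    prior, cond = m_step(hard)  # initialize from labeled data only
    for _ in range(n_iter):
        # E-step: soft class memberships for the unlabeled pairs.
        soft = [(x, posterior(x, prior, cond)) for x in unlabeled]
        # M-step: re-estimate parameters from labeled and unlabeled pairs together.
        prior, cond = m_step(hard + soft)
    return prior, cond


# Toy data: two labeled pairs and six unlabeled pairs (all values invented).
labeled_pairs = [((1, 1, 1), 1), ((0, 0, 0), 0)]
unlabeled_pairs = [(1, 1, 0), (1, 0, 1), (0, 0, 1), (0, 1, 0), (1, 1, 1), (0, 0, 0)]
prior, cond = train_semi_supervised_nb(labeled_pairs, unlabeled_pairs)
match_probability = posterior((1, 1, 0), prior, cond)[1]
```

In this sketch the labeled pairs anchor the two classes while the unlabeled pairs refine the per-attribute agreement probabilities, which is the general idea behind training with only a small labeled fraction. A real system would more likely use graded similarity scores per attribute rather than binary indicators.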
