Identity Matching Based on Probabilistic Relational Models

Identity management is critical to various organizational practices, ranging from citizen services to crime investigation. Searching for a specific identity is difficult because multiple representations of the same identity may exist as a result of both unintentional errors and intentional deception. In this study, we propose an approach based on probabilistic relational models (PRMs) to match identities in databases. By exploring the relational structure of a database, we derive three categories of features: personal identity features, social activity features, and social relationship features. Based on these derived features, a probabilistic prediction model can be constructed to decide whether a pair of identity records refers to the same individual. An experimental study using a real criminal dataset demonstrates the effectiveness of the proposed PRM-based approach. Incorporating social activity features increased the average precision of identity matching from 53.73% to 54.64%; further incorporating social relationship features raised the average precision to 68.27%.
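To make the three feature categories and the pairwise matching decision concrete, the sketch below computes one illustrative feature per category for a pair of records and scores the pair with a simple probabilistic classifier. Everything here is an assumption for illustration, not the authors' PRM: the record schema (`Identity` with `name`, `dob`, `incidents`, `associates`), the helper names (`sim`, `jaccard`, `pair_features`), the use of normalized Levenshtein edit distance for personal attributes, Jaccard overlap for shared incidents and shared associates, and a from-scratch Gaussian naive Bayes model (`GaussianNB`) standing in for PRM-based inference.

```python
"""Illustrative sketch (hypothetical schema and features): pairwise identity
matching with a Gaussian naive Bayes model standing in for a PRM."""
import math
from dataclasses import dataclass, field


@dataclass
class Identity:
    # Hypothetical record schema: personal attributes plus relational links.
    name: str
    dob: str                                        # "YYYY-MM-DD"
    incidents: set = field(default_factory=set)     # incident IDs (activities)
    associates: set = field(default_factory=set)    # linked identity IDs


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertion, deletion, and substitution."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def sim(a: str, b: str) -> float:
    """Edit distance normalized to a similarity in [0, 1]."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


def jaccard(x: set, y: set) -> float:
    """Overlap of two sets; 0.0 when both are empty."""
    union = x | y
    return len(x & y) / len(union) if union else 0.0


def pair_features(p: Identity, q: Identity) -> list:
    return [
        sim(p.name.lower(), q.name.lower()),  # personal identity feature
        sim(p.dob, q.dob),                    # personal identity feature
        jaccard(p.incidents, q.incidents),    # social activity feature
        jaccard(p.associates, q.associates),  # social relationship feature
    ]


class GaussianNB:
    """Per-class Gaussian likelihoods, assuming features are independent."""

    def fit(self, X, y, var_smoothing=1e-3):
        self.stats = {}
        for c in set(y):
            rows = [x for x, label in zip(X, y) if label == c]
            cols = list(zip(*rows))
            means = [sum(col) / len(rows) for col in cols]
            # Small additive smoothing keeps every variance strictly positive.
            variances = [sum((v - m) ** 2 for v in col) / len(rows) + var_smoothing
                         for col, m in zip(cols, means)]
            self.stats[c] = (math.log(len(rows) / len(y)), means, variances)
        return self

    def predict_proba(self, x):
        """Posterior probability of each class (1 = match, 0 = non-match)."""
        log_post = {}
        for c, (log_prior, means, variances) in self.stats.items():
            ll = log_prior
            for v, m, s2 in zip(x, means, variances):
                ll += -0.5 * math.log(2 * math.pi * s2) - (v - m) ** 2 / (2 * s2)
            log_post[c] = ll
        top = max(log_post.values())  # subtract max for numerical stability
        z = sum(math.exp(v - top) for v in log_post.values())
        return {c: math.exp(v - top) / z for c, v in log_post.items()}
```

Naive Bayes is used here only because it is compact and yields a probability for each pair; a PRM additionally models dependencies across related records, which this sketch does not attempt. Dropping the last two entries of `pair_features` mimics a personal-attributes-only baseline, so the three feature categories can be compared by adding them back one at a time.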
