Knowledge-aware identity services

The identification problem asks whether two objects in an application refer to the same real-world entity. In this paper, we investigate the identification problem from a knowledge modelling point of view. We develop a framework for establishing knowledge-aware identity services by abstracting identity knowledge into an additional identity layer. The knowledge model in the identity service layer combines declarative formulae with concrete data, which allows us to capture domain-specific identity knowledge at flexible levels of abstraction. By adding validation constraints to the identity service, we can also reason about inconsistencies in identity knowledge. As a result, the accuracy of identity knowledge can be improved over time, especially when utilising identity services provided by different communities in a service-oriented architecture. Our experimental study shows the effectiveness of the proposed knowledge modelling approach and the effects of domain-specific identity knowledge on data quality control.
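To make the idea of combining declarative formulae with concrete data more tangible, the following is a minimal sketch, not the knowledge model defined in the paper. It assumes that concrete identity knowledge can be stored as pairs of record identifiers known to be the same or different, that a declarative formula can be approximated by a Python predicate over two records, and that a validation constraint flags any pair that is both derived as "same" and asserted as "different". All names (`IdentityService`, `same_email`, the sample records) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, FrozenSet, List, Set

Record = Dict[str, str]
Rule = Callable[[Record, Record], bool]


def pair(a: str, b: str) -> FrozenSet[str]:
    """Unordered pair of record identifiers."""
    return frozenset((a, b))


@dataclass
class IdentityService:
    """Hypothetical identity layer: concrete facts plus declarative rules."""
    records: Dict[str, Record]
    same_facts: Set[FrozenSet[str]] = field(default_factory=set)
    different_facts: Set[FrozenSet[str]] = field(default_factory=set)
    same_rules: List[Rule] = field(default_factory=list)

    def derived_same(self) -> Set[FrozenSet[str]]:
        """Pairs asserted as the same, plus pairs matched by any rule."""
        ids = sorted(self.records)
        derived = set(self.same_facts)
        for i, a in enumerate(ids):
            for b in ids[i + 1:]:
                if any(rule(self.records[a], self.records[b])
                       for rule in self.same_rules):
                    derived.add(pair(a, b))
        return derived

    def inconsistencies(self) -> Set[FrozenSet[str]]:
        """Validation constraint: no pair may be both same and different."""
        return self.derived_same() & self.different_facts


def same_email(r1: Record, r2: Record) -> bool:
    """Assumed domain rule: a shared, non-empty email implies same author."""
    return bool(r1.get("email")) and r1.get("email") == r2.get("email")


records = {
    "a1": {"name": "Q. Wang", "email": "qw@example.org"},
    "a2": {"name": "Qing Wang", "email": "qw@example.org"},
    "a3": {"name": "Q. Wang", "email": "other@example.org"},
}
service = IdentityService(records=records, same_rules=[same_email])
# A concrete fact that contradicts what the rule derives for a1 and a2.
service.different_facts.add(pair("a1", "a2"))
conflicts = service.inconsistencies()
```

In this sketch, `conflicts` contains the pair of `a1` and `a2`, which signals that either the rule or the asserted fact needs revising. Reviewing such conflicts is one way identity knowledge could be refined over time.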
