Privacy preserving schema and data matching

In many business scenarios, record matching is performed across different data sources to identify the information they have in common. However, this need often conflicts with the privacy requirements that apply to the data stored by each source. In this paper, we propose a protocol for record matching that preserves privacy at both the data level and the schema level. Specifically, two sources that need to identify their common data can run the protocol to compute the matching of their datasets without sharing their data in the clear; only the result of the matching is shared. The protocol relies on a third party and maps records into a vector space in order to preserve their privacy. Experimental results show that the matching protocol is effective in terms of precision and recall and that it performs well computationally.
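
The abstract does not say which embedding the protocol uses, so the following is only a minimal sketch of the general idea, under stated assumptions: each source maps its records to vectors of edit distances against a shared set of random reference strings (a Lipschitz-style embedding with single-element reference sets), and a third party compares the vectors without ever seeing the records. The names `make_reference_strings`, `embed`, and `third_party_match`, the shared seed, and the threshold are all hypothetical and chosen for illustration.

```python
import math
import random
import string


def edit_distance(a, b):
    """Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def make_reference_strings(seed, count=8, length=6):
    """Assumption: both sources derive the same references from a shared seed
    that the third party does not know."""
    rng = random.Random(seed)
    return ["".join(rng.choice(string.ascii_lowercase) for _ in range(length))
            for _ in range(count)]


def embed(record, references):
    """Map a record (here a plain string) to its vector of distances."""
    return [edit_distance(record, ref) for ref in references]


def third_party_match(vectors_a, vectors_b, threshold):
    """Run by the third party, which sees only vectors: report index pairs
    whose Euclidean distance is within the threshold."""
    return [(i, j)
            for i, va in enumerate(vectors_a)
            for j, vb in enumerate(vectors_b)
            if math.dist(va, vb) <= threshold]


# Hypothetical usage: each source embeds locally and sends only vectors.
refs = make_reference_strings(seed=42)
source_a = [embed(r, refs) for r in ["john smith", "mary jones"]]
source_b = [embed(r, refs) for r in ["mary jones", "jon smith"]]
pairs = third_party_match(source_a, source_b, threshold=3.0)
```

Identical records always embed to identical vectors, so they match at any threshold. By the triangle inequality each coordinate of two embeddings differs by at most the edit distance between the records, so similar records stay close in the vector space. The sketch covers only data-level matching; it does not address schema-level privacy, and it makes no claim about how much information the distance vectors themselves may leak.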
