Secure XML Publishing without Information Leakage in the Presence of Data Inference

Recent applications increasingly require that published XML documents meet precise security requirements. In this paper, we consider data-publishing applications in which the publisher specifies what information is sensitive and should be protected. We show that if a partial document is published carelessly, users can apply common knowledge (e.g., "all patients in the same ward have the same disease") to infer more data, which can leak sensitive information. The goal is to protect such information in the presence of data inference with common knowledge. We consider common knowledge represented as semantic XML constraints, and we formalize how users can infer data using three types of common XML constraints. Interestingly, no matter what sequence of inference steps users follow, there is a unique, maximal document that contains all possible inferred documents. We develop algorithms that find a partial document of a given XML document that can be published without leaking sensitive information, while publishing as much data as possible. Our experiments on real data sets show the effect of inference on data security and how the proposed techniques prevent such leakage.
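The ward example from the abstract can be illustrated with a small sketch. It is not the paper's algorithm: the flat record layout, the function name `infer_same_ward_disease`, and the sample data are all assumptions made for illustration, and only the single "same ward, same disease" constraint is modeled rather than the three constraint types the paper considers.

```python
from typing import Dict, List, Optional

# Hypothetical flat stand-in for an XML document: one record per <patient> element.
# A value of None means the publisher withheld that piece of data.
Patient = Dict[str, Optional[str]]


def infer_same_ward_disease(published: List[Patient]) -> List[Patient]:
    """Apply 'patients in the same ward have the same disease' until nothing changes."""
    inferred = [dict(p) for p in published]  # copy so the input is not modified
    changed = True
    while changed:  # fixpoint loop; with this one constraint a single pass already suffices
        changed = False
        # Disease currently known for each ward.
        known = {
            p["ward"]: p["disease"]
            for p in inferred
            if p["ward"] is not None and p["disease"] is not None
        }
        for p in inferred:
            if p["disease"] is None and p["ward"] in known:
                p["disease"] = known[p["ward"]]
                changed = True
    return inferred


# Illustrative data (assumed, not from the paper): Alice's disease is sensitive and withheld.
published = [
    {"name": "Alice", "ward": "W1", "disease": None},
    {"name": "Bob", "ward": "W1", "disease": "leukemia"},
    {"name": "Cathy", "ward": "W2", "disease": None},
]
result = infer_same_ward_disease(published)
```

In this toy setting, Alice's withheld disease is recovered from Bob's record because both are in ward W1, while Cathy's stays unknown. A publisher would therefore also need to hide either the ward of one of the two patients or Bob's disease to avoid the leak.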
