A Formal Usability Constraints Model for Watermarking of Outsourced Datasets

Large datasets are being mined to extract hidden knowledge and patterns that help decision makers make effective, efficient, and timely decisions in an increasingly competitive world. This type of “knowledge-driven” data mining is not possible without sharing the datasets between their owners and data mining experts (or corporations); as a consequence, protecting ownership of the datasets (by embedding a watermark) is becoming relevant. The most important challenge in watermarking datasets that are to be mined is how to preserve the knowledge contained in their features or attributes. Usually, an owner needs to manually define “usability constraints” for each type of dataset to preserve the contained knowledge. The major contribution of this paper is a novel formal model that enables a data owner to define usability constraints in an automated fashion, so that the knowledge contained in the dataset is preserved. The model aims at preserving the “classification potential” of each feature and other major characteristics of datasets that play an important role during the mining process; as a result, learning statistics and decision-making rules also remain intact. We have implemented our model and integrated it with a new watermark embedding algorithm to show that the inserted watermark not only preserves the knowledge contained in a dataset but also significantly enhances watermark security compared with existing techniques. We have tested our model on 25 different data-mining datasets to show its efficacy and its ability to adapt and generalize.
