Self-Identifying Data for Fair Use

Public-use earth science datasets are a useful resource with the unfortunate feature that their provenance is easily disconnected from their content. “Fair-use policies” typically associated with these datasets require appropriate attribution of providers by users, but sound and complete attribution is difficult if provenance information is lost. To address this, we introduce a technique to directly associate provenance information with sensor datasets. Our technique is similar to traditional watermarking but is intended for application to unstructured time-series datasets. Our approach is potentially imperceptible given sufficient margins of error in datasets and is robust to a number of benign but likely transformations including truncation, rounding, bit-flipping, sampling, and reordering. We provide algorithms for both one-bit and blind mark checking and show how our system can be adapted to various data representation types. Our algorithms are probabilistic in nature and are characterized by both combinatorial and empirical analyses. Mark embedding can be applied at any point in the data life cycle, allowing adaptation of our scheme to social or scientific concerns.
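To make the idea of a one-bit mark check concrete, here is a minimal sketch. It is not the algorithm from this work, whose details the abstract does not give. It assumes a generic keyed least-significant-digit scheme: each reading is quantized to an assumed margin of error `STEP`, and the parity of the quantized value is set to a bit derived from a secret key and the reading's higher-order part. The names `embed`, `match_fraction`, and `STEP`, and the use of HMAC-SHA256, are all hypothetical choices for illustration. Because each reading carries its mark independently, this toy scheme survives sampling and reordering, though it makes no claim about the other transformations listed above.

```python
import hashlib
import hmac

STEP = 0.01  # assumed margin of error of the readings (hypothetical)


def _expected_bit(key: bytes, high: int) -> int:
    """Keyed pseudo-random bit derived from a reading's high-order part."""
    digest = hmac.new(key, str(high).encode(), hashlib.sha256).digest()
    return digest[0] & 1


def embed(values, key: bytes, step: float = STEP):
    """Return a copy of `values` with a keyed bit written into each reading."""
    marked = []
    for x in values:
        q = round(x / step)       # quantize to the margin of error
        high = q // 4             # part left untouched by embedding
        low = q - 4 * high        # two low-order bits, in 0..3
        bit = _expected_bit(key, high)
        q_marked = 4 * high + (low & 2) + bit  # overwrite only the last bit
        marked.append(q_marked * step)
    return marked


def match_fraction(values, key: bytes, step: float = STEP) -> float:
    """Fraction of readings whose last bit agrees with the keyed bit.

    Near 1.0 suggests the mark is present; near 0.5 suggests it is not.
    """
    hits = 0
    for x in values:
        q = round(x / step)
        hits += (q & 1) == _expected_bit(key, q // 4)
    return hits / len(values)
```

A checker would compare `match_fraction` against a threshold chosen for the desired false-positive rate. Unmarked data agrees with the keyed bit about half the time, so the threshold trades detection confidence against the number of readings available.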
