Randomization Methods to Ensure Data Privacy

Definition

Many organizations, e.g., government statistical offices and search engine companies, collect potentially sensitive information about individuals, either to publish the data for research or in return for useful services. While some data collection organizations, such as census bureaus, are legally required not to breach individuals' privacy, others may not be trusted to uphold it. Hence, if T denotes the original data containing sensitive information about a set of individuals, an untrusted data collector or researcher should have access only to an anonymized version of the data, T*, that does not disclose the individuals' sensitive information. A randomized anonymization algorithm R is said to be a privacy-preserving randomization method if, for every table T and every output T* = R(T), the privacy of all the sensitive information of each individual in the original data is provably guaranteed.
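
To make the roles of T, R, and T* concrete, the following is a minimal sketch of one well-known randomization method, randomized response, applied to a table with a single binary sensitive attribute. It is an illustration under stated assumptions, not the method this entry defines: the function names, the parameter epsilon, and the choice of truth probability e^epsilon / (1 + e^epsilon) are assumptions introduced here. With that probability, each reported bit is commonly analyzed as satisfying epsilon-local differential privacy, which is one possible formalization of "provably guaranteed".

```python
import math
import random


def randomized_response(table, epsilon, rng):
    """Hypothetical R: return an anonymized copy T* of a table of 0/1 values.

    Each bit is reported truthfully with probability e^eps / (1 + e^eps)
    and flipped otherwise. Assumes epsilon > 0.
    """
    p_truth = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return [bit if rng.random() < p_truth else 1 - bit for bit in table]


def estimate_fraction(anonymized, epsilon):
    """Unbiased estimate of the fraction of 1s in the original table.

    If f is the true fraction, the expected observed fraction is
    f * p + (1 - f) * (1 - p); this inverts that relation.
    """
    p_truth = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    observed = sum(anonymized) / len(anonymized)
    return (observed - (1.0 - p_truth)) / (2.0 * p_truth - 1.0)


# Illustrative data only: 30% of individuals hold the sensitive value 1.
rng = random.Random(0)
T = [1] * 3000 + [0] * 7000
T_star = randomized_response(T, epsilon=1.0, rng=rng)
approx = estimate_fraction(T_star, epsilon=1.0)
```

A researcher who sees only T_star cannot tell with certainty what any single individual's true value was, yet an aggregate such as the fraction of 1s can still be estimated from it. A smaller epsilon flips more bits, which strengthens the privacy guarantee and makes the estimate noisier.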
