Preserving Access Privacy Over Large Databases

Anonymity systems protect the identities of users as they access Internet data sources. The security of many such systems, such as Tor, relies on a model in which the adversary has neither a global view nor control of the network. A different problem is preserving access privacy for users who query a large database over the Internet in a model where the adversary has full control of the database. Private information retrieval (PIR) was introduced to prevent such a powerful adversary from learning any information about the user’s query, such as which of the many data items under the adversary’s control has been retrieved. However, state-of-the-art PIR schemes have a high computational overhead that makes them expensive for querying large databases. In this paper, we develop an access privacy technique and system for querying large databases. Our technique explores privacy-preserving constraint-based query transformations, offline data classification, and privacy-preserving queries to index structures much smaller than the databases. We draw inspiration from techniques in the traditional information retrieval domain. Our approach allows users to query a large database by statically specifying or dynamically defining database portions with inherently high diversity, thereby minimizing information leakage about the data items of interest. Unlike naive approaches, which define database portions without considering privacy-maximizing subsets, we propose a technique for improving the privacy derivable from the portions of the database used to answer user queries. In addition, our approach requires minimal user intervention and allows users to specify descriptions of their privacy preferences and cost tolerances along with their input queries, from which it derives transformed queries that satisfy the input constraints when executed.
We evaluated the system using patent data made available by the United States Patent and Trademark Office through Google Patent; however, the approach has much wider application, and the system can be adapted and deployed for use with many user-centric privacy-preserving systems, thereby making access privacy obtainable for today’s Internet users.
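
To make the cost argument concrete, the following is a minimal sketch of the classic two-server XOR-based PIR construction applied to a database portion rather than the whole database. It is an illustration only, not the scheme used by the system described above: it assumes two non-colluding servers holding identical copies of the portion and fixed-length records, and the names `make_queries`, `answer`, and `retrieve` are hypothetical.

```python
import secrets


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_queries(n: int, index: int) -> tuple[list[int], list[int]]:
    """Client side: build two bit vectors that differ only at `index`.

    Each vector on its own is uniformly random, so a single server
    learns nothing about which record is wanted (assumes the two
    servers do not collude).
    """
    q1 = [secrets.randbelow(2) for _ in range(n)]
    q2 = q1.copy()
    q2[index] ^= 1
    return q1, q2


def answer(portion: list[bytes], query: list[int]) -> bytes:
    """Server side: XOR together every record selected by the query.

    The server scans the entire portion, so its work grows linearly
    with the number of records it holds.
    """
    acc = bytes(len(portion[0]))  # all-zero accumulator
    for record, bit in zip(portion, query):
        if bit:
            acc = xor_bytes(acc, record)
    return acc


def retrieve(portion: list[bytes], index: int) -> bytes:
    """Run both queries and combine the answers to recover one record."""
    q1, q2 = make_queries(len(portion), index)
    # Records selected by both queries cancel out; only `index` remains.
    return xor_bytes(answer(portion, q1), answer(portion, q2))


if __name__ == "__main__":
    # Hypothetical portion of four fixed-length records.
    portion = [b"rec0", b"rec1", b"rec2", b"rec3"]
    print(retrieve(portion, 2))
```

Because each server must touch every record it holds, restricting a query to a smaller portion reduces server work proportionally, while the adversary learns only that the item of interest lies somewhere in that portion. This is why the choice of portion matters for privacy.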
