Privacy-preserving Messaging and Search: A Collaborative Approach

Author(s): Fanti, Giulia Cecilia | Advisor(s): Ramchandran, Kannan

Abstract: In a free society, people have the right to consume and share public data without fear of retribution. However, today's technological landscape enables large-scale monitoring and censorship of networks by powerful entities (e.g., totalitarian governments); at worst, these entities may punish people for the information they consume or the opinions they espouse. This thesis considers two problems aimed at empowering people to freely share and consume public information: anonymous message spreading and privacy-preserving database search. In both areas, we present algorithmic innovations and analyze their correctness and efficiency. The key assumption underlying our work is that centralized architectures cannot reliably provide privacy-preserving services: not only are the incentive structures misaligned, but centralized infrastructures are often vulnerable to external breaches, for example by hackers or government agencies. We therefore restrict ourselves to distributed algorithms that rely on cooperation and resource-sharing between privacy-conscious individuals.

In the area of anonymous message spreading, we consider a user who wishes to spread a message to as many people as possible over an underlying connectivity network (e.g., a social network); this is the premise of Yik Yak, Whisper, and other popular anonymous messaging networks. Most existing social networks (anonymous or not) use a push-based mechanism to spread content to all of a user's neighbors on the contact network; if a neighbor approves the content by "liking" it, this symmetric spreading propagates to the neighbor's neighbors, and so forth. Recent research suggests that under this spreading model, the true author of a message can be identified with non-negligible probability by a powerful global adversary. We propose an alternative, distributed spreading mechanism called adaptive diffusion, which breaks this symmetry. We show theoretically that adaptive diffusion gives optimal or asymptotically optimal anonymity guarantees over certain classes of synthetic graphs for various adversarial models, while spreading nearly as fast as traditional symmetric mechanisms. On real-world graphs, we demonstrate empirically that adaptive diffusion gives significantly stronger anonymity properties than existing spreading mechanisms.

In the area of privacy-preserving search, we consider the foundations of a distributed, privacy-preserving search engine built over public data. Architecturally, we envision a peer-to-peer (P2P) network in which each user stores a small piece of a public database; when a user wishes to search for something, she obtains it by requesting the information from her peer nodes, which execute a distributed search over the relevant data index. Critically, this operation should be privacy-preserving; that is, no peer node should learn anything about the contents of the user's query. Distributed search engines are not a new idea, but making such a service privacy-preserving and robust is algorithmically challenging. One challenge is that if the database is changing over time and the network is not centrally controlled, distributed users may not have a unified view of the database. That is, some peers may be storing an outdated portion of the global database, causing existing private search and retrieval algorithms to fail. We introduce a distributed private retrieval algorithm that is robust to servers with similar, but not identical, views of the database, and show that it incurs asymptotically negligible overhead compared to traditional algorithms. Another challenge is that most search engine users submit queries with multiple keywords and expect the result to contain all of the queried keywords; this is known as a conjunctive query. Existing distributed algorithms return results that contain at least one of the queried keywords. This approach can incur significant communication overhead, as well as computational overhead for the client, who must sort through the results. We propose a new privacy-preserving search algorithm that processes conjunctive queries while incurring a communication cost that scales linearly in the number of documents that contain all the queried keywords. Our private-search algorithms build on principles from distributed source coding, which permit us to reduce the communication cost by exploiting correlations between the data of distributed peer nodes.
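
To make the private-retrieval setting concrete, the following is a minimal sketch of the classic two-server, XOR-based private information retrieval scheme. It is a textbook baseline, not the algorithm proposed in the thesis. It assumes two non-colluding servers that hold identical copies of a database of equal-length records; all function names (`make_queries`, `answer`, `retrieve`) are hypothetical and chosen for illustration only.

```python
import secrets


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_queries(n: int, index: int) -> tuple[list[int], list[int]]:
    """Client side: build two bit vectors that differ only at `index`.

    Each vector on its own is uniformly random, so a single server
    learns nothing about which record is wanted.
    """
    q1 = [secrets.randbelow(2) for _ in range(n)]
    q2 = q1.copy()
    q2[index] ^= 1
    return q1, q2


def answer(database: list[bytes], query: list[int]) -> bytes:
    """Server side: XOR together the records selected by the query bits."""
    result = bytes(len(database[0]))  # all-zero record
    for record, bit in zip(database, query):
        if bit:
            result = xor_bytes(result, record)
    return result


def retrieve(database_a: list[bytes], database_b: list[bytes], index: int) -> bytes:
    """Client side: query both servers and combine their answers."""
    q1, q2 = make_queries(len(database_a), index)
    # Records selected by both queries cancel; only record `index` remains.
    return xor_bytes(answer(database_a, q1), answer(database_b, q2))


if __name__ == "__main__":
    db = [b"aaaa", b"bbbb", b"cccc", b"dddd"]
    print(retrieve(db, db, 2))
```

The cancellation step relies on both servers holding exactly the same records. If one copy is outdated and the differing record happens to be selected by the random query, the combined answer is corrupted, which illustrates why unsynchronized peers cause schemes of this kind to fail.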
