Secure outsourcing of sequence comparisons

Large-scale problems in the physical and life sciences are being revolutionized by Internet computing technologies, like grid computing, that make possible the massive cooperative sharing of computational power, bandwidth, storage, and data. A weak computational device, once connected to such a grid, is no longer limited by its slow speed, small local storage, and limited bandwidth: it can draw on the abundance of these resources available elsewhere on the network. An impediment to the use of “computational outsourcing” is that the data in question is often sensitive, e.g., of national security importance, or proprietary and containing commercial secrets, or required to be kept private by laws such as HIPAA, Gramm-Leach-Bliley, or similar legislation. This motivates the design of techniques for computational outsourcing in a privacy-preserving manner, i.e., without revealing either one's data or the outcome of the computation on that data to the remote agents whose computational power is being used. This paper investigates such secure outsourcing for widely applicable sequence comparison problems, and gives an efficient protocol for a customer to securely outsource sequence comparisons to two remote agents, such that the agents learn nothing about the customer's two private sequences or the result of the comparison. The local computations done by the customer are linear in the size of the sequences, and the computational cost and amount of communication incurred by the external agents are close to the time complexity of the best known algorithm for solving the problem on a single machine (i.e., quadratic, which is a huge computational burden for the kinds of massive data on which such comparisons are made). The sequence comparison problem considered arises in a large number of applications, including speech recognition, machine vision, and molecular sequence comparisons. In addition, essentially the same protocol can solve a larger class of problems whose standard dynamic programming solutions are similar in structure to the recurrence that underlies the sequence comparison algorithm.
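
The abstract does not spell out the recurrence it refers to. As an illustration only, the sketch below shows the standard textbook edit-distance dynamic program on a single machine, which runs in time quadratic in the sequence lengths. The function name `edit_distance` and the unit insertion, deletion, and substitution costs are assumptions made for this example; it is the insecure baseline computation, not the paper's outsourcing protocol.

```python
def edit_distance(a, b, ins_cost=1, del_cost=1, sub_cost=1):
    """Plain (non-private) edit distance between sequences a and b.

    Hypothetical helper for illustration; the costs are assumed values.
    Runs in O(len(a) * len(b)) time and O(len(b)) space.
    """
    # Row 0: turning the empty prefix of a into b[:j] takes j insertions.
    prev = [j * ins_cost for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # Column 0: turning a[:i] into the empty prefix takes i deletions.
        curr = [i * del_cost] + [0] * len(b)
        for j in range(1, len(b) + 1):
            # Substitution is free when the two symbols already match.
            diag = prev[j - 1] + (0 if a[i - 1] == b[j - 1] else sub_cost)
            up = prev[j] + del_cost        # delete a[i-1]
            left = curr[j - 1] + ins_cost  # insert b[j-1]
            curr[j] = min(diag, up, left)
        prev = curr
    return prev[len(b)]


if __name__ == "__main__":
    print(edit_distance("kitten", "sitting"))
```

Each table cell depends only on its left, upper, and upper-left neighbours, which is the structural property the abstract alludes to when it says other dynamic programming problems with a similar recurrence can be handled in essentially the same way.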
