Store, Forget, and Check: Using Algebraic Signatures to Check Remotely Administered Storage

The emerging use of the Internet for remote storage and backup has led to the problem of verifying that storage sites in a distributed system indeed store the data; this must often be done in the absence of knowledge of what the data should be. We use m/n erasure-correcting coding to safeguard the stored data and use algebraic signatures hash functions with algebraic properties for verification. Our scheme primarily utilizes one such algebraic property: taking a signature of parity gives the same result as taking the parity of the signatures. To make our scheme collusionresistant, we blind data and parity by XORing them with a pseudo-random stream. Our scheme has three advantages over existing techniques. First, it uses only small messages for verification, an attractive property in a P2P setting where the storing peers often only have a small upstream pipe. Second, it allows verification of challenges across random data without the need for the challenger to compare against the original data. Third, it is highly resistant to coordinated attempts to undetectably modify data. These signature techniques are very fast, running at tens to hundreds of megabytes per second. Because of these properties, the use of algebraic signatures will permit the construction of large-scale distributed storage systems in which large amounts of storage can be verified with minimal network bandwidth.

[1]  Witold Litwin,et al.  Algebraic signatures for scalable distributed data structures , 2004, Proceedings. 20th International Conference on Data Engineering.

[2]  Bruce M. Maggs,et al.  Globally Distributed Content Delivery , 2002, IEEE Internet Comput..

[3]  Robert Tappan Morris,et al.  Ivy: a read/write peer-to-peer file system , 2002, OSDI '02.

[4]  DruschelPeter,et al.  Storage management and caching in PAST, a large-scale, persistent peer-to-peer storage utility , 2001 .

[5]  Witold Litwin,et al.  LH*RS: a high-availability scalable distributed data structure using Reed Solomon Codes , 2000, SIGMOD '00.

[6]  A. Broder Some applications of Rabin’s fingerprinting method , 1993 .

[7]  Daniel A. Spielman,et al.  Efficient erasure correcting codes , 2001, IEEE Trans. Inf. Theory.

[8]  Peter F. Corbett,et al.  Row-Diagonal Parity for Double Disk Failure Correction (Awarded Best Paper!) , 2004, USENIX Conference on File and Storage Technologies.

[9]  Antony I. T. Rowstron,et al.  Storage management and caching in PAST, a large-scale, persistent peer-to-peer storage utility , 2001, SOSP.

[10]  Michael K. Reiter,et al.  Efficient Byzantine-tolerant erasure-coded storage , 2004, International Conference on Dependable Systems and Networks, 2004.

[11]  Ben Y. Zhao,et al.  Towards a Common API for Structured Peer-to-Peer Overlays , 2003, IPTPS.

[12]  Robert W. Bowdidge,et al.  Low cost comparisons of file copies , 1990, Proceedings.,10th International Conference on Distributed Computing Systems.

[13]  Jacob R. Lorch,et al.  Farsite: federated, available, and reliable storage for an incompletely trusted environment , 2002, OSDI '02.

[14]  Jehoshua Bruck,et al.  EVENODD: An Efficient Scheme for Tolerating Double Disk Failures in RAID Architectures , 1995, IEEE Trans. Computers.

[15]  Darrell D. E. Long,et al.  Strong Security for Network-Attached Storage , 2002, FAST.

[16]  Witold Litwin,et al.  LH*RS---a highly-available scalable distributed data structure , 2005, TODS.

[17]  Ben Y. Zhao,et al.  OceanStore: an architecture for global-scale persistent storage , 2000, SIGP.

[18]  Richard M. Karp,et al.  Efficient Randomized Pattern-Matching Algorithms , 1987, IBM J. Res. Dev..

[19]  Andrei Z. Broder,et al.  On the resemblance and containment of documents , 1997, Proceedings. Compression and Complexity of SEQUENCES 1997 (Cat. No.97TB100171).

[20]  Ian T. Foster,et al.  Mapping the Gnutella Network , 2002, IEEE Internet Comput..

[21]  Malcolm C. Harrison,et al.  Implementation of the substring test by hashing , 1971, CACM.

[22]  David R. Karger,et al.  Wide-area cooperative storage with CFS , 2001, SOSP.

[23]  Banu Özden,et al.  StarFish: highly-available block storage , 2003, USENIX Annual Technical Conference, FREENIX Track.

[24]  Andrew V. Goldberg,et al.  Towards an archival Intermemory , 1998, Proceedings IEEE International Forum on Research and Technology Advances in Digital Libraries -ADL'98-.

[25]  Mary Baker,et al.  The LOCKSS peer-to-peer digital preservation system , 2005, TOCS.

[26]  Thomas J. E. Schwarz Verification of Parity Data in Large Scale Storage Systems , 2004, PDPTA.

[27]  Michael Burrows,et al.  A Cooperative Internet Backup Scheme , 2003, USENIX Annual Technical Conference, General Track.

[28]  Erik Riedel,et al.  A Framework for Evaluating Storage System Security , 2002, FAST.

[29]  Michael O. Rabin,et al.  Efficient dispersal of information for security, load balancing, and fault tolerance , 1989, JACM.

[30]  Geoffrey Zweig,et al.  Syntactic Clustering of the Web , 1997, Comput. Networks.

[31]  F. MacWilliams,et al.  The Theory of Error-Correcting Codes , 1977 .

[32]  Jehoshua Bruck,et al.  X-Code: MDS Array Codes with Optimal Encoding , 1999, IEEE Trans. Inf. Theory.

[33]  Marcel Waldvogel,et al.  Establishing trust in distributed storage providers , 2003, Proceedings Third International Conference on Peer-to-Peer Computing (P2P2003).

[34]  Randy H. Katz,et al.  Coding techniques for handling failures in large disk arrays , 2005, Algorithmica.