In-place reconstruction of delta compressed files

results in high latency and low bandwidth to web-enabled clients and prevents the timely delivery of software. We present an algorithm for modifying delta compressed files so that the compressed versions may be reconstructed without scratch space. This allows network clients with limited resources to efficiently update software by retrieving delta compressed versions over a network.

Delta compression for binary files, which compactly encodes a version of data using only the bytes changed from a previous version, may be used to distribute software efficiently over low-bandwidth channels such as the Internet. Traditional methods for rebuilding these delta files require memory or storage space on the target machine for both the old and the new version of the file being reconstructed. With the advent of network computing and Internet-enabled devices, many of these network-attached target machines have little additional scratch space. We present an algorithm for modifying a delta compressed version file so that it may rebuild the new version in the space that the current version occupies.

Differential or delta compression [5, 11, ...], which compactly encodes a new version of a file using only the bytes changed from a previous version, can be used to reduce the size of the file to be transmitted and, consequently, the time needed to perform a software update. Currently, decompressing delta encoded files requires scratch space: additional disk or memory storage used to hold a required second copy of the file. Two copies of the file must be concurrently available, because the delta file contains directives to read data from the old file version while the new file version is being materialized in another region of storage. This presents a problem: network-attached devices often have limited memory resources and no disks, and therefore cannot store two file versions at the same time. Furthermore, adding storage to network-attached devices is not viable, as keeping these devices simple limits their production costs.
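The scratch-space requirement described above can be illustrated with a minimal Python sketch. It assumes a hypothetical delta format made of copy and add commands; the names `Copy`, `Add`, `rebuild_with_scratch`, and `rebuild_in_place_naive` are illustrative and are not taken from the paper. The sketch shows only the problem (a naive in-place rebuild may read bytes that an earlier command has already overwritten), not the algorithm the paper presents.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Copy:
    """Hypothetical directive: copy `length` bytes from the old version."""
    src: int      # offset to read from in the old version
    dst: int      # offset to write to in the new version
    length: int


@dataclass
class Add:
    """Hypothetical directive: write literal bytes carried in the delta."""
    dst: int
    data: bytes


Command = Union[Copy, Add]


def rebuild_with_scratch(old: bytes, delta: List[Command], new_len: int) -> bytes:
    """Conventional rebuild: old and new versions live in separate buffers."""
    new = bytearray(new_len)
    for cmd in delta:
        if isinstance(cmd, Copy):
            new[cmd.dst:cmd.dst + cmd.length] = old[cmd.src:cmd.src + cmd.length]
        else:
            new[cmd.dst:cmd.dst + len(cmd.data)] = cmd.data
    return bytes(new)


def rebuild_in_place_naive(buf: bytearray, delta: List[Command], new_len: int) -> bytes:
    """Naive rebuild in the old version's own buffer (assumed unsafe in general)."""
    if new_len > len(buf):
        buf.extend(bytes(new_len - len(buf)))
    for cmd in delta:
        if isinstance(cmd, Copy):
            # Reads from the shared buffer, which earlier commands may have changed.
            chunk = bytes(buf[cmd.src:cmd.src + cmd.length])
            buf[cmd.dst:cmd.dst + cmd.length] = chunk
        else:
            buf[cmd.dst:cmd.dst + len(cmd.data)] = cmd.data
    del buf[new_len:]
    return bytes(buf)


# A delta that swaps the two halves of an 8-byte file.
old_version = b"ABCDEFGH"
swap_delta = [Copy(src=4, dst=0, length=4), Copy(src=0, dst=4, length=4)]

with_scratch = rebuild_with_scratch(old_version, swap_delta, 8)
in_place = rebuild_in_place_naive(bytearray(old_version), swap_delta, 8)
print(with_scratch)  # b'EFGHABCD'
print(in_place)      # b'EFGHEFGH' (second copy read already-overwritten bytes)
```

In this example the second copy command reads a region that the first command has already written, so the in-place result differs from the intended new version. Reordering the two commands does not help here, because each reads the region the other writes.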

[1]  Paul N. Hilfinger,et al.  PRCS: The Project Revision Control System , 1998, SCM.

[2]  Joseph Naor,et al.  Approximating Minimum Feedback Sets and Multi-Cuts in Directed Graphs , 1995, IPCO.

[3]  Walter F. Tichy.  The string-to-string correction problem with block moves , 1984 .

[4]  Christoph Reichenberger,et al.  Delta storage for arbitrary non-text files , 1991, SCM '91.

[5]  Walter F. Tichy,et al.  Delta algorithms: an empirical analysis , 1998, TSEM.

[6]  Guy M. Lohman,et al.  Differential files: their application to the maintenance of large databases , 1976, TODS.

[7]  Christopher W. Fraser,et al.  An editor for revision control , 1987, TOPL.

[8]  Fred Douglis,et al.  Optimistic deltas for WWW latency reduction , 1997 .

[9]  Paul Mackerras,et al.  The rsync algorithm , 1996 .

[10]  Randal C. Burns,et al.  Efficient distributed backup with delta compression , 1997, IOPADS '97.

[11]  Ronald L. Rivest,et al.  Introduction to Algorithms , 1990 .

[12]  Donald E. Knuth,et al.  Fundamental Algorithms , 1969 .

[13]  Donald E. Knuth,et al.  The Art of Computer Programming, Volume I: Fundamental Algorithms, 2nd Edition , 1997 .

[14]  Eugene W. Myers,et al.  A file comparison program , 1985, Softw. Pract. Exp.

[15]  Andrew P. Black,et al.  A compact representation for file versions: a preliminary report , 1989, Proceedings of the Fifth International Conference on Data Engineering.

[16]  Bruce A. Reed,et al.  Packing directed circuits , 1996, Comb.

[17]  Darrell D. E. Long,et al.  A linear time, constant space differencing algorithm , 1997, 1997 IEEE International Performance, Computing and Communications Conference.

[18]  Mun Choon Chan,et al.  Cache-based compaction: a new technique for optimizing Web transfer , 1999, IEEE INFOCOM '99.

[19]  Edward M. McCreight,et al.  A Space-Economical Suffix Tree Construction Algorithm , 1976, JACM.

[20]  Paul D. Seymour,et al.  Packing directed circuits fractionally , 1995, Comb.

[21]  Anja Feldmann,et al.  Potential benefits of delta encoding and data compression for HTTP , 1997, SIGCOMM '97.

[22]  Joseph Naor,et al.  Approximating Minimum Feedback Sets and Multicuts in Directed Graphs , 1998, Algorithmica.

[23]  David S. Johnson,et al.  Computers and Intractability: A Guide to the Theory of NP-Completeness , 1978 .

[24]  Stefan Kurtz,et al.  Reducing the space requirement of suffix trees , 1999 .

[25]  Walter F. Tichy,et al.  RCS - a system for version control , 1985, Softw. Pract. Exp.

[26]  Michael J. Fischer,et al.  The String-to-String Correction Problem , 1974, JACM.

[27]  Marc J. Rochkind,et al.  The source code control system , 1975, IEEE Transactions on Software Engineering.

[28]  Walter F. Tichy,et al.  The string-to-string correction problem with block moves , 1984, TOCS.

[29]  Hector Garcia-Molina,et al.  Meaningful change detection in structured data , 1997, SIGMOD '97.

[30]  Peter Weiner,et al.  Linear Pattern Matching Algorithms , 1973, SWAT.

[31]  Walter F. Tichy,et al.  An Empirical Study of Delta Algorithms , 1996, SCM.