Efficient archival data storage
[1] Larry Carter,et al. Universal classes of hash functions (Extended Abstract) , 1977, STOC '77.
[2] Andrei Z. Broder,et al. On the resemblance and containment of documents , 1997, Proceedings. Compression and Complexity of SEQUENCES 1997 (Cat. No.97TB100171).
[3] Walter F. Tichy,et al. The string-to-string correction problem with block moves , 1984, TOCS.
[4] Ronald Fagin,et al. Compactly encoding unstructured inputs with differential compression , 2002, JACM.
[5] Witold Litwin,et al. LH* - Linear Hashing for Distributed Files , 1993, SIGMOD Conference.
[6] Mark Nelson,et al. The Data Compression Book , 2009 .
[7] Alexander S. Szalay,et al. TeraScale SneakerNet: Using Inexpensive Disks for Backup, Archiving, and Data Exchange , 2002, ArXiv.
[8] Hugh E. Williams,et al. A general-purpose compression scheme for large collections , 2002, TOIS.
[9] Torsten Suel,et al. Compressing File Collections with a TSP-Based Approach , 2004 .
[10] J. W. Hunt,et al. An Algorithm for Differential File Comparison , 2008 .
[11] Witold Litwin,et al. Algebraic signatures for scalable distributed data structures , 2004, Proceedings. 20th International Conference on Data Engineering.
[12] Jeff Rothenberg,et al. Ensuring the Longevity of Digital Documents , 1995 .
[13] Piotr Indyk,et al. Approximate nearest neighbors: towards removing the curse of dimensionality , 1998, STOC '98.
[14] Marc Najork,et al. A large‐scale study of the evolution of Web pages , 2003, WWW '03.
[15] David Wetherall,et al. A protocol-independent technique for eliminating redundant network traffic , 2000, SIGCOMM.
[16] Darrell D. E. Long,et al. Duplicate Data Elimination in a SAN File System , 2004, MSST.
[17] Randal C. Burns,et al. In-place reconstruction of delta compressed files , 1998, PODC '98.
[18] Mendel Rosenblum,et al. The design and implementation of a log-structured file system , 1991, SOSP '91.
[19] Timo Burkard,et al. Herodotus: A Peer-to-Peer Web Archival System , 2002 .
[20] Norman C. Hutchinson,et al. Deciding when to forget in the Elephant file system , 1999, SOSP.
[21] Antony I. T. Rowstron,et al. Pastry: Scalable, Decentralized Object Location, and Routing for Large-Scale Peer-to-Peer Systems , 2001, Middleware.
[22] Hai Jin,et al. Disk System Architectures for High Performance Computing , 2002 .
[23] Elizabeth R. Jessup,et al. Matrices, Vector Spaces, and Information Retrieval , 1999, SIAM Rev..
[24] Andrew Tridgell,et al. Efficient Algorithms for Sorting and Synchronization , 1999 .
[25] Andrew V. Goldberg,et al. A prototype implementation of archival Intermemory , 1999, DL '99.
[26] Yasushi Saito,et al. Pangaea: a symbiotic wide-area file system , 2002, EW 10.
[27] Fazli Can,et al. Incremental clustering for dynamic information processing , 1993, TOIS.
[28] David R. Karger,et al. Chord: A scalable peer-to-peer lookup service for internet applications , 2001, SIGCOMM '01.
[29] Ethan L. Miller,et al. Long-term File Activity and Inter-Reference Patterns (CMG Paper # 2041) , 1998 .
[30] Ben Y. Zhao,et al. OceanStore: an architecture for global-scale persistent storage , 2000, SIGP.
[31] Hector Garcia-Molina,et al. Building a scalable and accurate copy detection mechanism , 1996, DL '96.
[32] Witold Litwin,et al. LH*—a scalable, distributed data structure , 1996, TODS.
[33] Ronald L. Rivest,et al. The MD4 Message-Digest Algorithm , 1990, RFC.
[34] James Lau,et al. File System Design for an NFS File Server Appliance , 1994, USENIX Winter.
[35] Torsten Suel,et al. zdelta: An efficient delta compression tool , 2002 .
[36] Peter F. Corbett,et al. Row-Diagonal Parity for Double Disk Failure Correction , 2004 .
[37] Randal C. Burns. Differential Compression: A Generalized Solution for Binary Files , 1996 .
[38] Sergey Brin,et al. The Anatomy of a Large-Scale Hypertextual Web Search Engine , 1998, Comput. Networks.
[39] Gregory R. Ganger,et al. Ursa minor: versatile cluster-based storage , 2005, FAST'05.
[40] Antony I. T. Rowstron,et al. Storage management and caching in PAST, a large-scale, persistent peer-to-peer storage utility , 2001, SOSP.
[41] Val Henson,et al. An Analysis of Compare-by-hash , 2003, HotOS.
[42] Ekow J. Otoo,et al. Balanced multidimensional extendible hash tree , 1985, PODS.
[43] Witold Litwin,et al. High-availability LH* schemes with mirroring , 1996, Proceedings First IFCIS International Conference on Cooperative Information Systems.
[44] Udi Manber,et al. Integrating content-based access mechanisms with hierarchical file systems , 1999, OSDI '99.
[45] Ben Y. Zhao,et al. Silverback: A Global-Scale Archival System , 2001 .
[46] Randal C. Burns,et al. Efficient distributed backup with delta compression , 1997, IOPADS '97.
[47] Christos T. Karamanolis,et al. Evaluation of Efficient Archival Storage Techniques , 2004, MSST.
[48] Joshua P. MacDonald,et al. File System Support for Delta Compression , 2000 .
[49] Darrell D. E. Long,et al. Design and Implementation of a Predictive File Prefetching Algorithm , 2001, USENIX Annual Technical Conference, General Track.
[50] Kai Li,et al. Image similarity search with compact data structures , 2004, CIKM '04.
[51] Herbert Bos,et al. File size distribution on UNIX systems: then and now , 2006, OPSR.
[52] C. M. Riggle,et al. Design of error correction systems for disk drives , 1998 .
[53] Nasir D. Memon,et al. Cluster-based delta compression of a collection of files , 2002, Proceedings of the Third International Conference on Web Information Systems Engineering, 2002. WISE 2002..
[54] Frank B. Schmuck,et al. GPFS: A Shared-Disk File System for Large Computing Clusters , 2002, FAST.
[55] Ethan L. Miller,et al. Long-term unix file system activity and the efficacy of automatic file migration , 1998 .
[56] Darrell D. E. Long,et al. Deep Store: an archival storage system architecture , 2005, 21st International Conference on Data Engineering (ICDE'05).
[57] Chaitanya K. Baru,et al. Collection-Based Persistent Digital Archives - Part 2 , 2000, D Lib Mag..
[58] John Kubiatowicz,et al. Erasure Coding Vs. Replication: A Quantitative Comparison , 2002, IPTPS.
[59] Margo I. Seltzer,et al. A New Hashing Package for UNIX , 1991, USENIX Winter.
[60] Chaitanya K. Baru,et al. Collection-Based Persistent Digital Archives - Part 1 , 2000, D Lib Mag..
[61] Walter A. Burkhard,et al. Some approaches to best-match file searching , 1973, Commun. ACM.
[62] Ronald L. Rivest,et al. The MD5 Message-Digest Algorithm , 1992, RFC.
[63] Christoph Reichenberger,et al. Delta storage for arbitrary non-text files , 1991, SCM '91.
[64] Walter F. Tichy,et al. Delta algorithms: an empirical analysis , 1998, TSEM.
[65] Fred Douglis,et al. Redundancy Elimination Within Large Collections of Files , 2004, USENIX Annual Technical Conference, General Track.
[66] Elwyn R. Berlekamp,et al. Algebraic coding theory , 1984, McGraw-Hill series in systems science.
[67] M. Narasimha Murty,et al. A computationally efficient technique for data-clustering , 1980, Pattern Recognit..
[68] Prashant J. Shenoy,et al. Rules of thumb in data engineering , 2000, Proceedings of 16th International Conference on Data Engineering (Cat. No.00CB37073).
[69] Darren R. Hardy,et al. Essence: A Resource Discovery System Based on Semantic File Indexing , 1993, USENIX Winter.
[70] Udi Manber,et al. GLIMPSE: A Tool to Search Through Entire File Systems , 1994, USENIX Winter.
[71] Garth A. Gibson,et al. RAID: high-performance, reliable secondary storage , 1994, CSUR.
[72] Éric Fimbel. Edit distance and Chaitin-Kolmogorov difference , 2002 .
[73] Miguel Castro,et al. Farsite: federated, available, and reliable storage for an incompletely trusted environment , 2002, OPSR.
[74] Hector Garcia-Molina,et al. Archival storage for digital libraries , 1998, DL '98.
[75] Fred Douglis,et al. USENIX Association Proceedings of the General Track : 2003 USENIX Annual , 2003 .
[76] Magnus Karlsson,et al. Taming aggressive replication in the Pangaea wide-area file system , 2002, OPSR.
[77] Darrell D. E. Long,et al. Swift: Using Distributed Disk Striping to Provide High I/O Data Rates , 1991, Comput. Syst..
[78] Margo I. Seltzer,et al. Structure and Performance of the Direct Access File System , 2002, USENIX ATC, General Track.
[79] Abraham Lempel,et al. A universal algorithm for sequential data compression , 1977, IEEE Trans. Inf. Theory.
[80] H. Samet,et al. Incremental Similarity Search in Multimedia Databases , 2000 .
[81] Anne E. Trefethen,et al. The Data Deluge: An e-Science Perspective , 2003 .
[82] Ian H. Witten,et al. Managing Gigabytes: Compressing and Indexing Documents and Images , 1999 .
[83] Darrell D. E. Long,et al. A linear time, constant space differencing algorithm , 1997, 1997 IEEE International Performance, Computing and Communications Conference.
[84] Richard M. Karp,et al. Efficient Randomized Pattern-Matching Algorithms , 1987, IBM J. Res. Dev..
[85] Colin Percival. Naïve Differences of Executable Code , 2003 .
[86] Craig A. N. Soules,et al. Connections: using context to enhance file search , 2005, SOSP '05.
[87] Chandramohan A. Thekkath,et al. Frangipani: a scalable distributed file system , 1997, SOSP.
[88] Daniel J. Rosenkrantz,et al. A linear-time scheme for version reconstruction , 1994, TOPL.
[89] Michael O. Rabin,et al. Probabilistic Algorithms in Finite Fields , 1980, SIAM J. Comput..
[90] Geoffrey Zweig,et al. Syntactic Clustering of the Web , 1997, Comput. Networks.
[91] Arkady B. Zaslavsky,et al. Signature Extraction for Overlap Detection in Documents , 2002, ACSC.
[92] Andrew V. Goldberg,et al. Towards an archival Intermemory , 1998, Proceedings IEEE International Forum on Research and Technology Advances in Digital Libraries -ADL'98-.
[93] Zhichen Xu,et al. Towards a semantic, deep archival file system , 2003, The Ninth IEEE Workshop on Future Trends of Distributed Computing Systems, 2003. FTDCS 2003. Proceedings..
[94] P. Sarbanes,et al. Sarbanes-Oxley Act of 2002 , 2002 .
[95] Aris M. Ouksel,et al. Storage mappings for multidimensional linear dynamic hashing , 1983, PODS.
[96] Krishna Bharat,et al. The Term Vector Database: fast access to indexing terms for Web pages , 2000, Comput. Networks.
[97] Walter F. Tichy,et al. RCS - a system for version control , 1985, Softw. Pract. Exp.
[98] Timothy L. Harris,et al. Storage, Mutability and Naming in Pasta , 2002, NETWORKING Workshops.
[99] Rajeev Motwani,et al. Incremental clustering and dynamic information retrieval , 1997, STOC '97.
[100] Dan Klein,et al. Evaluating strategies for similarity search on the web , 2002, WWW '02.
[101] Peter K. Pearson,et al. Fast hashing of variable-length text strings , 1990, CACM.
[102] Michael O. Rabin,et al. Efficient dispersal of information for security, load balancing, and fault tolerance , 1989, JACM.
[103] Jeannette M. Wing,et al. Verifiable secret redistribution for archive systems , 2002, First International IEEE Security in Storage Workshop, 2002. Proceedings..
[104] Reagan Moore,et al. Configuring and tuning archival storage systems , 1999, 16th IEEE Symposium on Mass Storage Systems in cooperation with the 7th NASA Goddard Conference on Mass Storage Systems and Technologies (Cat. No.99CB37098).
[105] Sanjay Ghemawat,et al. The Google file system , 2003 .
[106] Tian Zhang,et al. BIRCH: an efficient data clustering method for very large databases , 1996, SIGMOD '96.
[107] David Eppstein,et al. Fast hierarchical clustering and other applications of dynamic closest pairs , 1999, SODA '98.
[108] Chandramohan A. Thekkath,et al. Petal: distributed virtual disks , 1996, ASPLOS VII.
[109] Anil K. Jain,et al. Data clustering: a review , 1999, CSUR.
[110] Anja Feldmann,et al. Potential benefits of delta encoding and data compression for HTTP , 1997, SIGCOMM '97.
[111] John T. Kohl,et al. HighLight: Using a Log-structured File System for Tertiary Storage Management , 1993, USENIX Winter.
[112] Kave Eshghi. Intrinsic references in distributed systems , 2002, Proceedings 22nd International Conference on Distributed Computing Systems Workshops.
[113] Ian Pratt,et al. Proceedings of the General Track: 2004 USENIX Annual Technical Conference , 2004 .
[114] Zhichen Xu,et al. PeerSearch: Efficient Information Retrieval in Peer-to-Peer Networks , 2002 .
[115] David G. Korn,et al. Engineering a Differencing and Compression Data Format , 2002, USENIX Annual Technical Conference, General Track.
[116] Brian D. Noble,et al. Pastiche: Making Backup Cheap and Easy , 2002, OSDI.
[117] Khalid Sayood. Lossless Compression Handbook , 2003 .
[118] Pierre Jouvelot,et al. Semantic file systems , 1991, SOSP '91.
[119] Darrell D. E. Long,et al. Experimentally Evaluating In-Place Delta Reconstruction , 2002 .
[120] Ronitt Rubinfeld,et al. A sublinear algorithm for weakly approximating edit distance , 2003, STOC '03.
[121] Keishi Tajima,et al. Archiving scientific data , 2004, TODS.
[122] D. J. Wheeler,et al. A Block-sorting Lossless Data Compression Algorithm , 1994 .
[123] Dengguo Feng,et al. Collisions for Hash Functions MD4, MD5, HAVAL-128 and RIPEMD , 2004, IACR Cryptol. ePrint Arch..
[124] Alan M. Frieze,et al. Min-Wise Independent Permutations , 2000, J. Comput. Syst. Sci..
[125] Scott A. Brandt,et al. Reliability mechanisms for very large storage systems , 2003, 20th IEEE/11th NASA Goddard Conference on Mass Storage Systems and Technologies, 2003. (MSST 2003). Proceedings..
[126] Udi Manber,et al. Finding Similar Files in a Large File System , 1994, USENIX Winter.
[127] Jeffrey C. Mogul,et al. The VCDIFF Generic Differencing and Compression Data Format , 2002, RFC.
[128] A. Broder. Some applications of Rabin’s fingerprinting method , 1993 .
[129] Dan Suciu,et al. XMill: an efficient compressor for XML data , 2000, SIGMOD '00.
[130] Richard N. Tucker. The Domesday Project , 1989 .
[131] Monika Henzinger,et al. Finding Related Pages in the World Wide Web , 1999, Comput. Networks.
[132] Sean Quinlan,et al. Venti: A New Approach to Archival Storage , 2002, FAST.
[133] Michael A. Olson,et al. The Design and Implementation of the Inversion File System , 1993, USENIX Winter.
[134] Ethan L. Miller,et al. Using content-derived names for configuration management , 1997, SSR '97.
[135] Hector Garcia-Molina,et al. Finding near-replicas of documents on the Web , 1999 .