Understanding and coping with failures in large-scale storage systems
[1] Andreas Haeberlen,et al. Glacier: highly durable, decentralized storage despite massive correlated failures , 2005, NSDI.
[2] Daniel P. Siewiorek,et al. Architectures and algorithms for on-line failure recovery in redundant disk arrays , 1994, Distributed and Parallel Databases.
[3] John H. Hartman,et al. The Zebra striped network file system , 1995, TOCS.
[4] Yasushi Saito,et al. Pangaea: a symbiotic wide-area file system , 2002, EW 10.
[5] Peter F. Corbett,et al. Row-Diagonal Parity for Double Disk Failure Correction (Awarded Best Paper!) , 2004, USENIX Conference on File and Storage Technologies.
[6] Kishor S. Trivedi,et al. Markov Dependability Models of Complex Systems: Analysis Techniques , 1996 .
[7] Rodney Van Meter,et al. Network attached storage architecture , 2000, CACM.
[8] Ethan L. Miller,et al. Replication under scalable hashing: a family of algorithms for scalable decentralized data distribution , 2004, 18th International Parallel and Distributed Processing Symposium, 2004. Proceedings..
[9] Antony I. T. Rowstron,et al. Storage management and caching in PAST, a large-scale, persistent peer-to-peer storage utility , 2001, SOSP.
[10] Witold Litwin,et al. High-availability LH* schemes with mirroring , 1996, Proceedings First IFCIS International Conference on Cooperative Information Systems.
[11] Srinivasan Seshan,et al. Performance and design evaluation of the RAID-II storage server , 2005, Distributed and Parallel Databases.
[12] Junfeng Yang,et al. An empirical study of operating systems errors , 2001, SOSP.
[13] Satoshi Matsuoka,et al. Performance analysis of scheduling and replication algorithms on Grid Datafarm architecture for high-energy physics applications , 2003, High Performance Distributed Computing, 2003. Proceedings. 12th IEEE International Symposium on.
[14] S. Shah,et al. Server class disk drives: how reliable are they? , 2004, Annual Symposium Reliability and Maintainability, 2004 - RAMS.
[15] David A. Patterson,et al. Computer Architecture: A Quantitative Approach , 1969 .
[16] Garth A. Gibson,et al. RAID: high-performance, reliable secondary storage , 1994, CSUR.
[17] Kanishk Jain. Object-based Storage , 2022 .
[18] Ian T. Foster,et al. Mapping the Gnutella Network , 2002, IEEE Internet Comput..
[19] Garth A. Gibson. Redundant disk arrays: Reliable, parallel secondary storage. Ph.D. Thesis , 1990 .
[20] David R. Karger,et al. Wide-area cooperative storage with CFS , 2001, SOSP.
[21] Thomas J. Glover,et al. Pocket PCRef , 1991 .
[22] Walter A. Burkhard,et al. Disk array storage system reliability , 1993, FTCS-23 The Twenty-Third International Symposium on Fault-Tolerant Computing.
[23] David V. Anderson. Object based storage devices: a command set proposal , 1999 .
[24] Niraj K. Jha,et al. Fault-tolerant computer system design , 1996, IEEE Parallel & Distributed Technology: Systems & Applications.
[25] José Duato. A Theory of Fault-Tolerant Routing in Wormhole Networks , 1997, IEEE Trans. Parallel Distributed Syst..
[26] John C. S. Lui,et al. Performance Analysis of Disk Arrays under Failure , 1990, VLDB.
[27] J. G. Elerath. Specifying reliability in the disk drive industry: No more MTBF's , 2000, Annual Reliability and Maintainability Symposium. 2000 Proceedings. International Symposium on Product Quality and Integrity (Cat. No.00CH37055).
[28] Chandramohan A. Thekkath,et al. Petal: distributed virtual disks , 1996, ASPLOS VII.
[29] Darrell D. E. Long,et al. Exploiting Multiple I/O Streams to Provide High Data-Rates , 1991, USENIX Summer.
[30] Witold Litwin,et al. Algebraic signatures for scalable distributed data structures , 2004, Proceedings. 20th International Conference on Data Engineering.
[31] Tore Risch,et al. LH* Schemes with Scalable Availability , 1998 .
[32] Boris Vladimirovič Gnedenko,et al. Mathematical methods in the reliability theory , 1969 .
[33] David A. Patterson,et al. Embracing Failure: A Case for Recovery-Oriented Computing (ROC) , 2001 .
[34] Nitin H. Vaidya,et al. A case for two-level distributed recovery schemes , 1995, SIGMETRICS '95/PERFORMANCE '95.
[35] Ben Y. Zhao,et al. Maintenance-Free Global Data Storage , 2001, IEEE Internet Comput..
[36] J. Menon,et al. Distributed sparing in disk arrays , 1992, Digest of Papers COMPCON Spring 1992.
[37] Randy H. Katz,et al. RAMA: a file system for massively-parallel computers , 1993, [1993] Proceedings Twelfth IEEE Symposium on Mass Storage systems.
[38] Ian Clarke,et al. Freenet: A Distributed Anonymous Information Storage and Retrieval System , 2000, Workshop on Design Issues in Anonymity and Unobservability.
[39] Miguel Castro,et al. Farsite: federated, available, and reliable storage for an incompletely trusted environment , 2002, OPSR.
[40] Spencer W. Ng. Crosshatch disk array for improved reliability and performance , 1994, ISCA '94.
[41] Magnus Karlsson,et al. Taming aggressive replication in the Pangaea wide-area file system , 2002, OPSR.
[42] Darrell D. E. Long,et al. Swift: Using Distributed Disk Striping to Provide High I/O Data Rates , 1991, Comput. Syst..
[43] Roger Wattenhofer,et al. Large-scale simulation of replica placement algorithms for a serverless distributed file system , 2001, MASCOTS 2001, Proceedings Ninth International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems.
[44] Thomas J. E. Schwarz. Reed Solomon codes for Erasure Correction in SDDS , 2002, WDAS.
[45] Thomas E. Anderson,et al. xFS: a wide area mass storage file system , 1993, Proceedings of IEEE 4th Workshop on Workstation Operating Systems. WWOS-III.
[46] Miguel Castro,et al. Proactive recovery in a Byzantine-fault-tolerant system , 2000, OSDI.
[47] E. L. Miller,et al. Efficient Metadata Management in Large Distributed File Systems , .
[48] Sung Hoon Baek,et al. Reliability and performance of hierarchical RAID with multiple controllers , 2001, PODC '01.
[49] John Wilkes,et al. Seneca: remote mirroring done write , 2003, USENIX Annual Technical Conference, General Track.
[50] J. G. Elerath,et al. Disk drive reliability case study: dependence upon head fly-height and quantity of heads , 2003, Annual Reliability and Maintainability Symposium, 2003..
[51] Roger Wattenhofer,et al. Optimizing file availability in a secure serverless distributed file system , 2001, Proceedings 20th IEEE Symposium on Reliable Distributed Systems.
[52] John I. McCool,et al. Probability and Statistics With Reliability, Queuing and Computer Science Applications , 2003, Technometrics.
[53] James S. Plank,et al. A tutorial on Reed–Solomon coding for fault‐tolerance in RAID‐like systems , 1997, Softw. Pract. Exp..
[54] Charles L. Seitz,et al. Multicomputers: message-passing concurrent computers , 1988, Computer.
[55] Noah Treuhaft,et al. Recovery Oriented Computing (ROC): Motivation, Definition, Techniques, and Case Studies , 2002 .
[56] Arif Merchant,et al. FAB: building distributed enterprise disk arrays from commodity components , 2004, ASPLOS XI.
[57] Sharon E. Perl,et al. Myriad: Cost-Effective Disaster Tolerance , 2002, FAST.
[58] Feng Wang,et al. File System Workload Analysis For Large Scale Scientific Computing Applications , 2004 .
[59] Xiang Yu,et al. Configuring and Scheduling an Eager-Writing Disk Array for a Transaction Processing Workload , 2002, FAST.
[60] Ben Y. Zhao,et al. OceanStore: an architecture for global-scale persistent storage , 2000, SIGP.
[61] C. Mohan,et al. Recovery and Coherency-Control Protocols for Fast Intersystem Page Transfer and Fine-Granularity Locking in a Shared Disks Transaction Environment , 1991, VLDB.
[62] Darrell D. E. Long. A technique for managing mirrored disks , 2001, Conference Proceedings of the 2001 IEEE International Performance, Computing, and Communications Conference (Cat. No.01CH37210).
[63] John R. Douceur,et al. The Sybil Attack , 2002, IPTPS.
[64] Kishor S. Trivedi. Probability and Statistics with Reliability, Queuing, and Computer Science Applications , 1984 .
[65] Carl Staelin,et al. Idleness is Not Sloth , 1995, USENIX.
[66] S. Shah,et al. Disk drive vintage and its effect on reliability , 2004, Annual Symposium Reliability and Maintainability, 2004 - RAMS.
[67] Matthew T. O'Keefe,et al. Scalability and Failure Recovery in a Linux Cluster File System , 2000, Annual Linux Showcase & Conference.
[68] Scott A. Brandt,et al. Dynamic Metadata Management for Petabyte-Scale File Systems , 2004, Proceedings of the ACM/IEEE SC2004 Conference.
[69] G. A. Alvarez,et al. Tolerating Multiple Failures In Raid Architectures With Optimal Storage And Uniform Declustering , 1997, Conference Proceedings. The 24th Annual International Symposium on Computer Architecture.
[70] Chandramohan A. Thekkath,et al. Frangipani: a scalable distributed file system , 1997, SOSP.
[71] Roger Wattenhofer,et al. Competitive Hill-Climbing Strategies for Replica Placement in a Distributed File System , 2001, DISC.
[72] Spencer W. Ng,et al. Disk scrubbing in large archival storage systems , 2004, The IEEE Computer Society's 12th Annual International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunications Systems, 2004. (MASCOTS 2004). Proceedings..
[73] Garth Goodson,et al. Efficient, Scalable Consistency for Highly Fault-tolerant Storage (CMU-PDL-04-111) , 2004 .
[74] Dror G. Feitelson,et al. The Vesta parallel file system , 1996, TOCS.
[75] Jehoshua Bruck,et al. EVENODD: An Efficient Scheme for Tolerating Double Disk Failures in RAID Architectures , 1995, IEEE Trans. Computers.
[76] Erik Riedel,et al. More Than an Interface - SCSI vs. ATA , 2003, FAST.
[77] Randy H. Katz,et al. How reliable is a RAID? , 1989, Digest of Papers. COMPCON Spring 89. Thirty-Fourth IEEE Computer Society International Conference: Intellectual Leverage.
[78] Sanjay Ghemawat,et al. The Google file system , 2003 .
[79] Randy H. Katz,et al. RAMA: An Easy-to-Use, High-Performance Parallel File System , 1997, Parallel Comput..
[80] 天野 英晴. J. L. Hennessy and D. A. Patterson: Computer Architecture: A Quantitative Approach, Morgan Kaufmann (1990) (Great Books and Papers of the 20th Century) , 2003 .
[81] Li Zhang,et al. Fault tolerant networks with small degree , 2000, SPAA '00.
[82] Randy H. Katz,et al. Coding techniques for handling failures in large disk arrays , 2005, Algorithmica.
[83] Miguel Oom Temudo de Castro,et al. Practical Byzantine fault tolerance , 1999, OSDI '99.
[84] Edward Grochowski,et al. Technological impact of magnetic hard disk drives on storage systems , 2003, IBM Syst. J..
[85] Chita R. Das,et al. A Testbed for Evaluation of Fault-Tolerant Routing in Multiprocessor Interconnection Networks , 1999, IEEE Trans. Parallel Distributed Syst..
[86] Daniel P. Siewiorek,et al. Reliable computer systems (2nd ed.): design and evaluation , 1992 .
[87] Yale N. Patt,et al. Disk subsystem load balancing: disk striping vs. conventional data placement , 1993, [1993] Proceedings of the Twenty-sixth Hawaii International Conference on System Sciences.
[88] Michael Stonebraker,et al. Distributed RAID-a new multiple copy algorithm , 1990, [1990] Proceedings. Sixth International Conference on Data Engineering.
[89] Hai Jin,et al. RAID-x: a new distributed disk array for I/O-centric cluster computing , 2000, Proceedings the Ninth International Symposium on High-Performance Distributed Computing.
[90] Tak-Shing Peter Yum,et al. Dynamic Multiple Parity (DMP) Disk Array for Serial Transaction Processing , 2001, IEEE Trans. Computers.
[91] Jim Zelenka,et al. File server scaling with network-attached secure disks , 1997, SIGMETRICS '97.
[92] Ethan L. Miller,et al. Interconnection Architectures for Petabyte-Scale High-Performance Storage Systems , 2004 .
[93] Robert S. Swarz,et al. Reliable Computer Systems: Design and Evaluation , 1992 .
[94] Frank B. Schmuck,et al. GPFS: A Shared-Disk File System for Large Computing Clusters , 2002, FAST.
[95] Jim Zelenka,et al. A cost-effective, high-bandwidth storage architecture , 1998, ASPLOS VIII.
[96] Yale N. Patt,et al. Using non-volatile storage to improve the reliability of RAID5 disk arrays , 1997, Proceedings of IEEE 27th International Symposium on Fault Tolerant Computing.
[97] Mike Loukides,et al. Using SANs and NAS , 2002 .
[98] Kishor S. Trivedi,et al. Reliabilities of two fault-tolerant interconnection networks , 1988, [1988] The Eighteenth International Symposium on Fault-Tolerant Computing. Digest of Papers.
[99] Witold Litwin,et al. LH*RS: a high-availability scalable distributed data structure using Reed Solomon Codes , 2000, SIGMOD '00.
[100] Kishor S. Trivedi,et al. FSPNs: Fluid Stochastic Petri Nets , 1993, Application and Theory of Petri Nets.
[101] Witold Litwin,et al. LH*s: a high-availability and high-security scalable distributed data structure , 1997, Proceedings Seventh International Workshop on Research Issues in Data Engineering. High Performance Database Management for Large-Scale Applications.
[102] Walter A. Burkhard,et al. Reliability and performance of RAIDs , 1995 .
[103] Aaron Brown. Accepting Failure: Availability through Repair-centric System Design , 2001 .
[104] Amin Vahdat,et al. Interposed request routing for scalable network storage , 2000, TOCS.
[105] John Kubiatowicz,et al. Erasure Coding Vs. Replication: A Quantitative Comparison , 2002, IPTPS.
[106] Prasant Mohapatra,et al. Wormhole routing techniques for directly connected multicomputer systems , 1998, CSUR.
[107] Joseph F. Murray,et al. Improved disk-drive failure warnings , 2002, IEEE Trans. Reliab..
[108] H. Apte,et al. Serverless Network File Systems , 2006 .
[109] Gustavo Alonso,et al. Understanding replication in databases and distributed systems , 2000, Proceedings 20th IEEE International Conference on Distributed Computing Systems.
[110] Edsger W. Dijkstra,et al. A note on two problems in connexion with graphs , 1959, Numerische Mathematik.
[111] David A. Patterson,et al. An Analysis of Error Behaviour in a Large Storage System , 1999 .
[112] Ben Y. Zhao,et al. Pond: The OceanStore Prototype , 2003, FAST.