Availability, fairness, and performance optimization in storage virtualization systems

Storage consolidation arises from the growing demand for massive storage capacity, and storage virtualization is an effective way to manage consolidated storage resources. This dissertation presents novel algorithms that resolve two critical issues in providing QoS guarantees in storage virtualization systems: availability guarantees and fairness guarantees. It also puts forward a unique network-centric buffer cache organization that eliminates several major data-transmission overheads within certain storage clients.

The availability guarantee is supported through replication and by translating the availability requirement into a conventional resource requirement. A measurement-based admission control (MBAC) algorithm is proposed that effectively reduces the resources reserved for a virtual disk with an availability guarantee to what that disk actually needs.

The fairness guarantee is addressed in two aspects. First, this dissertation proposes a new real-time disk scheduler that not only provides standard QoS guarantees and achieves high disk utilization efficiency, but also ensures both long-term and short-term fairness among competing virtual disks. Second, for the first time, we address the fairness issues introduced by the additional disk-head movement overhead incurred when switching between virtual disks.

Finally, this dissertation proposes a novel buffer cache organization to optimize the performance of storage clients, which are usually application servers. This cache organization can eliminate the data copying operations associated with data transport in a special type of application server, the pass-through server, whose main responsibility is to channel data between different external entities. Moreover, this cache organization is designed to be friendly to legacy application servers.
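The abstract states that an availability requirement is translated into a conventional resource requirement via replication, but does not give the translation. The Python sketch below shows one simple way it could work, under assumptions that are not taken from the dissertation: replicas fail independently, every replica has the same availability a, and the virtual disk is available whenever at least one replica is. The smallest replica count r then satisfies 1 - (1 - a)^r >= A_target, and capacity and write bandwidth scale with r. The names `VirtualDiskRequest`, `replicas_needed`, and `to_resource_requirement` are hypothetical illustrations, not the author's method.

```python
from dataclasses import dataclass


@dataclass
class VirtualDiskRequest:
    # Hypothetical request shape; fields are illustrative assumptions.
    capacity_gb: float
    write_mbps: float
    read_mbps: float
    target_availability: float  # e.g. 0.99999


@dataclass
class ResourceRequirement:
    replicas: int
    raw_capacity_gb: float
    write_mbps_total: float
    read_mbps_total: float


def replicas_needed(per_disk_availability: float, target: float) -> int:
    """Smallest r with 1 - (1 - a)**r >= target.

    Assumes replica failures are independent and identically distributed.
    """
    if not 0.0 < per_disk_availability < 1.0:
        raise ValueError("per-disk availability must be in (0, 1)")
    if not 0.0 < target < 1.0:
        raise ValueError("target availability must be in (0, 1)")
    r = 1
    # The disk is down only if all r replicas are down: (1 - a)**r.
    while 1.0 - (1.0 - per_disk_availability) ** r < target:
        r += 1
    return r


def to_resource_requirement(req: VirtualDiskRequest,
                            per_disk_availability: float) -> ResourceRequirement:
    r = replicas_needed(per_disk_availability, req.target_availability)
    return ResourceRequirement(
        replicas=r,
        raw_capacity_gb=r * req.capacity_gb,   # each replica holds a full copy
        write_mbps_total=r * req.write_mbps,   # assumed: every write goes to every replica
        read_mbps_total=req.read_mbps,         # assumed: each read is served by one replica
    )


if __name__ == "__main__":
    request = VirtualDiskRequest(capacity_gb=100.0, write_mbps=10.0,
                                 read_mbps=30.0, target_availability=0.99999)
    print(to_resource_requirement(request, per_disk_availability=0.99))
```

With a = 0.99 and a target of five nines, two replicas give only 0.9999, so three are needed; the resulting requirement (three times the capacity and write bandwidth) is what a conventional admission controller would then check. An MBAC scheme like the one the abstract describes would aim to reserve less than this worst-case figure by measuring actual usage.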
