QoS Guarantee in Partial Failure of Clustered VOD Server

For large-scale VOD service, cluster servers are attracting attention for their high performance and low cost. A cluster server usually consists of a front-end node and multiple back-end nodes. Although adding back-end nodes allows the server to support more QoS streams for clients, the probability of a back-end node failure increases proportionally. Such a failure not only stops all streaming service but also loses the current playing positions. In this paper, we study recovery mechanisms that sustain uninterrupted streaming service when a back-end node fails. To reflect an actual VOD service environment, we implement a cluster-based VOD server composed of general-purpose PCs and adopt parallel processing for MPEG movies. On the implemented VOD server, we design a video block recovery mechanism based on parity algorithms. However, applying this basic technique without considering the architecture of the cluster-based VOD server creates a performance bottleneck in the internal network during recovery and leads to inefficient CPU usage on the back-end nodes. To address these problems, we propose a new failure recovery mechanism based on the pipeline computing concept.
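
The abstract does not spell out the parity algorithm or the pipeline design, so the following is only a minimal sketch under stated assumptions: a single XOR parity block per stripe (RAID-style), one block per back-end node, and hypothetical function names throughout. It contrasts a centralized rebuild, where one node gathers every surviving block, with a pipelined rebuild, where each surviving node XORs its local block into a running partial result and forwards it. It is not the authors' implementation.

```python
from functools import reduce


def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length blocks."""
    if len(a) != len(b):
        raise ValueError("blocks must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(data_blocks: list[bytes]) -> bytes:
    """Assumed layout: one XOR parity block over all data blocks of a stripe."""
    return reduce(xor_blocks, data_blocks)


def recover_centralized(surviving_blocks: list[bytes]) -> bytes:
    """One node collects all surviving blocks (data + parity) and XORs them.

    Every surviving block crosses the internal network to that single node,
    which also performs all of the XOR work.
    """
    return reduce(xor_blocks, surviving_blocks)


def recover_pipelined(surviving_blocks: list[bytes]) -> bytes:
    """Each loop iteration stands in for one surviving back-end node.

    The node XORs its local block into the partial result and passes it on,
    so each hop carries one block-sized message and the XOR work is spread
    across the nodes. Real nodes would exchange the partial over the network.
    """
    partial = bytes(len(surviving_blocks[0]))  # all-zero block, XOR identity
    for local_block in surviving_blocks:
        partial = xor_blocks(partial, local_block)
    return partial


if __name__ == "__main__":
    stripe = [b"AAAA", b"BBBB", b"CCCC"]      # hypothetical video blocks
    parity = make_parity(stripe)
    surviving = [stripe[0], stripe[2], parity]  # node holding stripe[1] failed
    print(recover_centralized(surviving) == stripe[1])
    print(recover_pipelined(surviving) == stripe[1])
```

Both functions return the same bytes; the difference the sketch is meant to illustrate is where the traffic and the XOR computation land, which is the bottleneck the abstract attributes to the basic technique.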
