Supporting server-level fault tolerance in concurrent-push-based parallel video servers

Parallel video servers have been proposed for building large-scale video-on-demand (VoD) systems from multiple low-cost servers. However, as more servers are added to scale up capacity, system-level reliability decreases, because the failure of any one server cripples the entire system. To tackle this reliability problem, this paper proposes and analyzes architectures that support server-level fault tolerance in parallel video servers. Building on the concurrent push architecture proposed earlier, it addresses three problems pertaining to fault tolerance: redundancy management, the redundant data transmission protocol, and real-time fault masking. First, redundant data based on erasure codes are added to the video data stored on the servers and delivered to the clients to support fault tolerance. Despite the success of distributed redundancy striping schemes such as RAID-5 in disk array implementations, we find that similar schemes extended to the server context do not scale well. Instead, we propose a redundant server scheme that is scalable and has a lower total server buffer requirement. Second, two protocols are proposed to manage the transmission of redundant data to the clients: forward erasure correction, which always transmits redundant data, and on-demand correction, which transmits redundant data only after a server failure is detected. Third, to enable ongoing video sessions to maintain nonstop video playback during a failure, we propose using fault masking at the client to recompute lost video data in real time. In particular, we derive the amount of client buffer required so that nonstop, continuous video playback can be maintained despite server failures.
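As a rough illustration of client-side fault masking, the sketch below assumes the simplest possible erasure code: a single XOR parity block per stripe group, held on one redundant server, which can mask one server failure. The abstract only says "erasure codes" and does not name the code, so the parity choice, the function names (`xor_blocks`, `encode_group`, `mask_failure`), and the block layout are all assumptions made for illustration, not the paper's design.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the byte-wise XOR of equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def encode_group(data_blocks):
    """Append one parity block to a group of data blocks.

    Assumption: one data block per data server, and the parity block
    is stored on a dedicated redundant server (hypothetical layout).
    """
    return list(data_blocks) + [xor_blocks(data_blocks)]


def mask_failure(received):
    """Recover the data blocks of one group at the client.

    `received` holds one entry per server (data servers first, the
    redundant server last); a failed server's entry is None.
    Single parity can rebuild at most one missing block.
    """
    missing = [i for i, block in enumerate(received) if block is None]
    if len(missing) > 1:
        raise ValueError("single parity can mask only one server failure")
    received = list(received)
    if missing:
        survivors = [block for block in received if block is not None]
        # XOR of all surviving blocks equals the missing block.
        received[missing[0]] = xor_blocks(survivors)
    return received[:-1]  # drop the parity block, keep the video data


# Usage: three data servers plus one redundant server; server 1 fails.
group = encode_group([b"abcd", b"efgh", b"ijkl"])
group[1] = None
recovered = mask_failure(group)
```

Under forward erasure correction the parity block would always be part of `received`; under on-demand correction the client would obtain it only after a failure is detected. The sketch ignores timing and buffering, which are the quantities the paper actually analyzes.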
