Providing resiliency for optical grids by exploiting relocation: A dimensioning study based on ILP

Grids use a form of distributed computing to tackle the complex computational and data processing problems that scientists face today. When designing an (optical) network to support grids, it is essential that the network can survive single network failures, for which several protection schemes have been devised in the past. In this work, we extend the existing Shared Path protection scheme by incorporating the anycast principle typical of grids: a user typically does not care which specific server executes a job and is merely interested in the timely delivery of its results. Therefore, in contrast with Classical Shared Path protection (CSP), we do not necessarily provide a backup path between the source and the original destination. Instead, we allow the job to be relocated to another server location if doing so yields a backup path that requires fewer wavelengths than the one CSP would suggest. We assess the bandwidth savings enabled by relocation in a quantitative dimensioning case study on a European and an American network topology, which shows substantial savings in the number of required wavelengths (on the order of 11-50%, depending on network topology and server locations). We also investigate how relocation affects the computational load on the execution servers. The case study is based on solving a grid network dimensioning problem: we present Integer Linear Programming (ILP) formulations for both the traditional CSP and the new resilience scheme exploiting relocation (SPR). We also outline a strategy to deal with the anycast principle: given just the origins and intensity of job arrivals, we derive a static (source, destination)-based demand matrix. This matrix is then used as input to solve the network dimensioning ILP for an optical circuit-switched WDM network.
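
The intuition behind relocation can be illustrated with a minimal sketch. It is not the ILP formulation of this work: it uses hop count as a rough proxy for wavelength usage, ignores the sharing of backup wavelengths among demands, and picks paths greedily with breadth-first search. The toy topology, the node names, and the functions `shortest_path`, `links` and `backup_hops` are all hypothetical and chosen for illustration only.

```python
from collections import deque


def shortest_path(adj, src, targets, banned=frozenset()):
    """BFS from src to the nearest node in targets, skipping banned links."""
    parent = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node in targets:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in adj[node]:
            if nxt not in parent and frozenset((node, nxt)) not in banned:
                parent[nxt] = node
                queue.append(nxt)
    return None  # no target reachable


def links(path):
    """Undirected links traversed by a path."""
    return {frozenset(pair) for pair in zip(path, path[1:])}


def hops(path):
    return None if path is None else len(path) - 1


def backup_hops(adj, src, dest, servers):
    """Return (primary, CSP backup, relocation backup) hop counts.

    Assumption: the backup must be link-disjoint from the primary path.
    CSP must end at the original destination; with relocation the backup
    may end at any server site.
    """
    primary = shortest_path(adj, src, {dest})
    used = links(primary)
    csp = shortest_path(adj, src, {dest}, used)
    spr = shortest_path(adj, src, set(servers), used)
    return hops(primary), hops(csp), hops(spr)


# Hypothetical toy topology: source A, server sites S1 and S2.
adj = {
    "A": ["S1", "B"],
    "S1": ["A", "C"],
    "B": ["A", "C", "S2"],
    "C": ["B", "S1"],
    "S2": ["B"],
}
result = backup_hops(adj, "A", "S1", ["S1", "S2"])
print(result)
```

In this toy example the backup path to the original destination S1 needs three hops, whereas relocating the job to S2 needs only two. The actual savings reported above come from solving the dimensioning ILPs, which additionally account for wavelength sharing across all demands.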
