1+1 Protection of Overlay Distributed Computing Systems: Modeling and Optimization

The development of the Internet and the growing amount of data produced in various systems have created a need for distributed computing systems that can process these data. Because the results of some computations are of great importance (e.g., analysis of medical data or weather forecasting), the survivability of computing systems, i.e., the capability to provide continuous service after failures of network elements, becomes a significant issue. Most previous work on survivable computing systems considers the case in which a dedicated optical network connects the computing sites. The main novelty of this work is its focus on overlay-based distributed computing systems, i.e., systems in which the computing system works as an overlay on top of an underlying network such as the Internet. In particular, we present a novel protection scheme for such systems, based on the 1+1 protection method developed in the context of connection-oriented networks. We introduce a new ILP model for the joint optimization of task allocation and link capacity assignment in survivable overlay distributed computing systems. The objective is to minimize the operational (OPEX) cost of the system, which includes processing costs and network capacity costs. Moreover, two heuristic algorithms are proposed and evaluated. The results show that providing protection for all tasks increases the OPEX cost by 110% and 106% for 30-node and 200-node systems, respectively, compared to the case in which tasks are not protected.
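To make the cost trade-off concrete, the following toy sketch compares the cheapest unprotected allocation with the cheapest protected one on a three-node instance. It is not the ILP model or either heuristic from the paper. It assumes that 1+1 protection of a task means running a primary and a backup copy on two different nodes, and it replaces link capacity assignment with a flat per-copy transfer cost. All names (`cheapest_allocation`, `proc_cost`, `link_cost`, `limit`) and all numbers are hypothetical.

```python
from itertools import product


def cheapest_allocation(task_sources, nodes, proc_cost, link_cost, limit, protected):
    """Brute-force search for the cheapest task allocation (toy model).

    task_sources: for each task, the node that holds its input data.
    proc_cost:    per-task processing cost at each node.
    link_cost:    flat cost of shipping one task's input to another node
                  (a simplification standing in for link capacity cost).
    limit:        maximum number of task copies each node can process.
    protected:    if True, each task gets a primary and a backup copy
                  placed on two distinct nodes (assumed reading of 1+1).
    """
    if protected:
        choices = [(p, b) for p in nodes for b in nodes if p != b]
    else:
        choices = [(p,) for p in nodes]

    best_cost, best_alloc = None, None
    for alloc in product(choices, repeat=len(task_sources)):
        load = {v: 0 for v in nodes}
        cost = 0
        for src, copies in zip(task_sources, alloc):
            for v in copies:
                load[v] += 1
                # Processing cost, plus transfer cost if the copy runs remotely.
                cost += proc_cost[v] + (0 if v == src else link_cost)
        if any(load[v] > limit[v] for v in nodes):
            continue  # violates a node's processing limit
        if best_cost is None or cost < best_cost:
            best_cost, best_alloc = cost, alloc
    return best_cost, best_alloc


# Hypothetical instance: three nodes, two tasks.
nodes = ["a", "b", "c"]
proc_cost = {"a": 1, "b": 2, "c": 3}
limit = {"a": 2, "b": 2, "c": 2}
task_sources = ["a", "b"]

base_cost, base_alloc = cheapest_allocation(
    task_sources, nodes, proc_cost, link_cost=1, limit=limit, protected=False)
prot_cost, prot_alloc = cheapest_allocation(
    task_sources, nodes, proc_cost, link_cost=1, limit=limit, protected=True)

overhead = 100 * (prot_cost - base_cost) / base_cost
print(f"unprotected cost: {base_cost}, protected cost: {prot_cost}, "
      f"overhead: {overhead:.0f}%")
```

In this toy instance the protected cost can more than double because the backup copy is pushed to a more expensive node and always pays the transfer cost. The overhead figure it prints is a property of the made-up numbers only and says nothing about the results reported above. Exhaustive search is used only because the instance is tiny; it does not scale to the 30-node and 200-node systems mentioned in the abstract.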
