Inferring the Topology and Traffic Load of Parallel Programs Running in a Virtual Machine Environment

We are developing a distributed computing environment based on virtual machines featuring application monitoring, network monitoring, and an adaptive virtual network. In this paper, we describe our initial results in monitoring the communication traffic of parallel applications, and inferring its spatial communication properties. The ultimate goal is to be able to exploit such knowledge to maximize the parallel efficiency of the running parallel application by using VM migration, virtual overlay network configuration and network reservation techniques, which are a part of the distributed computing environment. Specifically, we demonstrate that: (1) we can monitor the parallel application network traffic in our layer 2 virtual network system with very low overhead, (2) we can aggregate the monitoring information captured on each host machine to form a global picture of the parallel application's traffic load matrix, and (3) we can infer from the traffic load matrix the application topology. In earlier work, we have demonstrated that we can capture the time dynamics of the applications. We begin here by considering offline traffic monitoring and inference as a proof of concept, testing it with a variety of synthetic and actual workloads. Next, we describe the design and implementation of our online system, the Virtual Topology and Traffic Inference Framework (VTTIF), and evaluate it using a NAS benchmark.

[1]  Bruce Lowekamp,et al.  ECO: Efficient Collective Operations for communication on heterogeneous networks , 1996, Proceedings of International Conference on Parallel Processing.

[2]  Peter A. Dinda,et al.  The measured network traffic of compiler-parallelized programs , 2001, International Conference on Parallel Processing, 2001..

[3]  Leslie G. Valiant,et al.  Direct Bulk-Synchronous Parallel Algorithms , 1994, J. Parallel Distributed Comput..

[4]  Xiaoyun Zhu,et al.  Grids for Enterprise Applications , 2003, JSSPP.

[5]  Yin Zhang,et al.  On the constancy of internet path properties , 2001, IMW '01.

[6]  Stefan Savage,et al.  The end-to-end effects of Internet path selection , 1999, SIGCOMM '99.

[7]  David H. Bailey,et al.  The Nas Parallel Benchmarks , 1991, Int. J. High Perform. Comput. Appl..

[8]  Vaidy S. Sunderam,et al.  Performance of the NAS Parallel Benchmarks on PVM-Based Networks , 1995, J. Parallel Distributed Comput..

[9]  F. Leighton,et al.  Introduction to Parallel Algorithms and Architectures: Arrays, Trees, Hypercubes , 1991 .

[10]  Sajal K. Das,et al.  Book Review: Introduction to Parallel Algorithms and Architectures : Arrays, Trees, Hypercubes by F. T. Leighton (Morgan Kauffman Pub, 1992) , 1992, SIGA.

[11]  Renato J. O. Figueiredo,et al.  A case for grid computing on virtual machines , 2003, 23rd International Conference on Distributed Computing Systems, 2003. Proceedings..

[12]  Leslie G. Valiant,et al.  Direct Bulk-Synchronous Parallel Algorithms , 1992, J. Parallel Distributed Comput..

[13]  Peter A. Dinda,et al.  Towards Virtual Networks for Virtual Machine Grid Computing , 2004, Virtual Machine Research and Technology Symposium.