A Passive Solution to the Memory Resource Discovery Problem in Computational Clusters

Resource discovery is an important problem in distributed computing because a system's throughput is directly linked to its ability to locate available resources quickly. Current solutions are poorly suited to discovering resources in large computational clusters because they are intrusive, chatty (i.e., they incur per-node overhead), or maintenance-intensive. In this paper, we present a novel method that non-intrusively identifies nodes with available memory; this capability is critical for memory-intensive cluster applications such as weather forecasting and computational chemistry. The method offers four main benefits: (1) low message complexity, (2) scalability, (3) load balancing, and (4) low maintenance. We demonstrate its feasibility with experiments on a 50-node testbed (DETERlab). Our technique allows us to establish a correlation between a node's memory load and the timeliness of its responses in network traffic. Results show that, by analyzing existing network traffic, our method identifies nodes with available memory with 92% to 100% accuracy, including when that traffic has passed through a (non-congested) switch.
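As a rough illustration of the passive idea summarized above, the following Python sketch assumes that request/response pairs have already been matched from traffic captured on the network (no probes are sent), and that a node under memory pressure answers more slowly. The function names, the `(node, request_time, response_time)` tuple layout, and the fixed delay threshold are all hypothetical; the paper's actual feature extraction and classification may differ.

```python
from collections import defaultdict
from statistics import median


def median_delays(observations):
    """Return each node's median request-to-response delay in seconds.

    `observations` is an iterable of hypothetical (node_id, request_ts, response_ts)
    tuples reconstructed passively from existing traffic.
    """
    delays = defaultdict(list)
    for node, req_ts, resp_ts in observations:
        if resp_ts >= req_ts:  # skip mismatched or malformed pairs
            delays[node].append(resp_ts - req_ts)
    return {node: median(ds) for node, ds in delays.items()}


def nodes_with_available_memory(observations, threshold_s):
    """Return nodes whose median delay is at or below `threshold_s`.

    Assumption: a short, timely response indicates low memory load; in practice
    the threshold would need to be calibrated per cluster.
    """
    return sorted(
        node for node, d in median_delays(observations).items() if d <= threshold_s
    )


if __name__ == "__main__":
    obs = [
        ("node-a", 0.0, 0.0004), ("node-a", 1.0, 1.0006),  # fast responder
        ("node-b", 0.0, 0.0050), ("node-b", 2.0, 2.0070),  # slow responder
    ]
    print(nodes_with_available_memory(obs, threshold_s=0.001))
```

Using the median rather than the mean keeps an occasional delayed packet (for example, one queued briefly at a switch) from flipping a node's classification.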
