Diagnostics for causes of packet loss in a high performance data transfer system

Summary form only given. As computational grids become an increasingly dominant force in high-performance computing, the problem of efficiently transferring very large data sets across geographically distributed computing resources becomes both more difficult and more important. Current approaches view the problem largely, if not exclusively, as a network-level problem. As a result, all packet loss is interpreted and treated as a network congestion event, which limits the ability to detect or react to changes in the end-to-end system. We believe that a new approach to this problem is worth pursuing, and we are investigating techniques that can differentiate between data loss caused by contention in the network and loss caused by contention for shared CPU resources at the communication endpoints. The approach is to collect and analyze what we term packet-loss signatures, which describe the patterns of packet loss in the current transmission window. We analyze these signatures using Fourier analysis and symbolic dynamics, and present a simple set of experiments demonstrating the effectiveness of this approach.
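
As an illustration of the kind of analysis described above, the following is a minimal sketch only. The abstract does not say how a packet-loss signature is encoded or which quantities are computed from it, so everything here is an assumption: the signature is taken to be a binary sequence over the transmission window (1 = packet lost, 0 = packet received), the Fourier step is a plain power spectrum, and block entropy stands in for the symbolic-dynamics measure. All function names are hypothetical.

```python
import cmath
import math
from collections import Counter


def loss_signature(window_size, lost_offsets):
    """Assumed encoding: 1 where the packet at that window offset was lost."""
    lost = set(lost_offsets)
    return [1 if i in lost else 0 for i in range(window_size)]


def power_spectrum(signature):
    """Naive O(n^2) DFT power spectrum of the mean-removed signature."""
    n = len(signature)
    mean = sum(signature) / n
    centered = [s - mean for s in signature]
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t, x in enumerate(centered))
        spectrum.append(abs(coeff) ** 2 / n)
    return spectrum


def dominant_period(signature):
    """Period (in packets) of the strongest non-zero frequency component."""
    spec = power_spectrum(signature)
    k = max(range(1, len(spec)), key=lambda i: spec[i])
    return len(signature) / k


def block_entropy(signature, block_len):
    """Shannon entropy (bits) of overlapping words of length block_len."""
    words = [tuple(signature[i:i + block_len])
             for i in range(len(signature) - block_len + 1)]
    total = len(words)
    counts = Counter(words)
    return -sum(c / total * math.log2(c / total) for c in counts.values())


# Hypothetical window of 64 packets with bursts of 4 losses every 8 packets,
# a strictly periodic pattern chosen only to exercise the functions.
periodic = loss_signature(64, [i for i in range(64) if i % 8 < 4])
print(dominant_period(periodic))   # 8.0
print(block_entropy(periodic, 6))  # at most 3 bits: only 8 distinct words occur
```

In this sketch a strongly periodic signature shows a sharp spectral peak and low block entropy, while irregular loss would spread power across frequencies and raise the entropy. Whether either pattern corresponds to endpoint CPU contention or to network contention is an empirical question that the code does not address.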
