Accurate Classification for HPC Applications Concerning Traffic Matrix Visual Patterns

Advances in computing and networking have allowed multiple computers to be interconnected, aggregating their processing power to form High-Performance Computing (HPC) architectures. Applications running in these environments process and communicate huge amounts of information and can take several hours or even days to complete, so understanding their computation and communication demands is essential for management purposes. Moreover, although most HPC applications are implemented with well-known algorithms that tend to follow a given pattern of computation and communication, classical traffic analysis methods have not classified them accurately. We therefore argue that observing and understanding the visual patterns in these applications' traffic matrices (TMs) can provide an accurate classification method. In this paper, we propose TReco, a framework that maintains a database of visual features extracted from these TMs and applies machine learning techniques to classify the HPC applications that are consuming the network, regardless of the number of computational nodes executing them. In our experiments, we reached an accuracy rate above 99.75%.
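To make the idea of classifying by TM visual patterns concrete, the following is a minimal sketch and not TReco's implementation. It assumes that the visual feature is a histogram of basic 8-neighbour local binary pattern (LBP) codes and that a nearest-neighbour lookup stands in for the machine learning step; the abstract specifies neither. All function names and the synthetic ring and all-to-all traffic patterns are hypothetical. Because the histogram is normalised, the feature vector has the same length for any number of nodes.

```python
from collections import Counter

def traffic_matrix(flows, n_nodes):
    """Accumulate (src, dst, bytes) records into an n_nodes x n_nodes matrix."""
    tm = [[0] * n_nodes for _ in range(n_nodes)]
    for src, dst, nbytes in flows:
        tm[src][dst] += nbytes
    return tm

def to_gray(tm, levels=256):
    """Scale byte counts to integer gray levels in [0, levels - 1]."""
    peak = max(max(row) for row in tm)
    if peak == 0:
        return [[0] * len(row) for row in tm]
    return [[v * (levels - 1) // peak for v in row] for row in tm]

# Offsets of the 8 neighbours, clockwise from the top-left (assumed ordering).
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
              (1, 1), (1, 0), (1, -1), (0, -1)]

def lbp_histogram(img):
    """Normalised 256-bin histogram of basic LBP codes over interior pixels."""
    n_rows, n_cols = len(img), len(img[0])
    if n_rows < 3 or n_cols < 3:
        raise ValueError("need at least a 3x3 matrix")
    counts = Counter()
    for r in range(1, n_rows - 1):
        for c in range(1, n_cols - 1):
            code = 0
            for bit, (dr, dc) in enumerate(NEIGHBOURS):
                # Set the bit when the neighbour is at least as bright as the centre.
                if img[r + dr][c + dc] >= img[r][c]:
                    code |= 1 << bit
            counts[code] += 1
    total = sum(counts.values())
    return [counts[k] / total for k in range(256)]

def features(flows, n_nodes):
    """Full pipeline: flow records -> TM -> gray image -> LBP histogram."""
    return lbp_histogram(to_gray(traffic_matrix(flows, n_nodes)))

def classify(feature_vector, database):
    """Return the label of the closest stored vector (L1 distance).

    A stand-in for the unspecified machine learning step.
    """
    def l1(a, b):
        return sum(abs(x - y) for x, y in zip(a, b))
    return min(database, key=lambda entry: l1(feature_vector, entry[1]))[0]

# Synthetic communication patterns (hypothetical, not the paper's data).
def ring_flows(n):
    return [(i, (i + 1) % n, 1000) for i in range(n)]

def all_to_all_flows(n):
    return [(i, j, 1000) for i in range(n) for j in range(n) if i != j]

# The database is built from 8-node runs; the query comes from a 16-node run.
database = [
    ("ring", features(ring_flows(8), 8)),
    ("all-to-all", features(all_to_all_flows(8), 8)),
]
print(classify(features(ring_flows(16), 16), database))
```

In this toy setting the 16-node ring is matched against references recorded on 8 nodes, which illustrates why a size-independent texture descriptor is attractive. A real system would likely need a more robust descriptor and a trained classifier.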
