Towards extending the SWITCH platform for time-critical, cloud-based CUDA applications: Job scheduling parameters influencing performance

Abstract: SWITCH (Software Workbench for Interactive, Time Critical and Highly self-adaptive cloud applications) allows for the development and deployment of real-time applications in the cloud, but it does not yet support instances backed by Graphics Processing Units (GPUs). To explore how SWITCH might support CUDA (NVIDIA's GPU computing platform) in the future, we have reviewed time-critical CUDA applications, finding that, in many cases, the run-time requirement (which we call ‘wall time’) is regarded as the most important. We have performed experiments to investigate which parameters have the greatest impact on wall time when running multiple Amazon Web Services GPU-backed instances. Although a maximum of 8 single-GPU instances can be launched in a single Amazon Region, launching just 2 instances rather than 1 gives a 42% decrease in wall time. In addition, instances often sit idle, and there is a moderately strong relationship between wall time and how problems are distributed across instances. These findings can be used to enhance the SWITCH provision for specifying Non-Functional Requirements (NFRs) so that GPU-backed instances could be supported in the future. They can also be used more generally, to optimise the balance between the computational resources used and the wall time needed to obtain results.
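As a rough illustration of what the reported 42% decrease means in scaling terms, the sketch below converts a fractional wall-time reduction into a speedup factor and a per-instance efficiency. The function names are hypothetical, and the calculation assumes the 42% figure compares 2 instances against 1 on the same workload; it is arithmetic on the stated figure, not a reproduction of the experiments.

```python
def speedup_from_reduction(reduction: float) -> float:
    """Convert a fractional wall-time decrease (e.g. 0.42) into a speedup factor."""
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in [0, 1)")
    # New wall time is (1 - reduction) of the old one, so speedup is its inverse.
    return 1.0 / (1.0 - reduction)


def parallel_efficiency(reduction: float, instances: int) -> float:
    """Speedup divided by the number of instances (1.0 would be ideal scaling)."""
    if instances < 1:
        raise ValueError("instances must be at least 1")
    return speedup_from_reduction(reduction) / instances


if __name__ == "__main__":
    # Assumption: 42% decrease when moving from 1 to 2 instances.
    s = speedup_from_reduction(0.42)
    e = parallel_efficiency(0.42, 2)
    print(f"speedup: {s:.2f}x, efficiency: {e:.0%}")
```

Under these assumptions, a 42% decrease corresponds to a speedup of roughly 1.72x, or about 86% of ideal two-instance scaling.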
