Performance Prediction for Multi-Application Concurrency on GPUs

With the advent of edge computing and 5G, multiple mobile applications are being offloaded to cloud servers to meet their computational demands. Computer vision workloads dominate this space. Because vision workloads are composed of linear algebra kernels, they perform well on SIMT/SIMD architectures such as GPUs. An application can maximize its performance on a GPU when it is the sole consumer of the GPU's resources, but it fails to maintain that performance when several applications share the GPU. The primary causes are the lack of efficient virtualization techniques for GPUs and contention among the applications for shared resources. Most prior work in this area is devoted to predicting the performance of a single application. We propose what is, to the best of our knowledge, the first machine-learning-based predictor of the performance of an ensemble of applications on a GPU. Our predictor achieves an error of 9% when predicting execution time across a suite of representative vision workloads. Competing algorithms, which are designed primarily for single-application scenarios, are far less accurate, with error rates above 140%.
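To make the reported error figures concrete, the sketch below computes a percentage prediction error for execution times. The abstract does not say which error metric is used, so the mean absolute percentage error here is an assumption, and the function name and all timing values are hypothetical. It mainly illustrates how an error can exceed 100% when predictions are far from the measured times.

```python
def mean_absolute_percentage_error(actual, predicted):
    """Return the mean of |predicted - actual| / actual, as a percentage."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    ratios = [abs(p - a) / a for a, p in zip(actual, predicted)]
    return 100.0 * sum(ratios) / len(ratios)


# Hypothetical execution times in milliseconds (made up for illustration,
# not taken from the paper).
measured = [100.0, 200.0, 400.0]

# Predictions that stay within about 10% of each measurement.
close_prediction = [110.0, 180.0, 400.0]

# Predictions that are 2.5x each measurement, giving an error above 100%.
far_prediction = [250.0, 500.0, 1000.0]

close_error = mean_absolute_percentage_error(measured, close_prediction)
far_error = mean_absolute_percentage_error(measured, far_prediction)
```

Under this assumed metric, an error above 140% means the predicted execution time is off by more than 1.4 times the measured time on average.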
