Baymax: QoS Awareness and Increased Utilization for Non-Preemptive Accelerators in Warehouse Scale Computers
Quan Chen | Lingjia Tang | Hailong Yang | Jason Mars