Enhancing Address Translations in Throughput Processors via Compression