VersaPipe: A Versatile Programming Framework for Pipelined Computing on GPU
Wenguang Chen | Jidong Zhai | Xipeng Shen | Youngmin Yi | Zhen Zheng | Chanyoung Oh
[1] Dong Li,et al. Enabling and Exploiting Flexible Task Assignment on GPU through SM-Centric Program Transformations , 2015, ICS.
[2] John Kim,et al. Automatically exploiting implicit Pipeline Parallelism from multiple dependent kernels for GPUs , 2016, 2016 International Conference on Parallel Architecture and Compilation Techniques (PACT).
[3] Alexei A. Efros,et al. What makes Paris look like Paris? , 2015, Commun. ACM.
[4] Long Chen,et al. Dynamic load balancing on single- and multi-GPU systems , 2010, 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS).
[5] Guoyang Chen,et al. Free launch: Optimizing GPU dynamic kernel launches through thread reuse , 2015, 2015 48th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[6] Pat Hanrahan,et al. GRAMPS: A programming model for graphics pipelines , 2009, ACM Trans. Graph.
[7] J. Hess,et al. Calculation of potential flow about arbitrary bodies , 1967.
[8] Malcolm Kesson. Pixar's RenderMan , 2008, SIGGRAPH Asia '08.
[9] Anjul Patney,et al. Piko: a framework for authoring programmable graphics pipelines , 2015, ACM Trans. Graph.
[10] Dejan S. Milojicic,et al. KLAP: Kernel launch aggregation and promotion for optimizing dynamic parallelism , 2016, 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[11] Paul A. Viola,et al. Rapid object detection using a boosted cascade of simple features , 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.
[12] Kai Li,et al. Characteristics of workloads using the pipeline programming model , 2010, ISCA'10.
[13] Frédo Durand,et al. Halide: a language and compiler for optimizing parallelism, locality, and recomputation in image processing pipelines , 2013, PLDI 2013.
[14] Robert G. Gallager,et al. Low-density parity-check codes , 1962, IRE Trans. Inf. Theory.
[15] Dieter Schmalstieg,et al. Softshell , 2012, ACM Transactions on Graphics.
[16] Dieter Schmalstieg,et al. Whippletree , 2014, ACM Trans. Graph.
[17] Zhen Lin,et al. Enabling Efficient Preemption for SIMT Architectures with Lightweight Context Switching , 2016, SC16: International Conference for High Performance Computing, Networking, Storage and Analysis.
[18] Christopher Hunt,et al. Notes on the OpenSURF Library , 2009.
[19] Matti Pietikäinen,et al. Face Description with Local Binary Patterns: Application to Face Recognition , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[20] Edward H. Adelson,et al. Pyramid methods in image processing , 1984.
[21] Oscar C. Au,et al. Video Coding on Multicore Graphics Processors , 2010, IEEE Signal Processing Magazine.
[22] Mike O'Connor,et al. Divergence-Aware Warp Scheduling , 2013, 2013 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[23] Yunsong Li,et al. A GPU-Accelerated Wavelet Decompression System With SPIHT and Reed-Solomon Decoding for Satellite Images , 2011, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing.
[24] Satoshi Takahashi,et al. Parallel implementation of saliency maps for real-time robot vision , 2014, 2014 14th International Conference on Control, Automation and Systems (ICCAS 2014).
[25] Youngmin Yi,et al. Real-time face detection in Full HD images exploiting both embedded CPU and GPU , 2015, 2015 IEEE International Conference on Multimedia and Expo (ICME).
[26] Sudhakar Yalamanchili,et al. Characterization and analysis of dynamic parallelism in unstructured GPU applications , 2014, 2014 IEEE International Symposium on Workload Characterization (IISWC).
[27] Robert Ricci,et al. Fast and flexible: Parallel packet processing with GPUs and click , 2013, Architectures for Networking and Communications Systems.
[28] Anjul Patney,et al. Task management for irregular-parallel workloads on the GPU , 2010, HPG '10.
[29] David G. Lowe,et al. Distinctive Image Features from Scale-Invariant Keypoints , 2004.
[30] Jeffrey F. Naughton,et al. Multiprocessor Main Memory Transaction Processing , 1988, Proceedings [1988] International Symposium on Databases in Parallel and Distributed Systems.
[31] Wenguang Chen,et al. Understanding Co-Running Behaviors on Integrated CPU/GPU Architectures , 2017, IEEE Transactions on Parallel and Distributed Systems.
[32] William J. Dally,et al. Imagine: Media Processing with Streams , 2001, IEEE Micro.
[34] Philippas Tsigas,et al. On dynamic load balancing on graphics processors , 2008, GH '08.
[35] Pat Hanrahan,et al. Ray tracing on a connection machine , 1988, ICS '88.
[37] Timo Aila,et al. Understanding the efficiency of ray traversal on GPUs , 2009, High Performance Graphics.
[38] Pat Hanrahan,et al. Ray tracing on programmable graphics hardware , 2002.
[39] Robert L. Cook,et al. The Reyes image rendering architecture , 1987, SIGGRAPH.
[40] Jeff A. Stuart,et al. A study of Persistent Threads style GPU programming for GPGPU workloads , 2012, 2012 Innovative Parallel Computing (InPar).
[41] Mike O'Connor,et al. Cache-Conscious Wavefront Scheduling , 2012, 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture.
[42] John Kim,et al. iPAWS: Instruction-issue pattern-based adaptive warp scheduling for GPGPUs , 2016, 2016 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[43] Timo Aila,et al. Megakernels considered harmful: wavefront path tracing on GPUs , 2013, HPG '13.
[44] Onur Mutlu,et al. Improving GPU performance via large warps and two-level warp scheduling , 2011, 2011 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[46] Keshav Pingali,et al. A compiler for throughput optimization of graph algorithms on GPUs , 2016, OOPSLA.
[47] Kevin Skadron,et al. Accelerating leukocyte tracking using CUDA: A case study in leveraging manycore coprocessors , 2009, 2009 IEEE International Symposium on Parallel & Distributed Processing.
[48] Mahmut T. Kandemir,et al. Neither more nor less: Optimizing thread-level parallelism for GPGPUs , 2013, Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques.
[49] Mahmut T. Kandemir,et al. Orchestrated scheduling and prefetching for GPGPUs , 2013, ISCA.
[50] Kevin Skadron,et al. Rodinia: A benchmark suite for heterogeneous computing , 2009, 2009 IEEE International Symposium on Workload Characterization (IISWC).
[51] David K. McAllister,et al. OptiX: a general purpose ray tracing engine , 2010, ACM Trans. Graph.
[52] Mahmut T. Kandemir,et al. OWL: cooperative thread array aware scheduling techniques for improving GPGPU performance , 2013, ASPLOS '13.