PTask: operating system abstractions to manage GPUs as compute devices
Mark Silberstein | Baishakhi Ray | Emmett Witchel | Christopher J. Rossbach | Jon Currey
[1] Edward A. Lee,et al. Static Scheduling of Synchronous Data Flow Programs for Digital Signal Processing , 1989, IEEE Transactions on Computers.
[2] Calton Pu,et al. Threads and input/output in the Synthesis kernel , 1989, SOSP '89.
[3] Pascal Raymond,et al. The synchronous data flow programming language LUSTRE , 1991, Proc. IEEE.
[4] James C. Browne,et al. The CODE 2.0 graphical parallel programming language , 1992, ICS '92.
[5] Gérard Berry,et al. The Esterel Synchronous Programming Language: Design, Semantics, Implementation , 1992, Sci. Comput. Program.
[6] Larry L. Peterson,et al. Fbufs: a high-bandwidth cross-domain transfer facility , 1994, SOSP '93.
[7] Joseph Pasquale,et al. Container shipping: operating system support for I/O-intensive applications , 1994, Computer.
[8] Yousef A. Khalidi,et al. An Efficient Zero-Copy I/O Framework for UNIX , 1995.
[9] Brian N. Bershad,et al. Extensibility, safety and performance in the SPIN operating system , 1995, SOSP.
[10] Larry L. Peterson,et al. Making paths explicit in the Scout operating system , 1996, OSDI '96.
[11] Hans Werner Meuer,et al. Top500 Supercomputer Sites , 1997.
[12] David A. Patterson,et al. A case for intelligent disks (IDISKs) , 1998, SGMD.
[13] David E. Culler,et al. Monsoon: an explicit token-store architecture , 1998, ISCA '98.
[14] Orlando Loques,et al. P-RIO: a modular parallel-programming environment , 1998, IEEE Concurr..
[15] Roberto Manduchi,et al. Bilateral filtering for gray and color images , 1998, Sixth International Conference on Computer Vision (IEEE Cat. No.98CH36271).
[16] Eddie Kohler,et al. The Click modular router , 1999, SOSP.
[17] Willy Zwaenepoel,et al. IO-Lite: a unified I/O buffering and caching system , 1999, TOCS.
[18] John Wawrzynek,et al. Stream Computations Organized for Reconfigurable Execution (SCORE) , 2000, FPL.
[19] Christos Faloutsos,et al. Active Disks for Large-Scale Data Processing , 2001, Computer.
[20] Michael Linetsky,et al. Programming Microsoft DirectShow , 2001.
[21] William Thies,et al. StreamIt: A Language for Streaming Applications , 2002, CC.
[22] William J. Dally,et al. The Imagine Stream Processor , 2002, Proceedings. IEEE International Conference on Computer Design: VLSI in Computers and Processors.
[23] Prithviraj Banerjee,et al. Static array storage optimization in MATLAB , 2003, PLDI '03.
[24] Andy Currid,et al. TCP Offload to the Rescue , 2004, ACM Queue.
[25] Larry Carter,et al. Scheduling strategies for master-slave tasking on heterogeneous processor platforms , 2004, IEEE Transactions on Parallel and Distributed Systems.
[26] S. Burak Gokturk,et al. A Time-Of-Flight Depth Sensor - System Description, Issues and Solutions , 2004, 2004 Conference on Computer Vision and Pattern Recognition Workshop.
[27] Mahmut T. Kandemir,et al. Processor-embedded distributed smart disks for I/O-intensive workloads: architectures, performance models and evaluation , 2004, J. Parallel Distributed Comput..
[28] Pat Hanrahan,et al. Brook for GPUs: stream computing on graphics hardware , 2004, ACM Trans. Graph..
[29] Dinesh Manocha,et al. Fast computation of database operations using graphics processors , 2005, SIGGRAPH Courses.
[30] Jesús Labarta,et al. Programming Grid Applications with GRID Superscalar , 2003, Journal of Grid Computing.
[31] Rosa M. Badia,et al. CellSs: a Programming Model for the Cell BE Architecture , 2006, ACM/IEEE SC 2006 Conference (SC'06).
[32] Michael D. McCool,et al. Programming using RapidMind on the Cell BE , 2006, SC.
[33] Yuan Yu,et al. Dryad: distributed data-parallel programs from sequential building blocks , 2007, EuroSys '07.
[34] Shan Shan Huang,et al. Liquid Metal: Object-Oriented Programming Across the Hardware/Software Boundary , 2008, ECOOP.
[35] Wen-mei W. Hwu,et al. CUDA-Lite: Reducing GPU Programming Complexity , 2008, LCPC.
[36] Bingsheng He,et al. Relational joins on graphics processors , 2008, SIGMOD Conference.
[37] Wen-mei W. Hwu,et al. Optimization principles and application performance evaluation of a multithreaded GPU using CUDA , 2008, PPoPP.
[38] Yao Zhang,et al. Parallel Computing Experiences with CUDA , 2008, IEEE Micro.
[39] Michael Kistler,et al. Accelerating computing with the cell broadband engine processor , 2008, Conf. Computing Frontiers.
[40] Muli Ben-Yehuda,et al. Tapping into the fountain of CPUs: on operating system support for programmable devices , 2008, ASPLOS.
[41] Michael J. Black,et al. Neural control of computer cursor velocity by decoding motor cortical spiking activity in humans with tetraplegia , 2008, Journal of neural engineering.
[42] Naga K. Govindaraju,et al. Mars: A MapReduce Framework on graphics processors , 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).
[43] Tarek S. Abdelrahman,et al. hiCUDA: a high-level directive-based language for GPU programming , 2009, GPGPU-2.
[44] Galen C. Hunt,et al. Helios: heterogeneous multiprocessing with satellite kernels , 2009, SOSP '09.
[45] Scott A. Mahlke,et al. Flextream: Adaptive Compilation of Streaming Applications for Heterogeneous Architectures , 2009, 2009 18th International Conference on Parallel Architectures and Compilation Techniques.
[46] Cédric Augonnet,et al. Exploiting the Cell/BE Architecture with the StarPU Unified Runtime System , 2009, SAMOS.
[47] Douglas Lanman,et al. BiDi screen: a thin, depth-sensing LCD for 3D interaction using light fields , 2009, SIGGRAPH 2009.
[48] Mircea Andrecut,et al. Parallel GPU Implementation of Iterative PCA Algorithms , 2008, J. Comput. Biol..
[49] Grigori Fursin,et al. Predictive Runtime Code Scheduling for Heterogeneous Architectures , 2008, HiPEAC.
[50] Adrian Schüpbach,et al. The multikernel: a new OS architecture for scalable multicore systems , 2009, SOSP '09.
[51] Aaftab Munshi,et al. The OpenCL specification , 2009, 2009 IEEE Hot Chips 21 Symposium (HCS).
[52] Michael Chu,et al. Scientific and Engineering Computing Using ATI Stream Technology , 2009, Computing in Science & Engineering.
[53] Sangjin Han,et al. PacketShader: a GPU-accelerated software router , 2010, SIGCOMM '10.
[54] John E. Stone,et al. An asymmetric distributed shared memory model for heterogeneous parallel systems , 2010, ASPLOS 2010.
[55] Joshua S. Auerbach,et al. Lime: a Java-compatible and synthesizable language for heterogeneous architectures , 2010, OOPSLA.
[56] John E. Stone,et al. An asymmetric distributed shared memory model for heterogeneous parallel systems , 2010, ASPLOS XV.
[57] Shinpei Kato,et al. TimeGraph: GPU Scheduling for Real-Time Multi-Tasking Environments , 2011, USENIX Annual Technical Conference.
[58] Scott A. Mahlke,et al. Sponge: portable stream programming on graphics engines , 2011, ASPLOS XVI.
[59] Seungyeop Han,et al. SSLShader: Cheap SSL Acceleration with Commodity Processors , 2011, NSDI.