Automatic dataflow application tuning for heterogeneous systems
[1] Steven G. Johnson et al. The Design and Implementation of FFTW3, 2005, Proceedings of the IEEE.
[2] Collin McCurdy et al. The Scalable Heterogeneous Computing (SHOC) benchmark suite, 2010, GPGPU-3.
[3] Bradley C. Kuszmaul et al. Cilk: an efficient multithreaded runtime system, 1995, PPOPP '95.
[4] Frederick Reiss et al. TelegraphCQ: continuous dataflow processing, 2003, SIGMOD '03.
[5] Jack B. Dennis et al. Data Flow Supercomputers, 1980, Computer.
[6] Hyesoon Kim et al. Qilin: Exploiting parallelism on heterogeneous multiprocessors with adaptive mapping, 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[7] Gregory Diamos et al. Harmony: an execution model and runtime for heterogeneous many core systems, 2008, HPDC '08.
[8] Hiroshi Watanabe et al. Divisible Load Scheduling with Result Collection on Heterogeneous Systems, 2008, 2008 IEEE International Symposium on Parallel and Distributed Processing.
[9] Karsten Schwan et al. ACDS: Adapting computational data streams for high performance, 2000, Proceedings 14th International Parallel and Distributed Processing Symposium (IPDPS 2000).
[10] Eduard Ayguadé et al. An Extension of the StarSs Programming Model for Platforms with Multiple GPUs, 2009, Euro-Par.
[11] Jun Kong et al. Computer-aided prognosis of neuroblastoma on whole-slide images: Classification of stromal development, 2009, Pattern Recognit.
[12] Kevin Skadron et al. Experiences Accelerating MATLAB Systems Biology Applications, 2009.
[13] Jaspal Subhlok et al. Optimal latency-throughput tradeoffs for data parallel pipelines, 1996, SPAA '96.
[14] Cynthia A. Phillips et al. Scheduling DAGs on asynchronous processors, 2007, SPAA '07.
[15] Jack J. Dongarra et al. Decision Trees and MPI Collective Algorithm Selection Problem, 2007, Euro-Par.
[16] Conor McBride. Clowns to the left of me, jokers to the right (pearl): dissecting data structures, 2008, POPL '08.
[17] Galen C. Hunt et al. The Coign automatic distributed partitioning system, 1999, OSDI '99.
[18] Lúcia Maria de A. Drummond et al. Anthill: a scalable run-time environment for data mining applications, 2005, 17th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD'05).
[19] Robert Strzodka et al. Cyclic Reduction Tridiagonal Solvers on GPUs Applied to Mixed-Precision Multigrid, 2011, IEEE Transactions on Parallel and Distributed Systems.
[20] Ümit V. Çatalyürek et al. Run-time optimizations for replicated dataflows on heterogeneous environments, 2010, HPDC '10.
[21] Laxmikant V. Kalé et al. CHARM++: a portable concurrent object oriented system based on C++, 1993, OOPSLA '93.
[22] Noah Treuhaft et al. Cluster I/O with River: making the fast case common, 1999, IOPADS '99.
[23] Teresa H. Y. Meng et al. Merge: a programming model for heterogeneous multi-core systems, 2008, ASPLOS.
[24] Yves Robert et al. Introduction to Scheduling, 2009, CRC Computational Science Series.
[25] Joel H. Saltz et al. Distributed processing of very large datasets with DataCutter, 2001, Parallel Comput.
[26] Jack J. Dongarra et al. Automatically Tuned Linear Algebra Software, 1998, Proceedings of the IEEE/ACM SC98 Conference.
[27] Jens H. Krüger et al. A Survey of General-Purpose Computation on Graphics Hardware, 2007, Eurographics.
[28] Cédric Augonnet et al. StarPU: a unified platform for task scheduling on heterogeneous multicore architectures, 2011, Concurr. Comput. Pract. Exp.
[29] Umakishore Ramachandran et al. Capsules: Expressing Composable Computations in a Parallel Programming Model, 2007, LCPC.
[30] Ümit V. Çatalyürek et al. Investigating the use of GPU-accelerated nodes for SAR image formation, 2009, 2009 IEEE International Conference on Cluster Computing and Workshops.
[31] Jorge J. Moré et al. DOI: 10.1007/s101070100263, 2001.