MultiMaKe: Chip-multiprocessor driven memory-aware kernel pipelining
暂无分享,去创建一个
[1] Todd M. Austin,et al. SimpleScalar: An Infrastructure for Computer System Modeling , 2002, Computer.
[2] Nikil D. Dutt,et al. A framework for memory-aware multimedia application mapping on chip-multiprocessors , 2008, 2008 IEEE/ACM/IFIP Workshop on Embedded Systems for Real-Time Multimedia.
[3] Ranga Vemuri,et al. RECOD: a retiming heuristic to optimize resource and memory utilization in HW/SW codesigns , 1998, Proceedings of the Sixth International Workshop on Hardware/Software Codesign. (CODES/CASHE'98).
[4] William Pugh,et al. The Omega test: A fast and practical integer programming algorithm for dependence analysis , 1991, Proceedings of the 1991 ACM/IEEE Conference on Supercomputing (Supercomputing '91).
[5] Mahmut T. Kandemir,et al. Dynamic management of scratch-pad memory space , 2001, Proceedings of the 38th Design Automation Conference (IEEE Cat. No.01CH37232).
[6] Peter Marwedel,et al. Data partitioning for maximal scratchpad usage , 2003, ASP-DAC '03.
[7] Paul M. Chau,et al. Macro pipelining based scheduling on high performance heterogeneous multiprocessor systems , 1995, IEEE Trans. Signal Process..
[8] Krzysztof Kuchcinski,et al. A constructive algorithm for memory-aware task assignment and scheduling , 2001, CODES '01.
[9] Jing-Chiou Liou,et al. An Efficient Task Clustering Heuristic for Scheduling DAGs on Multiprocessors , 2007 .
[10] Erik Brockmeyer,et al. Data reuse analysis technique for software-controlled memory hierarchies , 2004, Proceedings Design, Automation and Test in Europe Conference and Exhibition.
[11] Tulika Mitra,et al. Integrated scratchpad memory optimization and task scheduling for MPSoC architectures , 2006, CASES '06.
[12] Tao Yang,et al. Clustering task graphs for message passing architectures , 1990, ICS '90.
[13] B. Ramakrishna Rau,et al. Iterative modulo scheduling: an algorithm for software pipelining loops , 1994, MICRO 27.
[14] Shuvra S. Bhattacharyya,et al. The pipeline decomposition tree:: an analysis tool for multiprocessor implementation of image processing applications , 2006, Proceedings of the 4th International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS '06).
[15] Miodrag Potkonjak,et al. MediaBench: a tool for evaluating and synthesizing multimedia and communications systems , 1997, Proceedings of 30th Annual International Symposium on Microarchitecture.
[16] Vikas Agarwal,et al. Clock rate versus IPC: the end of the road for conventional microarchitectures , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).
[17] Kunle Olukotun,et al. The case for a single-chip multiprocessor , 1996, ASPLOS VII.
[18] Michael Wolfe,et al. More iteration space tiling , 1989, Proceedings of the 1989 ACM/IEEE Conference on Supercomputing (Supercomputing '89).
[19] Jong-Hwan Kim,et al. Quantum-inspired evolutionary algorithm for a class of combinatorial optimization , 2002, IEEE Trans. Evol. Comput..
[20] Nikil D. Dutt,et al. FORAY-GEN: automatic generation of affine functions for memory optimizations , 2005, Design, Automation and Test in Europe.
[21] Jingling Xue,et al. Loop Tiling for Parallelism , 2000, Kluwer International Series in Engineering and Computer Science.
[22] Yunheung Paek,et al. Compiler driven data layout optimization for regular/irregular array access patterns , 2008, LCTES '08.
[23] Erik Brockmeyer,et al. Multiprocessor system-on-chip data reuse analysis for exploring customized memory hierarchies , 2006, 2006 43rd ACM/IEEE Design Automation Conference.
[24] Soonhoi Ha,et al. Pipelined data parallel task mapping/scheduling technique for MPSoC , 2009, 2009 Design, Automation & Test in Europe Conference & Exhibition.
[25] Erik Brockmeyer,et al. Layer assignment techniques for low energy in multi-layered memory organisations , 2003, 2003 Design, Automation and Test in Europe Conference and Exhibition.
[26] Sri Parameswaran,et al. Design Methodology for Pipelined Heterogeneous Multiprocessor System , 2007, 2007 44th ACM/IEEE Design Automation Conference.
[27] Kurt Keutzer,et al. Efficient Parallelization of H.264 Decoding with Macro Block Level Scheduling , 2007, 2007 IEEE International Conference on Multimedia and Expo.
[28] Scott A. Mahlke,et al. Uncovering hidden loop level parallelism in sequential applications , 2008, 2008 IEEE 14th International Symposium on High Performance Computer Architecture.
[29] Vivek Sarkar,et al. Partitioning and scheduling parallel programs for execution on multiprocessors , 1987 .
[30] Nikil D. Dutt,et al. Inter-kernel data reuse and pipelining on chip-multiprocessors for multimedia applications , 2009, 2009 IEEE/ACM/IFIP 7th Workshop on Embedded Systems for Real-Time Multimedia.
[31] Hiroyuki Tomiyama,et al. CHStone: A benchmark program suite for practical C-based high-level synthesis , 2008, 2008 IEEE International Symposium on Circuits and Systems.
[32] Sri Parameswaran,et al. Heterogeneous multiprocessor implementations for JPEG:: a case study , 2006, Proceedings of the 4th International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS '06).
[33] C. D. Gelatt,et al. Optimization by Simulated Annealing , 1983, Science.
[34] Michael I. Gordon,et al. Exploiting coarse-grained task, data, and pipeline parallelism in stream programs , 2006, ASPLOS XII.
[35] Daniel Gajski,et al. Hardware/software partitioning and pipelining , 1997, DAC.
[36] M. Cosnard,et al. Clustering Task Graphs for Message Passing Architectures , 1990 .
[37] Nikil D. Dutt,et al. Efficient utilization of scratch-pad memory in embedded processor applications , 1997, Proceedings European Design and Test Conference. ED & TC 97.
[38] Peter Marwedel,et al. Scratchpad memory: a design alternative for cache on-chip memory in embedded systems , 2002, Proceedings of the Tenth International Symposium on Hardware/Software Codesign. CODES 2002 (IEEE Cat. No.02TH8627).
[39] Kiyoung Choi,et al. SoCDAL: System-on-chip design AcceLerator , 2008, TODE.