Inter-kernel data reuse and pipelining on chip-multiprocessors for multimedia applications
Nikil D. Dutt | Sudeep Pasricha | Luis Angel D. Bathen | Yongjin Ahn
[1] Krzysztof Kuchcinski,et al. A constructive algorithm for memory-aware task assignment and scheduling , 2001, CODES '01.
[2] Shuvra S. Bhattacharyya,et al. The pipeline decomposition tree: an analysis tool for multiprocessor implementation of image processing applications , 2006, Proceedings of the 4th International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS '06).
[3] Sri Parameswaran,et al. Design Methodology for Pipelined Heterogeneous Multiprocessor System , 2007, 2007 44th ACM/IEEE Design Automation Conference.
[4] D. Gajski,et al. Hardware/software Partitioning And Pipelining , 1997, Proceedings of the 34th Design Automation Conference.
[5] Kurt Keutzer,et al. Efficient Parallelization of H.264 Decoding with Macro Block Level Scheduling , 2007, 2007 IEEE International Conference on Multimedia and Expo.
[6] Nikil D. Dutt,et al. A framework for memory-aware multimedia application mapping on chip-multiprocessors , 2008, 2008 IEEE/ACM/IFIP Workshop on Embedded Systems for Real-Time Multimedia.
[7] William Pugh,et al. The Omega test: A fast and practical integer programming algorithm for dependence analysis , 1991, Proceedings of the 1991 ACM/IEEE Conference on Supercomputing (Supercomputing '91).
[8] Mahmut T. Kandemir,et al. Dynamic management of scratch-pad memory space , 2001, Proceedings of the 38th Design Automation Conference (IEEE Cat. No.01CH37232).
[9] Peter Marwedel,et al. Data partitioning for maximal scratchpad usage , 2003, ASP-DAC '03.
[10] Sri Parameswaran,et al. Heterogeneous multiprocessor implementations for JPEG: a case study , 2006, Proceedings of the 4th International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS '06).
[11] C. D. Gelatt,et al. Optimization by Simulated Annealing , 1983, Science.
[12] Nikil D. Dutt,et al. Efficient utilization of scratch-pad memory in embedded processor applications , 1997, Proceedings European Design and Test Conference. ED & TC 97.
[13] Peter Marwedel,et al. Scratchpad memory: a design alternative for cache on-chip memory in embedded systems , 2002, Proceedings of the Tenth International Symposium on Hardware/Software Codesign. CODES 2002 (IEEE Cat. No.02TH8627).
[14] Michael I. Gordon,et al. Exploiting coarse-grained task, data, and pipeline parallelism in stream programs , 2006, ASPLOS XII.
[15] Yunheung Paek,et al. Compiler driven data layout optimization for regular/irregular array access patterns , 2008, LCTES '08.
[16] B. Ramakrishna Rau,et al. Iterative modulo scheduling: an algorithm for software pipelining loops , 1994, MICRO 27.
[17] Erik Brockmeyer,et al. Multiprocessor system-on-chip data reuse analysis for exploring customized memory hierarchies , 2006, 2006 43rd ACM/IEEE Design Automation Conference.
[18] Soonhoi Ha,et al. Pipelined data parallel task mapping/scheduling technique for MPSoC , 2009, 2009 Design, Automation & Test in Europe Conference & Exhibition.
[19] Kunle Olukotun,et al. The case for a single-chip multiprocessor , 1996, ASPLOS VII.
[20] Paul M. Chau,et al. Macro pipelining based scheduling on high performance heterogeneous multiprocessor systems , 1995, IEEE Trans. Signal Process.
[21] Michael Wolfe,et al. More iteration space tiling , 1989, Proceedings of the 1989 ACM/IEEE Conference on Supercomputing (Supercomputing '89).
[22] Jong-Hwan Kim,et al. Quantum-inspired evolutionary algorithm for a class of combinatorial optimization , 2002, IEEE Trans. Evol. Comput.
[23] Jingling Xue,et al. Loop Tiling for Parallelism , 2000, Kluwer International Series in Engineering and Computer Science.
[24] Scott A. Mahlke,et al. Uncovering hidden loop level parallelism in sequential applications , 2008, 2008 IEEE 14th International Symposium on High Performance Computer Architecture.
[25] Vivek Sarkar,et al. Partitioning and scheduling parallel programs for execution on multiprocessors , 1987.
[26] Vikas Agarwal,et al. Clock rate versus IPC: the end of the road for conventional microarchitectures , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).
[27] Christian Steger,et al. Rapid exploration of multimedia system-on-chips with automatically generated software performance models , 2008, 2008 IEEE/ACM/IFIP Workshop on Embedded Systems for Real-Time Multimedia.
[28] Kiyoung Choi,et al. SoCDAL: System-on-chip design AcceLerator , 2008, TODE.
[29] Erik Brockmeyer,et al. Layer assignment techniques for low power in multi-layered memory organisations , 2003.
[30] Erik Brockmeyer,et al. Data reuse analysis technique for software-controlled memory hierarchies , 2004, Proceedings Design, Automation and Test in Europe Conference and Exhibition.
[31] Tulika Mitra,et al. Integrated scratchpad memory optimization and task scheduling for MPSoC architectures , 2006, CASES '06.
[32] Ranga Vemuri,et al. RECOD: a retiming heuristic to optimize resource and memory utilization in HW/SW codesigns , 1998, Proceedings of the Sixth International Workshop on Hardware/Software Codesign. (CODES/CASHE'98).
[33] Nikil D. Dutt,et al. FORAY-GEN: automatic generation of affine functions for memory optimizations , 2005, Design, Automation and Test in Europe.
[34] Erik Brockmeyer,et al. Layer assignment techniques for low energy in multi-layered memory organisations , 2003, 2003 Design, Automation and Test in Europe Conference and Exhibition.