Partitioning and Scheduling DSP Applications with Maximal Memory Access Hiding
[1] Michel Dubois, et al. Sequential Hardware Prefetching in Shared-Memory Multiprocessors, 1995, IEEE Trans. Parallel Distributed Syst.
[2] Naraig Manjikian, et al. Combining loop fusion with prefetching on shared-memory multiprocessors, 1997, Proceedings of the 1997 International Conference on Parallel Processing (Cat. No. 97TB100162).
[3] Edward S. Davidson, et al. Register requirements of pipelined processors, 1992, ICS '92.
[4] Edwin Hsing-Mean Sha, et al. Minimizing Average Schedule Length under Memory Constraints by Optimal Partitioning and Prefetching, 2001, J. VLSI Signal Process.
[5] Jacqueline Chame, et al. A tile selection algorithm for data locality and cache interference, 1999, ICS '99.
[6] Edwin Hsing-Mean Sha, et al. Scheduling of uniform multidimensional systems under resource constraints, 1998, IEEE Trans. Very Large Scale Integr. Syst.
[7] Ken Kennedy, et al. Automatic Data Layout Using 0-1 Integer Programming, 1994, IFIP PACT.
[8] Chau-Wen Tseng, et al. Eliminating conflict misses for high performance architectures, 1998, ICS '98.
[9] Edwin Hsing-Mean Sha, et al. Schedule-based multi-dimensional retiming on data flow graphs, 1994, Proceedings of 8th International Parallel Processing Symposium.
[10] Anant Agarwal, et al. Automatic Partitioning of Parallel Loops and Data Arrays for Distributed Shared-Memory Multiprocessors, 1995, IEEE Trans. Parallel Distributed Syst.
[11] Yves Robert, et al. (Pen)-ultimate tiling?, 1994, Integration, the VLSI Journal.
[12] Seung Ryoul Maeng, et al. An adaptive sequential prefetching scheme in shared-memory multiprocessors, 1997, Proceedings of the 1997 International Conference on Parallel Processing (Cat. No. 97TB100162).
[13] Edwin Hsing-Mean Sha, et al. Loop scheduling and partitions for hiding memory latencies, 1999, Proceedings 12th International Symposium on System Synthesis.
[14] Edwin Hsing-Mean Sha, et al. Optimizing DSP flow graphs via schedule-based multidimensional retiming, 1996, IEEE Trans. Signal Process.
[15] Todd C. Mowry, et al. Tolerating latency in multiprocessors through compiler-inserted prefetching, 1998, ACM TOCS.
[16] Tien-Fu Chen, et al. Data prefetching for high-performance processors, 1993.
[17] B. Ramakrishna Rau, et al. Iterative modulo scheduling: an algorithm for software pipelining loops, 1994, MICRO 27.
[18] V. van Dongen, et al. Uniformization of linear recurrence equations: a step toward the automatic synthesis of systolic arrays, 1988, Proceedings of the International Conference on Systolic Arrays.
[19] Jean-Loup Baer, et al. An effective on-chip preloading scheme to reduce data access penalty, 1991, Proceedings of the 1991 ACM/IEEE Conference on Supercomputing (Supercomputing '91).