Performance and memory space optimizations for embedded systems
[1] Michael F. P. O'Boyle. A hierarchical locality algorithm for NUMA compilation , 1995, Proceedings Euromicro Workshop on Parallel and Distributed Processing.
[2] Christoforos E. Kozyrakis,et al. Comparing memory systems for chip multiprocessors , 2007, ISCA '07.
[3] Monica S. Lam,et al. An affine partitioning algorithm to maximize parallelism and minimize communication , 1999, ICS '99.
[4] Martin Hopkins,et al. A novel SIMD architecture for the cell heterogeneous chip-multiprocessor , 2005, 2005 IEEE Hot Chips XVII Symposium (HCS).
[5] Chau-Wen Tseng,et al. Compiler optimizations for improving data locality , 1994, ASPLOS VI.
[6] Wei Li,et al. Unifying data and control transformations for distributed shared-memory machines , 1995, PLDI '95.
[7] Mo Chen,et al. The Importance of Data Compression for Energy Efficiency in Sensor Networks , 2003 .
[8] Mahmut T. Kandemir,et al. A Memory-Conscious Code Parallelization Scheme , 2007, 2007 44th ACM/IEEE Design Automation Conference.
[9] Mahmut T. Kandemir,et al. An energy saving strategy based on adaptive loop parallelization , 2002, DAC '02.
[10] Anoop Gupta,et al. Design and evaluation of a compiler algorithm for prefetching , 1992, ASPLOS V.
[11] Luca Benini,et al. Hardware-assisted data compression for energy minimization in systems with embedded processors , 2002, Proceedings 2002 Design, Automation and Test in Europe Conference and Exhibition.
[12] Nectarios Koziris,et al. Automatic parallel code generation for tiled nested loops , 2004, SAC '04.
[13] Lin Gao,et al. Memory coloring: a compiler approach for scratchpad memory management , 2005, 14th International Conference on Parallel Architectures and Compilation Techniques (PACT'05).
[14] Irith Pomeranz,et al. Transient-Fault Recovery for Chip Multiprocessors , 2003, IEEE Micro.
[15] Yuan Xie,et al. Profile-Driven Selective Code Compression , 2003, DATE.
[16] Yuan Xie,et al. LZW-based code compression for VLIW embedded systems , 2004, Proceedings Design, Automation and Test in Europe Conference and Exhibition.
[17] Mahmut T. Kandemir,et al. SPM conscious loop scheduling for embedded chip multiprocessors , 2006, 12th International Conference on Parallel and Distributed Systems - (ICPADS'06).
[18] Andrew Wolfe,et al. Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture , 2000, MICRO 2000.
[19] Constantine D. Polychronopoulos,et al. Guided Self-Scheduling: A Practical Scheduling Scheme for Parallel Supercomputers , 1987, IEEE Transactions on Computers.
[20] Laura Ricci,et al. Automatic loop parallelization: an abstract interpretation approach , 2002, Proceedings. International Conference on Parallel Computing in Electrical Engineering.
[21] Constantine D. Polychronopoulos,et al. Parallel programming and compilers , 1988 .
[22] Henk Sips,et al. A Unified Compiler Framework for Work and Data Placement , 2001 .
[23] Mahmut T. Kandemir,et al. Influence of Loop Optimizations on Energy Consumption of Multi-bank Memory Systems , 2002, CC.
[24] Margaret Martonosi,et al. Wattch: a framework for architectural-level power analysis and optimizations , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).
[25] Mahmut T. Kandemir,et al. Dynamic management of scratch-pad memory space , 2001, Proceedings of the 38th Design Automation Conference (IEEE Cat. No.01CH37232).
[26] Mahmut Kandemir,et al. SPM management using Markov chain based data access prediction , 2008, ICCAD 2008.
[27] Michael E. Wolf,et al. The cache performance and optimizations of blocked algorithms , 1991, ASPLOS IV.
[28] Sang Lyul Min,et al. A dynamic code placement technique for scratchpad memory using postpass optimization , 2006, CASES '06.
[29] Yunheung Paek,et al. Software controlled memory layout reorganization for irregular array access patterns , 2007, CASES '07.
[30] Monica S. Lam,et al. Maximizing Multiprocessor Performance with the SUIF Compiler , 1996, Digit. Tech. J..
[31] Wayne H. Wolf,et al. SAMC: a code compression algorithm for embedded processors , 1999, IEEE Trans. Comput. Aided Des. Integr. Circuits Syst..
[32] Rajeev Barua,et al. Heap data allocation to scratch-pad memory in embedded systems , 2005, J. Embed. Comput..
[33] Erik Brockmeyer,et al. Multiprocessor system-on-chip data reuse analysis for exploring customized memory hierarchies , 2006, 2006 43rd ACM/IEEE Design Automation Conference.
[34] Yijun Yu,et al. Loop Parallelization using the 3D Iteration Space Visualizer , 2001, J. Vis. Lang. Comput..
[35] Rajeev Barua,et al. Scratch-pad memory allocation without compiler support for java applications , 2007, CASES '07.
[36] Kiran Bondalapati. Parallelizing DSP nested loops on reconfigurable architectures using data context switching , 2001, Proceedings of the 38th Design Automation Conference (IEEE Cat. No.01CH37232).
[37] Luiz André Barroso,et al. Piranha: a scalable architecture based on single-chip multiprocessing , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).
[38] Yuan Xie,et al. Code Compression for VLIW Processors , 2001, Data Compression Conference.
[39] Peter Marwedel,et al. Data partitioning for maximal scratchpad usage , 2003, ASP-DAC '03.
[40] Rudolf Eigenmann,et al. SPEComp: A New Benchmark Suite for Measuring Parallel Computer Performance , 2001, WOMPAT.
[41] Giovanni De Micheli,et al. Synthesis and Optimization of Digital Circuits , 1994 .
[42] Kurt Keutzer,et al. Code density optimization for embedded DSP processors using data compression techniques , 1995, Proceedings Sixteenth Conference on Advanced Research in VLSI.
[43] Sanjay J. Patel,et al. Implicitly Parallel Programming Models for Thousand-Core Microprocessors , 2007, 2007 44th ACM/IEEE Design Automation Conference.
[44] Rudy Lauwereins,et al. Energy-Aware Runtime Scheduling for Embedded-Multiprocessor SOCs , 2001, IEEE Des. Test Comput..
[45] Juan Touriño,et al. A GSA-based compiler infrastructure to extract parallelism from complex loops , 2003, ICS '03.
[46] Eftychios Sifakis,et al. Physical simulation for animation and visual effects: parallelization and characterization for chip multiprocessors , 2007, ISCA '07.
[47] L.M. Ni,et al. Trapezoid Self-Scheduling: A Practical Scheduling Scheme for Parallel Compilers , 1993, IEEE Trans. Parallel Distributed Syst..
[48] John Zahorjan,et al. Optimizing Data Locality by Array Restructuring , 1995 .
[49] Sung-Mo Kang,et al. Effective algorithms for cache-level compression , 2001, GLSVLSI '01.
[50] Monica S. Lam,et al. Locality Optimizations for Parallel Machines , 1994, CONPAR.
[51] Mahmut T. Kandemir,et al. Compiler-directed scratch pad memory hierarchy design and management , 2002, DAC '02.
[52] Anoop Gupta,et al. Scheduling and page migration for multiprocessor compute servers , 1994, ASPLOS VI.
[53] Evangelos P. Markatos,et al. Using Processor Affinity in Loop Scheduling , 1994 .
[54] Mahmut T. Kandemir,et al. Dynamic Scratch-Pad Memory Management for Irregular Array Access Patterns , 2006, Proceedings of the Design Automation & Test in Europe Conference.
[55] Evangelos P. Markatos,et al. Load Balancing vs. Locality Management in Shared-Memory Multiprocessors , 1992, ICPP.
[56] Monica S. Lam,et al. Data and computation transformations for multiprocessors , 1995, PPOPP '95.
[57] Francky Catthoor,et al. Compiler-Based Approach for Exploiting Scratch-Pad in Presence of Irregular Array Access , 2005, Design, Automation and Test in Europe.
[58] Enrico Macii,et al. Architectural Leakage-Aware Management of Partitioned Scratchpad Memories , 2007, 2007 Design, Automation & Test in Europe Conference & Exhibition.
[59] William Pugh,et al. The Omega Library interface guide , 1995 .
[60] Larry Carter,et al. On the Parallel Execution Time of Tiled Loops , 2003, IEEE Trans. Parallel Distributed Syst..
[61] Mats Brorsson,et al. Performance Impact of Code and Data Placement on the IBM RP3 , 1989 .
[62] Rick Hetherington. The UltraSPARC T1 Processor - Power Efficient Throughput Computing , 2004 .
[63] Kathryn S. McKinley,et al. Tile size selection using cache organization and data layout , 1995, PLDI '95.
[64] Vincent Loechner,et al. Parametric Analysis of Polyhedral Iteration Spaces , 1998, J. VLSI Signal Process..
[65] Mahmut T. Kandemir,et al. A compiler algorithm for optimizing locality in loop nests , 1997, ICS '97.
[66] Luca Benini,et al. An integrated hardware/software approach for run-time scratchpad management , 2004, Proceedings. 41st Design Automation Conference, 2004..
[67] Keith D. Cooper,et al. Enhanced code compression for embedded RISC processors , 1999, PLDI '99.
[68] Mary Jane Irwin,et al. Integrated code and data placement in two-dimensional mesh based chip multiprocessors , 2008, ICCAD 2008.
[69] Keshav Pingali,et al. Access normalization: loop restructuring for NUMA compilers , 1992, ASPLOS V.
[70] Chau-Wen Tseng,et al. An Overview of the SUIF Compiler for Scalable Parallel Machines , 1995, PPSC.
[71] Mahmut T. Kandemir,et al. Optimizing code parallelization through a constraint network based approach , 2006, 2006 43rd ACM/IEEE Design Automation Conference.
[72] Monica S. Lam,et al. The cache performance and optimizations of blocked algorithms , 1991, ASPLOS IV.
[73] Radu Marculescu,et al. Energy- and performance-aware mapping for regular NoC architectures , 2005, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.
[74] Edwin V. Bonilla,et al. Predicting best design trade-offs: A case study in processor customization , 2012, 2012 Design, Automation & Test in Europe Conference & Exhibition (DATE).
[75] T. Mudge,et al. Drowsy caches: simple techniques for reducing leakage power , 2002, Proceedings 29th Annual International Symposium on Computer Architecture.
[76] Chih-Ping Chu,et al. Exploitation of parallelism to nested loops with dependence cycles , 2004, J. Syst. Archit..
[77] Mahmut T. Kandemir. LODS: locality-oriented dynamic scheduling for on-chip multiprocessors , 2004, Proceedings. 41st Design Automation Conference, 2004..
[78] Saumya K. Debray,et al. Profile-guided code compression , 2002, PLDI '02.
[79] H. Peter Hofstee,et al. Introduction to the Cell multiprocessor , 2005, IBM J. Res. Dev..
[80] Mahmut T. Kandemir,et al. Integer linear programming based energy optimization for banked DRAMs , 2005, GLSVLSI '05.
[81] Nikil D. Dutt,et al. Efficient utilization of scratch-pad memory in embedded processor applications , 1997, Proceedings European Design and Test Conference. ED & TC 97.
[82] Bo Hu,et al. Multilevel expansion-based VLSI placement with blockages , 2004, IEEE/ACM International Conference on Computer Aided Design, 2004. ICCAD-2004..
[83] Mahmut T. Kandemir,et al. Exploiting shared scratch pad memory space in embedded multiprocessor systems , 2002, DAC '02.
[84] Wayne H. Wolf. The future of multiprocessor systems-on-chips , 2004, Proceedings. 41st Design Automation Conference, 2004..
[85] Hui Li,et al. Locality and Loop Scheduling on NUMA Multiprocessors , 1993, 1993 International Conference on Parallel Processing - ICPP'93.
[86] Peter Marwedel,et al. Scratchpad memory: a design alternative for cache on-chip memory in embedded systems , 2002, Proceedings of the Tenth International Symposium on Hardware/Software Codesign. CODES 2002 (IEEE Cat. No.02TH8627).
[87] Monica S. Lam,et al. Automatic computation and data decomposition for multiprocessors , 1997 .
[88] Brian Parker Tunstall,et al. Synthesis of noiseless compression codes , 1967 .
[89] Mahmut T. Kandemir,et al. Data compression for improving SPM behavior , 2004, Proceedings. 41st Design Automation Conference, 2004..
[90] Wei Li,et al. Compiling for NUMA Parallel Machines , 1993 .
[91] Cheng Wang,et al. Impact of data compression on energy consumption of wireless-networked handheld devices , 2003, 23rd International Conference on Distributed Computing Systems, 2003. Proceedings..
[92] Todd M. Austin,et al. SimpleScalar: An Infrastructure for Computer System Modeling , 2002, Computer.
[93] Enrico Macii,et al. A new algorithm for energy-driven data compression in VLIW embedded processors , 2003, 2003 Design, Automation and Test in Europe Conference and Exhibition.
[94] Kunle Olukotun,et al. The Future of Microprocessors , 2005, ACM Queue.
[95] Rudy Lauwereins,et al. Exploiting Loop-Level Parallelism on Coarse-Grained Reconfigurable Architectures Using Modulo Scheduling , 2003, DATE.
[96] Mahmut T. Kandemir,et al. Integrating loop and data optimizations for locality within a constraint network based framework , 2005, ICCAD-2005. IEEE/ACM International Conference on Computer-Aided Design, 2005..
[97] Keshav Pingali,et al. Access normalization: loop restructuring for NUMA computers , 1993, TOCS.
[98] William J. Dally,et al. Route packets, not wires: on-chip interconnection networks , 2001, DAC '01.
[99] David A. Padua,et al. Compiler Techniques for the Distribution of Data and Computation , 2003, IEEE Trans. Parallel Distributed Syst..
[100] Mahmut T. Kandemir,et al. Code Scheduling for Optimizing Parallelism and Data Locality , 2010, Euro-Par.
[101] Mahmut T. Kandemir,et al. Compiler-Directed Code Restructuring for Operating with Compressed Arrays , 2007, 20th International Conference on VLSI Design held jointly with 6th International Conference on Embedded Systems (VLSID'07).
[102] Heonshik Shin,et al. Scratchpad memory management for portable systems with a memory management unit , 2006, EMSOFT '06.
[103] Tarek S. Abdelrahman,et al. Automatic partitioning of data and computations on scalable shared memory multiprocessors , 1997, Proceedings of the 1997 International Conference on Parallel Processing (Cat. No.97TB100162).
[104] Monica S. Lam,et al. A data locality optimizing algorithm , 1991, PLDI '91.
[105] Isabelle Puaut,et al. Scratchpad memories vs locked caches in hard real-time systems: a quantitative comparison , 2007 .
[106] Kunle Olukotun,et al. A Single-Chip Multiprocessor , 1997, Computer.
[107] Carla Schlatter Ellis,et al. Power aware page allocation , 2000, SIGP.
[108] Norman P. Jouppi,et al. CACTI 3.0: an integrated cache timing, power, and area model , 2001 .
[109] Xiaowei Shen,et al. Hardware Compressed Main Memory: Operating System Support and Performance Evaluation , 2001, IEEE Trans. Computers.
[110] Edith Schonberg,et al. Factoring: a method for scheduling parallel loops , 1992 .
[111] Michael F. P. O'Boyle,et al. Nonsingular Data Transformations: Definition, Validity, and Applications , 1999, International Journal of Parallel Programming.
[112] Fernando Gehm Moraes,et al. Exploring NoC mapping strategies: an energy and timing aware technique , 2005, Design, Automation and Test in Europe.
[113] Stephen Richardson. MPOC: A Chip Multiprocessor for Embedded Systems , 2002 .
[114] Li-Shiuan Peh,et al. Design-space exploration of power-aware on/off interconnection networks , 2004, IEEE International Conference on Computer Design: VLSI in Computers and Processors, 2004. ICCD 2004. Proceedings..
[115] Michael Wolfe,et al. High performance compilers for parallel computing , 1995 .
[116] Montserrat Ros,et al. Code compression based on operand-factorization for VLIW processors , 2004, Data Compression Conference, 2004. Proceedings. DCC 2004.
[117] Sharad Malik,et al. Orion: a power-performance simulator for interconnection networks , 2002, 35th Annual IEEE/ACM International Symposium on Microarchitecture, 2002. (MICRO-35). Proceedings..
[118] Narayanan Vijaykrishnan,et al. Thermal-aware IP virtualization and placement for networks-on-chip architecture , 2004, IEEE International Conference on Computer Design: VLSI in Computers and Processors, 2004. ICCD 2004. Proceedings..
[119] Volodymyr Beletskyy,et al. An approach to parallelizing non-uniform loops with the Omega calculator , 2002, Proceedings. International Conference on Parallel Computing in Electrical Engineering.
[120] Gregory R. Andrews,et al. An adaptive approach to data placement , 1996, Proceedings of International Conference on Parallel Processing.
[121] Jun Yang,et al. Frequent value compression in data caches , 2000, Proceedings 33rd Annual IEEE/ACM International Symposium on Microarchitecture. MICRO-33 2000.
[122] Mahmut Kandemir,et al. Memory bank aware dynamic loop scheduling , 2007 .
[123] Keshav Pingali,et al. Data-centric multi-level blocking , 1997, PLDI '97.