Paper Information - Resource Allocation for Software Pipelines in Many-core Systems



[1] Lothar Thiele et al. MAMOT: Memory-Aware Mapping Optimization Tool for MPSoC, 2012, 2012 15th Euromicro Conference on Digital System Design.

[2] Natalie D. Enright Jerger et al. Achieving predictable performance through better memory controller placement in many-core CMPs, 2009, ISCA '09.

[3] Radu Marculescu et al. Incremental run-time application mapping for homogeneous NoCs with multiple voltage levels, 2007, 2007 5th IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS).

[4] Anand Raghunathan et al. Automatic generation of software pipelines for heterogeneous parallel systems, 2012, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis.

[5] Jürgen Teich et al. Mapping of applications to MPSoCs, 2011, 2011 Proceedings of the Ninth IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS).

[6] Dean M. Tullsen et al. Symbiotic jobscheduling for a simultaneous multithreading processor, 2000.

[7] Amit Kumar Singh et al. Mapping on multi/many-core systems: Survey of current and emerging trends, 2013, 2013 50th ACM/EDAC/IEEE Design Automation Conference (DAC).

[8] William Landi et al. Undecidability of static analysis, 1992, LOPL.

[9] Jonathan M. Smith et al. A survey of process migration mechanisms, 1988, OPSR.

[10] Lothar Thiele et al. Scenario-based design flow for mapping streaming applications onto on-chip many-core systems, 2012, CASES '12.

[11] William Thies et al. StreamIt: A Language for Streaming Applications, 2002, CC.

[12] Timothy G. Mattson et al. Light-weight communications on Intel's single-chip cloud computer processor, 2011, OPSR.

[13] Dean M. Tullsen et al. Symbiotic jobscheduling with priorities for a simultaneous multithreading processor, 2002, SIGMETRICS '02.

[14] G.E. Moore et al. Cramming More Components Onto Integrated Circuits, 1998, Proceedings of the IEEE.

[15] Coniferous softwood. GENERAL TERMS, 2003.

[16] Sahin Albayrak et al. Mobility-based Runtime Load Balancing in Multi-Agent Systems, 2006, SEKE.

[17] Teofilo F. Gonzalez et al. P-Complete Approximation Problems, 1976, J. ACM.

[18] Todor Stefanov et al. Modeling adaptive streaming applications with Parameterized Polyhedral Process Networks, 2011, 2011 48th ACM/EDAC/IEEE Design Automation Conference (DAC).

[19] W. Dally et al. Route packets, not wires: on-chip interconnection networks, 2001, Proceedings of the 38th Design Automation Conference (IEEE Cat. No.01CH37232).

[20] William Thies et al. A Practical Approach to Exploiting Coarse-Grained Pipeline Parallelism in C Programs, 2007, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007).

[21] Andrea Acquaviva et al. Assessing Task Migration Impact on Embedded Soft Real-Time Streaming Multimedia Applications, 2008, EURASIP J. Embed. Syst.

[22] Norman P. Jouppi et al. Single-ISA Heterogeneous Multi-Core Architectures: The Potential for Processor Power Reduction, 2003, MICRO.

[23] Narayanan Vijaykrishnan et al. Accelerating neuromorphic vision algorithms for recognition, 2012, DAC Design Automation Conference 2012.

[24] Saurabh Dighe et al. An 80-Tile 1.28TFLOPS Network-on-Chip in 65nm CMOS, 2007, 2007 IEEE International Solid-State Circuits Conference. Digest of Technical Papers.

[25] Amnon Barak et al. Scalable Cluster Computing with MOSIX for LINUX, 1999.

[26] Christian Müller-Schloer et al. Organic computing: on the feasibility of controlled emergence, 2004, CODES+ISSS '04.

[27] Hyesoon Kim et al. Qilin: Exploiting parallelism on heterogeneous multiprocessors with adaptive mapping, 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[28] Narayanan Vijaykrishnan et al. Reliability concerns in embedded system designs, 2006, Computer.

[29] Karam S. Chatha et al. A lightweight run-time scheduler for multitasking multicore stream applications, 2010, 2010 IEEE International Conference on Computer Design.

[30] Shekhar Y. Borkar et al. Design challenges of technology scaling, 1999, IEEE Micro.

[31] Gilles Kahn et al. The Semantics of a Simple Language for Parallel Programming, 1974, IFIP Congress.

[32] Mahmut T. Kandemir et al. Cooperative parallelization, 2011, 2011 IEEE/ACM International Conference on Computer-Aided Design (ICCAD).

[33] Marco D. Santambrogio et al. The Autonomic Operating System research project: Achievements and future directions, 2013, 2013 50th ACM/EDAC/IEEE Design Automation Conference (DAC).

[34] Shekhar Y. Borkar et al. Designing reliable systems from unreliable components: the challenges of transistor variability and degradation, 2005, IEEE Micro.

[35] David Z. Pan et al. A3MAP: architecture-aware analytic mapping for networks-on-chip, 2010, ASP-DAC 2010.

[36] Vincent David et al. A low-overhead dedicated execution support for stream applications on shared-memory CMP, 2012, EMSOFT '12.

[37] Swaroop Sridhar et al. An approach to heterogeneous process state capture/recovery to achieve minimum performance overhead during normal execution, 2003, Proceedings International Parallel and Distributed Processing Symposium.

[38] Frédéric Pétrot et al. Cost-efficient buffer sizing in shared-memory 3D-MPSoCs using wide I/O interfaces, 2012, DAC Design Automation Conference 2012.

[39] Hui Xu et al. Development of low power many-core SoC for multimedia applications, 2013, 2013 Design, Automation & Test in Europe Conference & Exhibition (DATE).

[40] Brian T. Lewis et al. Thread Scheduling for Multi-Core Platforms, 2007, HotOS.

[41] Paul Horn. Autonomic Computing: IBM's Perspective on the State of Information Technology, 2001.

[42] Santanu Chattopadhyay et al. Application Mapping onto Mesh Structured Network-on-Chip Using Particle Swarm Optimization, 2011, 2011 IEEE Computer Society Annual Symposium on VLSI.

[43] John E. Stone et al. OpenCL: A Parallel Programming Standard for Heterogeneous Computing Systems, 2010, Computing in Science & Engineering.

[44] David S. Johnson et al. Computers and Intractability: A Guide to the Theory of NP-Completeness, 1978.

[45] Jörg Henkel et al. Pipelets: Self-organizing software pipelines for many-core architectures, 2013, 2013 Design, Automation & Test in Europe Conference & Exhibition (DATE).

[46] Li Shang et al. Power, Thermal, and Reliability Modeling in Nanometer-Scale Microprocessors, 2007, IEEE Micro.

[47] Hsien-Hsin S. Lee et al. Extending Amdahl's Law for Energy-Efficient Computing in the Many-Core Era, 2008, Computer.

[48] Thomas Serre et al. Realistic Modeling of Simple and Complex Cell Tuning in the HMAX Model, and Implications for Invariant Object Recognition in Cortex, 2004.

[49] Karam S. Chatha et al. Unrolling and retiming of stream applications onto embedded multicore processors, 2012, DAC Design Automation Conference 2012.

[50] Kevin Klues et al. Processes and Resource Management in a Scalable Many-core OS, 2010.

[51] Victor Pankratius et al. AutoTunium: An Evolutionary Tuner for General-Purpose Multicore Applications, 2012, 2012 IEEE 18th International Conference on Parallel and Distributed Systems.

[52] Saurabh Dighe et al. The 48-core SCC Processor: the Programmer's View, 2010, 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis.

[53] Raphael A. Finkel et al. Designing a process migration facility: the Charlotte experience, 1989, Computer.

[54] Fernando Gehm Moraes et al. Heuristics for Dynamic Task Mapping in NoC-based Heterogeneous MPSoCs, 2007, 18th IEEE/IFIP International Workshop on Rapid System Prototyping (RSP '07).

[55] Jörg Henkel et al. Work in Progress: Malleable Software Pipelines for Efficient Many-core System Utilization, 2012, MARC Symposium.

[56] Julie A. McCann et al. A survey of autonomic computing: degrees, models, and applications, 2008, CSUR.

[57] Soonhoi Ha et al. Executing synchronous dataflow graphs on a SPM-based multicore architecture, 2012, DAC Design Automation Conference 2012.

[58] Michael Engel et al. Automatic extraction of multi-objective aware pipeline parallelism using genetic algorithms, 2012, CODES+ISSS '12.

[59] Todor Stefanov et al. Managing latency in embedded streaming applications under hard-real-time scheduling, 2012, CODES+ISSS '12.

[60] Jungwon Kim et al. An OpenCL Framework for Homogeneous Manycores with No Hardware Cache Coherence, 2011, 2011 International Conference on Parallel Architectures and Compilation Techniques.

[61] Michael Hitchens et al. A new process migration algorithm, 1997, OPSR.

[62] Ioana Burcea et al. A compiler and runtime for heterogeneous computing, 2012, DAC Design Automation Conference 2012.

[63] Mahmut T. Kandemir et al. Compiler-directed application mapping for NoC based chip multiprocessors, 2007, LCTES '07.

[64] Jeffrey O. Kephart et al. The Vision of Autonomic Computing, 2003, Computer.

[65] Michael J. Donahoo et al. TCP/IP Sockets in C#: Practical Guide for Programmers, 2004, The Morgan Kaufmann Practical Guides Series.

[66] Susan Horwitz et al. Precise flow-insensitive may-alias analysis is NP-hard, 1997, TOPL.

[67] Sri Parameswaran et al. Fine-grained hardware/software methodology for process migration in MPSoCs, 2012, 2012 IEEE/ACM International Conference on Computer-Aided Design (ICCAD).

[68] Davide Bertozzi et al. Supporting Task Migration in Multi-Processor Systems-on-Chip: A Feasibility Study, 2006, Proceedings of the Design Automation & Test in Europe Conference.

[69] Jörg Henkel et al. MOMA: Mapping of memory-intensive software-pipelined applications for systems with multiple memory controllers, 2013, 2013 IEEE/ACM International Conference on Computer-Aided Design (ICCAD).

[70] Eduard Ayguadé et al. Hardware-Software Coherence Protocol for the Coexistence of Caches and Local Memories, 2012, IEEE Transactions on Computers.

[71] Nikil D. Dutt et al. HaVOC: A hybrid memory-aware virtualization layer for on-chip distributed ScratchPad and Non-Volatile Memories, 2012, DAC Design Automation Conference 2012.

[72] Timothy Mattson et al. A 48-Core IA-32 message-passing processor with DVFS in 45nm CMOS, 2010, 2010 IEEE International Solid-State Circuits Conference (ISSCC).

[73] Hartmut Schmeck et al. Organic Computing: A New Vision for Distributed Embedded Systems, 2005, ISORC.

[74] Trevor Mudge et al. MiBench: A free, commercially representative embedded benchmark suite, 2001.

[75] Glenn Leary et al. System-level synthesis of memory architecture for stream processing sub-systems of a MPSoC, 2012, DAC Design Automation Conference 2012.

[76] Jörg Henkel et al. CARAT: Context-aware runtime adaptive task migration for multi core architectures, 2011, 2011 Design, Automation & Test in Europe.

[77] Rainer Leupers et al. Communication-aware mapping of KPN applications onto heterogeneous MPSoCs, 2012, DAC Design Automation Conference 2012.

[78] Jörg Henkel et al. Optimizations for configuring and mapping software pipelines in many core systems, 2013, 2013 50th ACM/EDAC/IEEE Design Automation Conference (DAC).

[79] John L. Henning. SPEC CPU2006 benchmark descriptions, 2006, CARN.

[80] Edward A. Lee et al. Static Scheduling of Synchronous Data Flow Programs for Digital Signal Processing, 1989, IEEE Transactions on Computers.

[81] Tong Li et al. Efficient operating system scheduling for performance-asymmetric multi-core architectures, 2007, Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07).

[82] George Kurian et al. Self-aware computing in the Angstrom processor, 2012, DAC Design Automation Conference 2012.

[83] Manuel E. Acacio et al. Heterogeneous NoC Design for Efficient Broadcast-based Coherence Protocol Support, 2012, 2012 IEEE/ACM Sixth International Symposium on Networks-on-Chip.

[84] Alexey L. Lastovetsky et al. Dynamic Load Balancing of Parallel Computational Iterative Routines on Platforms with Memory Heterogeneity, 2010, Euro-Par Workshops.

[85] Mor Harchol-Balter et al. ATLAS: A scalable and high-performance scheduling algorithm for multiple memory controllers, 2010, HPCA-16 2010: The Sixteenth International Symposium on High-Performance Computer Architecture.

[86] Pat Hanrahan et al. Brook for GPUs: stream computing on graphics hardware, 2004, ACM Trans. Graph.

[87] Chita R. Das et al. A heterogeneous multiple network-on-chip design: An application-aware approach, 2013, 2013 50th ACM/EDAC/IEEE Design Automation Conference (DAC).

[88] Rudy Lauwereins et al. Infrastructure for design and management of relocatable tasks in a heterogeneous reconfigurable system-on-chip, 2003, 2003 Design, Automation and Test in Europe Conference and Exhibition.

[89] Théodore Marescaux et al. Centralized run-time resource management in a network-on-chip containing reconfigurable hardware tiles, 2005, Design, Automation and Test in Europe.

[90] Luís Nogueira et al. Server-based scheduling of parallel real-time tasks, 2012, EMSOFT '12.

[91] Christoph W. Kessler et al. Investigation of main memory bandwidth on Intel Single-Chip Cloud Computer, 2011, MARC Symposium.

[92] Guilherme Ottoni et al. Automatic thread extraction with decoupled software pipelining, 2005, 38th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'05).

[93] John P. Lehoczky et al. Partitioned Fixed-Priority Preemptive Scheduling for Multi-core Processors, 2009, 2009 21st Euromicro Conference on Real-Time Systems.

[94] Rainer Leupers et al. MAPS: An integrated framework for MPSoC application parallelization, 2008, 2008 45th ACM/IEEE Design Automation Conference.

[95] Karam S. Chatha et al. Dynamic scheduling of stream programs on embedded multi-core processors, 2012, CODES+ISSS '12.

[96] Natalie D. Enright Jerger et al. Outstanding Research Problems in NoC Design: System, Microarchitecture, and Circuit Perspectives, 2009, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.

[97] Carl D. Offner et al. TStreams: A Model of Parallel Computation (Preliminary Report).

[98] Wolfgang Schröder-Preikschat et al. DistRM: Distributed resource management for on-chip many-core systems, 2011, 2011 Proceedings of the Ninth IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS).

[99] Luca Benini et al. An efficient and complete approach for throughput-maximal SDF allocation and scheduling on multi-core platforms, 2010, 2010 Design, Automation & Test in Europe Conference & Exhibition (DATE 2010).

[100] Sander Stuijk et al. Minimising buffer requirements of synchronous dataflow graphs with model checking, 2005, Proceedings. 42nd Design Automation Conference, 2005.

[101] Onur Mutlu et al. Bottleneck identification and scheduling in multithreaded applications, 2012, ASPLOS XVII.

[102] Asser N. Tantawi et al. Approximate Analysis of Fork/Join Synchronization in Parallel Queues, 1988, IEEE Trans. Computers.