Resource Allocation for Software Pipelines in Many-core Systems

[1] Lothar Thiele, et al. MAMOT: Memory-Aware Mapping Optimization Tool for MPSoC, 2012, 2012 15th Euromicro Conference on Digital System Design.

[2] Natalie D. Enright Jerger, et al. Achieving predictable performance through better memory controller placement in many-core CMPs, 2009, ISCA '09.

[3] Radu Marculescu, et al. Incremental run-time application mapping for homogeneous NoCs with multiple voltage levels, 2007, 2007 5th IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS).

[4] Anand Raghunathan, et al. Automatic generation of software pipelines for heterogeneous parallel systems, 2012, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis.

[5] Jürgen Teich, et al. Mapping of applications to MPSoCs, 2011, 2011 Proceedings of the Ninth IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS).

[6] Dean M. Tullsen, et al. Symbiotic jobscheduling for a simultaneous multithreading processor, 2000.

[7] Amit Kumar Singh, et al. Mapping on multi/many-core systems: Survey of current and emerging trends, 2013, 2013 50th ACM/EDAC/IEEE Design Automation Conference (DAC).

[8] William Landi, et al. Undecidability of static analysis, 1992, LOPL.

[9] Jonathan M. Smith, et al. A survey of process migration mechanisms, 1988, OPSR.

[10] Lothar Thiele, et al. Scenario-based design flow for mapping streaming applications onto on-chip many-core systems, 2012, CASES '12.

[11] William Thies, et al. StreamIt: A Language for Streaming Applications, 2002, CC.

[12] Timothy G. Mattson, et al. Light-weight communications on Intel's single-chip cloud computer processor, 2011, OPSR.

[13] Dean M. Tullsen, et al. Symbiotic jobscheduling with priorities for a simultaneous multithreading processor, 2002, SIGMETRICS '02.

[14] G.E. Moore, et al. Cramming More Components Onto Integrated Circuits, 1998, Proceedings of the IEEE.

[15] Coniferous softwood GENERAL TERMS, 2003.

[16] Sahin Albayrak, et al. Mobility-based Runtime Load Balancing in Multi-Agent Systems, 2006, SEKE.

[17] Teofilo F. Gonzalez, et al. P-Complete Approximation Problems, 1976, J. ACM.

[18] Todor Stefanov, et al. Modeling adaptive streaming applications with Parameterized Polyhedral Process Networks, 2011, 2011 48th ACM/EDAC/IEEE Design Automation Conference (DAC).

[19] W. Dally, et al. Route packets, not wires: on-chip interconnection networks, 2001, Proceedings of the 38th Design Automation Conference (IEEE Cat. No.01CH37232).

[20] William Thies, et al. A Practical Approach to Exploiting Coarse-Grained Pipeline Parallelism in C Programs, 2007, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007).

[21] Andrea Acquaviva, et al. Assessing Task Migration Impact on Embedded Soft Real-Time Streaming Multimedia Applications, 2008, EURASIP J. Embed. Syst.

[22] Norman P. Jouppi, et al. Single-ISA Heterogeneous Multi-Core Architectures: The Potential for Processor Power Reduction, 2003, MICRO.

[23] Narayanan Vijaykrishnan, et al. Accelerating neuromorphic vision algorithms for recognition, 2012, DAC Design Automation Conference 2012.

[24] Saurabh Dighe, et al. An 80-Tile 1.28TFLOPS Network-on-Chip in 65nm CMOS, 2007, 2007 IEEE International Solid-State Circuits Conference. Digest of Technical Papers.

[25] Amnon Barak, et al. Scalable Cluster Computing with MOSIX for LINUX, 1999.

[26] Christian Müller-Schloer, et al. Organic computing: on the feasibility of controlled emergence, 2004, CODES+ISSS '04.

[27] Hyesoon Kim, et al. Qilin: Exploiting parallelism on heterogeneous multiprocessors with adaptive mapping, 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[28] Narayanan Vijaykrishnan, et al. Reliability concerns in embedded system designs, 2006, Computer.

[29] Karam S. Chatha, et al. A lightweight run-time scheduler for multitasking multicore stream applications, 2010, 2010 IEEE International Conference on Computer Design.

[30] Shekhar Y. Borkar, et al. Design challenges of technology scaling, 1999, IEEE Micro.

[31] Gilles Kahn, et al. The Semantics of a Simple Language for Parallel Programming, 1974, IFIP Congress.

[32] Mahmut T. Kandemir, et al. Cooperative parallelization, 2011, 2011 IEEE/ACM International Conference on Computer-Aided Design (ICCAD).

[33] Marco D. Santambrogio, et al. The Autonomic Operating System research project - Achievements and future directions, 2013, 2013 50th ACM/EDAC/IEEE Design Automation Conference (DAC).

[34] Shekhar Y. Borkar, et al. Designing reliable systems from unreliable components: the challenges of transistor variability and degradation, 2005, IEEE Micro.

[35] David Z. Pan, et al. A3MAP: architecture-aware analytic mapping for networks-on-chip, 2010, ASP-DAC 2010.

[36] Vincent David, et al. A low-overhead dedicated execution support for stream applications on shared-memory CMP, 2012, EMSOFT '12.

[37] Swaroop Sridhar, et al. An approach to heterogeneous process state capture/recovery to achieve minimum performance overhead during normal execution, 2003, Proceedings International Parallel and Distributed Processing Symposium.

[38] Frédéric Pétrot, et al. Cost-efficient buffer sizing in shared-memory 3D-MPSoCs using wide I/O interfaces, 2012, DAC Design Automation Conference 2012.

[39] Hui Xu, et al. Development of low power many-core SoC for multimedia applications, 2013, 2013 Design, Automation & Test in Europe Conference & Exhibition (DATE).

[40] Brian T. Lewis, et al. Thread Scheduling for Multi-Core Platforms, 2007, HotOS.

[41] Paul Horn, et al. Autonomic Computing: IBM's Perspective on the State of Information Technology, 2001.

[42] Santanu Chattopadhyay, et al. Application Mapping onto Mesh Structured Network-on-Chip Using Particle Swarm Optimization, 2011, 2011 IEEE Computer Society Annual Symposium on VLSI.

[43] John E. Stone, et al. OpenCL: A Parallel Programming Standard for Heterogeneous Computing Systems, 2010, Computing in Science & Engineering.

[44] David S. Johnson, et al. Computers and Intractability: A Guide to the Theory of NP-Completeness, 1978.

[45] Jörg Henkel, et al. Pipelets: Self-organizing software Pipelines for many-core architectures, 2013, 2013 Design, Automation & Test in Europe Conference & Exhibition (DATE).

[46] Li Shang, et al. Power, Thermal, and Reliability Modeling in Nanometer-Scale Microprocessors, 2007, IEEE Micro.

[47] Hsien-Hsin S. Lee, et al. Extending Amdahl's Law for Energy-Efficient Computing in the Many-Core Era, 2008, Computer.

[48] Thomas Serre, et al. Realistic Modeling of Simple and Complex Cell Tuning in the HMAX Model, and Implications for Invariant Object Recognition in Cortex, 2004.

[49] Karam S. Chatha, et al. Unrolling and retiming of stream applications onto embedded multicore processors, 2012, DAC Design Automation Conference 2012.

[50] Kevin Klues, et al. Processes and Resource Management in a Scalable Many-core OS, 2010.

[51] Victor Pankratius, et al. AutoTunium: An Evolutionary Tuner for General-Purpose Multicore Applications, 2012, 2012 IEEE 18th International Conference on Parallel and Distributed Systems.

[52] Saurabh Dighe, et al. The 48-core SCC Processor: the Programmer's View, 2010, 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis.

[53] Raphael A. Finkel, et al. Designing a process migration facility: the Charlotte experience, 1989, Computer.

[54] Fernando Gehm Moraes, et al. Heuristics for Dynamic Task Mapping in NoC-based Heterogeneous MPSoCs, 2007, 18th IEEE/IFIP International Workshop on Rapid System Prototyping (RSP '07).

[55] Jörg Henkel, et al. Work in Progress: Malleable Software Pipelines for Efficient Many-core System Utilization, 2012, MARC Symposium.

[56] Julie A. McCann, et al. A survey of autonomic computing—degrees, models, and applications, 2008, CSUR.

[57] Soonhoi Ha, et al. Executing synchronous dataflow graphs on a SPM-based multicore architecture, 2012, DAC Design Automation Conference 2012.

[58] Michael Engel, et al. Automatic extraction of multi-objective aware pipeline parallelism using genetic algorithms, 2012, CODES+ISSS '12.

[59] Todor Stefanov, et al. Managing latency in embedded streaming applications under hard-real-time scheduling, 2012, CODES+ISSS '12.

[60] Jungwon Kim, et al. An OpenCL Framework for Homogeneous Manycores with No Hardware Cache Coherence, 2011, 2011 International Conference on Parallel Architectures and Compilation Techniques.

[61] Michael Hitchens, et al. A new process migration algorithm, 1997, OPSR.

[62] Ioana Burcea, et al. A compiler and runtime for heterogeneous computing, 2012, DAC Design Automation Conference 2012.

[63] Mahmut T. Kandemir, et al. Compiler-directed application mapping for NoC based chip multiprocessors, 2007, LCTES '07.

[64] Jeffrey O. Kephart, et al. The Vision of Autonomic Computing, 2003, Computer.

[65] Michael J. Donahoo, et al. TCP/IP sockets in C#: practical guide for programmers, 2004, The Morgan Kaufmann practical guides series.

[66] Susan Horwitz, et al. Precise flow-insensitive may-alias analysis is NP-hard, 1997, TOPL.

[67] Sri Parameswaran, et al. Fine-grained hardware/software methodology for process migration in MPSoCs, 2012, 2012 IEEE/ACM International Conference on Computer-Aided Design (ICCAD).

[68] Davide Bertozzi, et al. Supporting Task Migration in Multi-Processor Systems-on-Chip: A Feasibility Study, 2006, Proceedings of the Design Automation & Test in Europe Conference.

[69] Jörg Henkel, et al. MOMA: Mapping of memory-intensive software-pipelined applications for systems with multiple memory controllers, 2013, 2013 IEEE/ACM International Conference on Computer-Aided Design (ICCAD).

[70] Eduard Ayguadé, et al. Hardware–Software Coherence Protocol for the Coexistence of Caches and Local Memories, 2012, IEEE Transactions on Computers.

[71] Nikil D. Dutt, et al. HaVOC: A hybrid memory-aware virtualization layer for on-chip distributed ScratchPad and Non-Volatile Memories, 2012, DAC Design Automation Conference 2012.

[72] Timothy Mattson, et al. A 48-Core IA-32 message-passing processor with DVFS in 45nm CMOS, 2010, 2010 IEEE International Solid-State Circuits Conference (ISSCC).

[73] Hartmut Schmeck, et al. Organic Computing - A New Vision for Distributed Embedded Systems, 2005, ISORC.

[74] Trevor Mudge, et al. MiBench: A free, commercially representative embedded benchmark suite, 2001.

[75] Glenn Leary, et al. System-level synthesis of memory architecture for stream processing sub-systems of a MPSoC, 2012, DAC Design Automation Conference 2012.

[76] Jörg Henkel, et al. CARAT: Context-aware runtime adaptive task migration for multi core architectures, 2011, 2011 Design, Automation & Test in Europe.

[77] Rainer Leupers, et al. Communication-aware mapping of KPN applications onto heterogeneous MPSoCs, 2012, DAC Design Automation Conference 2012.

[78] Jörg Henkel, et al. Optimizations for configuring and mapping software pipelines in many core systems, 2013, 2013 50th ACM/EDAC/IEEE Design Automation Conference (DAC).

[79] John L. Henning. SPEC CPU2006 benchmark descriptions, 2006, CARN.

[80] Edward A. Lee, et al. Static Scheduling of Synchronous Data Flow Programs for Digital Signal Processing, 1989, IEEE Transactions on Computers.

[81] Tong Li, et al. Efficient operating system scheduling for performance-asymmetric multi-core architectures, 2007, Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07).

[82] George Kurian, et al. Self-aware computing in the Angstrom processor, 2012, DAC Design Automation Conference 2012.

[83] Manuel E. Acacio, et al. Heterogeneous NoC Design for Efficient Broadcast-based Coherence Protocol Support, 2012, 2012 IEEE/ACM Sixth International Symposium on Networks-on-Chip.

[84] Alexey L. Lastovetsky, et al. Dynamic Load Balancing of Parallel Computational Iterative Routines on Platforms with Memory Heterogeneity, 2010, Euro-Par Workshops.

[85] Mor Harchol-Balter, et al. ATLAS: A scalable and high-performance scheduling algorithm for multiple memory controllers, 2010, HPCA-16 2010: The Sixteenth International Symposium on High-Performance Computer Architecture.

[86] Pat Hanrahan, et al. Brook for GPUs: stream computing on graphics hardware, 2004, ACM Trans. Graph.

[87] Chita R. Das, et al. A heterogeneous multiple network-on-chip design: An application-aware approach, 2013, 2013 50th ACM/EDAC/IEEE Design Automation Conference (DAC).

[88] Rudy Lauwereins, et al. Infrastructure for design and management of relocatable tasks in a heterogeneous reconfigurable system-on-chip, 2003, 2003 Design, Automation and Test in Europe Conference and Exhibition.

[89] Théodore Marescaux, et al. Centralized run-time resource management in a network-on-chip containing reconfigurable hardware tiles, 2005, Design, Automation and Test in Europe.

[90] Luís Nogueira, et al. Server-based scheduling of parallel real-time tasks, 2012, EMSOFT '12.

[91] Christoph W. Kessler, et al. Investigation of main memory bandwidth on Intel Single-Chip Cloud Computer, 2011, MARC Symposium.

[92] Guilherme Ottoni, et al. Automatic thread extraction with decoupled software pipelining, 2005, 38th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'05).

[93] John P. Lehoczky, et al. Partitioned Fixed-Priority Preemptive Scheduling for Multi-core Processors, 2009, 2009 21st Euromicro Conference on Real-Time Systems.

[94] Rainer Leupers, et al. MAPS: An integrated framework for MPSoC application parallelization, 2008, 2008 45th ACM/IEEE Design Automation Conference.

[95] Karam S. Chatha, et al. Dynamic scheduling of stream programs on embedded multi-core processors, 2012, CODES+ISSS '12.

[96] Natalie D. Enright Jerger, et al. Outstanding Research Problems in NoC Design: System, Microarchitecture, and Circuit Perspectives, 2009, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.

[97] Carl D. Offner, et al. TStreams: A Model of Parallel Computation (Preliminary Report).

[98] Wolfgang Schröder-Preikschat, et al. DistRM: Distributed resource management for on-chip many-core systems, 2011, 2011 Proceedings of the Ninth IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS).

[99] Luca Benini, et al. An efficient and complete approach for throughput-maximal SDF allocation and scheduling on multi-core platforms, 2010, 2010 Design, Automation & Test in Europe Conference & Exhibition (DATE 2010).

[100] Sander Stuijk, et al. Minimising buffer requirements of synchronous dataflow graphs with model checking, 2005, Proceedings. 42nd Design Automation Conference, 2005.

[101] Onur Mutlu, et al. Bottleneck identification and scheduling in multithreaded applications, 2012, ASPLOS XVII.

[102] Asser N. Tantawi, et al. Approximate Analysis of Fork/Join Synchronization in Parallel Queues, 1988, IEEE Trans. Computers.