Resource Allocation for Software Pipelines in Many-core Systems
[1] Lothar Thiele,et al. MAMOT: Memory-Aware Mapping Optimization Tool for MPSoC , 2012, 2012 15th Euromicro Conference on Digital System Design.
[2] Natalie D. Enright Jerger,et al. Achieving predictable performance through better memory controller placement in many-core CMPs , 2009, ISCA '09.
[3] Radu Marculescu,et al. Incremental run-time application mapping for homogeneous NoCs with multiple voltage levels , 2007, 2007 5th IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS).
[4] Anand Raghunathan,et al. Automatic generation of software pipelines for heterogeneous parallel systems , 2012, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis.
[5] Jürgen Teich,et al. Mapping of applications to MPSoCs , 2011, 2011 Proceedings of the Ninth IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS).
[6] Dean M. Tullsen,et al. Symbiotic jobscheduling for a simultaneous multithreading processor , 2000 .
[7] Amit Kumar Singh,et al. Mapping on multi/many-core systems: Survey of current and emerging trends , 2013, 2013 50th ACM/EDAC/IEEE Design Automation Conference (DAC).
[8] William Landi,et al. Undecidability of static analysis , 1992, LOPL.
[9] Jonathan M. Smith,et al. A survey of process migration mechanisms , 1988, OPSR.
[10] Lothar Thiele,et al. Scenario-based design flow for mapping streaming applications onto on-chip many-core systems , 2012, CASES '12.
[11] William Thies,et al. StreamIt: A Language for Streaming Applications , 2002, CC.
[12] Timothy G. Mattson,et al. Light-weight communications on Intel's single-chip cloud computer processor , 2011, OPSR.
[13] Dean M. Tullsen,et al. Symbiotic jobscheduling with priorities for a simultaneous multithreading processor , 2002, SIGMETRICS '02.
[14] G.E. Moore,et al. Cramming More Components Onto Integrated Circuits , 1998, Proceedings of the IEEE.
[15] Coniferous softwood. GENERAL TERMS , 2003 .
[16] Sahin Albayrak,et al. Mobility-based Runtime Load Balancing in Multi-Agent Systems , 2006, SEKE.
[17] Teofilo F. Gonzalez,et al. P-Complete Approximation Problems , 1976, J. ACM.
[18] Todor Stefanov,et al. Modeling adaptive streaming applications with Parameterized Polyhedral Process Networks , 2011, 2011 48th ACM/EDAC/IEEE Design Automation Conference (DAC).
[19] W. Dally,et al. Route packets, not wires: on-chip interconnection networks , 2001, Proceedings of the 38th Design Automation Conference (IEEE Cat. No.01CH37232).
[20] William Thies,et al. A Practical Approach to Exploiting Coarse-Grained Pipeline Parallelism in C Programs , 2007, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007).
[21] Andrea Acquaviva,et al. Assessing Task Migration Impact on Embedded Soft Real-Time Streaming Multimedia Applications , 2008, EURASIP J. Embed. Syst..
[22] Norman P. Jouppi,et al. Single-ISA Heterogeneous Multi-Core Architectures: The Potential for Processor Power Reduction , 2003, MICRO.
[23] Narayanan Vijaykrishnan,et al. Accelerating neuromorphic vision algorithms for recognition , 2012, DAC Design Automation Conference 2012.
[24] Saurabh Dighe,et al. An 80-Tile 1.28TFLOPS Network-on-Chip in 65nm CMOS , 2007, 2007 IEEE International Solid-State Circuits Conference. Digest of Technical Papers.
[25] Amnon Barak,et al. Scalable Cluster Computing with MOSIX for LINUX , 1999 .
[26] Christian Müller-Schloer,et al. Organic computing: on the feasibility of controlled emergence , 2004, CODES+ISSS '04.
[27] Hyesoon Kim,et al. Qilin: Exploiting parallelism on heterogeneous multiprocessors with adaptive mapping , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[28] Narayanan Vijaykrishnan,et al. Reliability concerns in embedded system designs , 2006, Computer.
[29] Karam S. Chatha,et al. A lightweight run-time scheduler for multitasking multicore stream applications , 2010, 2010 IEEE International Conference on Computer Design.
[30] Shekhar Y. Borkar,et al. Design challenges of technology scaling , 1999, IEEE Micro.
[31] Gilles Kahn,et al. The Semantics of a Simple Language for Parallel Programming , 1974, IFIP Congress.
[32] Mahmut T. Kandemir,et al. Cooperative parallelization , 2011, 2011 IEEE/ACM International Conference on Computer-Aided Design (ICCAD).
[33] Marco D. Santambrogio,et al. The Autonomic Operating System research project - Achievements and future directions , 2013, 2013 50th ACM/EDAC/IEEE Design Automation Conference (DAC).
[34] Shekhar Y. Borkar,et al. Designing reliable systems from unreliable components: the challenges of transistor variability and degradation , 2005, IEEE Micro.
[35] David Z. Pan,et al. A3MAP: architecture-aware analytic mapping for networks-on-chip , 2010, ASP-DAC 2010.
[36] Vincent David,et al. A low-overhead dedicated execution support for stream applications on shared-memory cmp , 2012, EMSOFT '12.
[37] Swaroop Sridhar,et al. An approach to heterogeneous process state capture/recovery to achieve minimum performance overhead during normal execution , 2003, Proceedings International Parallel and Distributed Processing Symposium.
[38] Frédéric Pétrot,et al. Cost-efficient buffer sizing in shared-memory 3D-MPSoCs using wide I/O interfaces , 2012, DAC Design Automation Conference 2012.
[39] Hui Xu,et al. Development of low power many-core SoC for multimedia applications , 2013, 2013 Design, Automation & Test in Europe Conference & Exhibition (DATE).
[40] Brian T. Lewis,et al. Thread Scheduling for Multi-Core Platforms , 2007, HotOS.
[41] Petr Jan Horn,et al. Autonomic Computing: IBM's Perspective on the State of Information Technology , 2001 .
[42] Santanu Chattopadhyay,et al. Application Mapping onto Mesh Structured Network-on-Chip Using Particle Swarm Optimization , 2011, 2011 IEEE Computer Society Annual Symposium on VLSI.
[43] John E. Stone,et al. OpenCL: A Parallel Programming Standard for Heterogeneous Computing Systems , 2010, Computing in Science & Engineering.
[44] David S. Johnson,et al. Computers and Intractability: A Guide to the Theory of NP-Completeness , 1978 .
[45] Jörg Henkel,et al. Pipelets: Self-organizing software Pipelines for many-core architectures , 2013, 2013 Design, Automation & Test in Europe Conference & Exhibition (DATE).
[46] Li Shang,et al. Power, Thermal, and Reliability Modeling in Nanometer-Scale Microprocessors , 2007, IEEE Micro.
[47] Hsien-Hsin S. Lee,et al. Extending Amdahl's Law for Energy-Efficient Computing in the Many-Core Era , 2008, Computer.
[48] Thomas Serre,et al. Realistic Modeling of Simple and Complex Cell Tuning in the HMAX Model, and Implications for Invariant Object Recognition in Cortex , 2004 .
[49] Karam S. Chatha,et al. Unrolling and retiming of stream applications onto embedded multicore processors , 2012, DAC Design Automation Conference 2012.
[50] Kevin Klues,et al. Processes and Resource Management in a Scalable Many-core OS , 2010 .
[51] Victor Pankratius,et al. AutoTunium: An Evolutionary Tuner for General-Purpose Multicore Applications , 2012, 2012 IEEE 18th International Conference on Parallel and Distributed Systems.
[52] Saurabh Dighe,et al. The 48-core SCC Processor: the Programmer's View , 2010, 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis.
[53] Raphael A. Finkel,et al. Designing a process migration facility: the Charlotte experience , 1989, Computer.
[54] Fernando Gehm Moraes,et al. Heuristics for Dynamic Task Mapping in NoC-based Heterogeneous MPSoCs , 2007, 18th IEEE/IFIP International Workshop on Rapid System Prototyping (RSP '07).
[55] Jörg Henkel,et al. Work in Progress: Malleable Software Pipelines for Efficient Many-core System Utilization , 2012, MARC Symposium.
[56] Julie A. McCann,et al. A survey of autonomic computing—degrees, models, and applications , 2008, CSUR.
[57] Soonhoi Ha,et al. Executing synchronous dataflow graphs on a SPM-based multicore architecture , 2012, DAC Design Automation Conference 2012.
[58] Michael Engel,et al. Automatic extraction of multi-objective aware pipeline parallelism using genetic algorithms , 2012, CODES+ISSS '12.
[59] Todor Stefanov,et al. Managing latency in embedded streaming applications under hard-real-time scheduling , 2012, CODES+ISSS '12.
[60] Jungwon Kim,et al. An OpenCL Framework for Homogeneous Manycores with No Hardware Cache Coherence , 2011, 2011 International Conference on Parallel Architectures and Compilation Techniques.
[61] Michael Hitchens,et al. A new process migration algorithm , 1997, OPSR.
[62] Ioana Burcea,et al. A compiler and runtime for heterogeneous computing , 2012, DAC Design Automation Conference 2012.
[63] Mahmut T. Kandemir,et al. Compiler-directed application mapping for NoC based chip multiprocessors , 2007, LCTES '07.
[64] Jeffrey O. Kephart,et al. The Vision of Autonomic Computing , 2003, Computer.
[65] Michael J. Donahoo,et al. TCP/IP sockets in C# - practical guide for programmers , 2004, The Morgan Kaufmann practical guides series.
[66] Susan Horwitz,et al. Precise flow-insensitive may-alias analysis is NP-hard , 1997, TOPL.
[67] Sri Parameswaran,et al. Fine-grained hardware/software methodology for process migration in MPSoCs , 2012, 2012 IEEE/ACM International Conference on Computer-Aided Design (ICCAD).
[68] Davide Bertozzi,et al. Supporting Task Migration in Multi-Processor Systems-on-Chip: A Feasibility Study , 2006, Proceedings of the Design Automation & Test in Europe Conference.
[69] Jörg Henkel,et al. MOMA: Mapping of memory-intensive software-pipelined applications for systems with multiple memory controllers , 2013, 2013 IEEE/ACM International Conference on Computer-Aided Design (ICCAD).
[70] Eduard Ayguadé,et al. Hardware–Software Coherence Protocol for the Coexistence of Caches and Local Memories , 2012, IEEE Transactions on Computers.
[71] Nikil D. Dutt,et al. HaVOC: A hybrid memory-aware virtualization layer for on-chip distributed ScratchPad and Non-Volatile Memories , 2012, DAC Design Automation Conference 2012.
[72] Timothy Mattson,et al. A 48-Core IA-32 message-passing processor with DVFS in 45nm CMOS , 2010, 2010 IEEE International Solid-State Circuits Conference - (ISSCC).
[73] Hartmut Schmeck,et al. Organic Computing - A New Vision for Distributed Embedded Systems , 2005, ISORC.
[74] Trevor Mudge,et al. MiBench: A free, commercially representative embedded benchmark suite , 2001 .
[75] Glenn Leary,et al. System-level synthesis of memory architecture for stream processing sub-systems of a MPSoC , 2012, DAC Design Automation Conference 2012.
[76] Jörg Henkel,et al. CARAT: Context-aware runtime adaptive task migration for multi core architectures , 2011, 2011 Design, Automation & Test in Europe.
[77] Rainer Leupers,et al. Communication-aware mapping of KPN applications onto heterogeneous MPSoCs , 2012, DAC Design Automation Conference 2012.
[78] Jörg Henkel,et al. Optimizations for configuring and mapping software pipelines in many core systems , 2013, 2013 50th ACM/EDAC/IEEE Design Automation Conference (DAC).
[79] John L. Henning. SPEC CPU2006 benchmark descriptions , 2006, CARN.
[80] Edward A. Lee,et al. Static Scheduling of Synchronous Data Flow Programs for Digital Signal Processing , 1989, IEEE Transactions on Computers.
[81] Tong Li,et al. Efficient operating system scheduling for performance-asymmetric multi-core architectures , 2007, Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07).
[82] George Kurian,et al. Self-aware computing in the Angstrom processor , 2012, DAC Design Automation Conference 2012.
[83] Manuel E. Acacio,et al. Heterogeneous NoC Design for Efficient Broadcast-based Coherence Protocol Support , 2012, 2012 IEEE/ACM Sixth International Symposium on Networks-on-Chip.
[84] Alexey L. Lastovetsky,et al. Dynamic Load Balancing of Parallel Computational Iterative Routines on Platforms with Memory Heterogeneity , 2010, Euro-Par Workshops.
[85] Mor Harchol-Balter,et al. ATLAS: A scalable and high-performance scheduling algorithm for multiple memory controllers , 2010, HPCA - 16 2010 The Sixteenth International Symposium on High-Performance Computer Architecture.
[86] Pat Hanrahan,et al. Brook for GPUs: stream computing on graphics hardware , 2004, ACM Trans. Graph..
[87] Chita R. Das,et al. A heterogeneous multiple network-on-chip design: An application-aware approach , 2013, 2013 50th ACM/EDAC/IEEE Design Automation Conference (DAC).
[88] Rudy Lauwereins,et al. Infrastructure for design and management of relocatable tasks in a heterogeneous reconfigurable system-on-chip , 2003, 2003 Design, Automation and Test in Europe Conference and Exhibition.
[89] Théodore Marescaux,et al. Centralized run-time resource management in a network-on-chip containing reconfigurable hardware tiles , 2005, Design, Automation and Test in Europe.
[90] Luís Nogueira,et al. Server-based scheduling of parallel real-time tasks , 2012, EMSOFT '12.
[91] Christoph W. Kessler,et al. Investigation of main memory bandwidth on Intel Single-Chip Cloud Computer , 2011, MARC Symposium.
[92] Guilherme Ottoni,et al. Automatic thread extraction with decoupled software pipelining , 2005, 38th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'05).
[93] John P. Lehoczky,et al. Partitioned Fixed-Priority Preemptive Scheduling for Multi-core Processors , 2009, 2009 21st Euromicro Conference on Real-Time Systems.
[94] Rainer Leupers,et al. MAPS: An integrated framework for MPSoC application parallelization , 2008, 2008 45th ACM/IEEE Design Automation Conference.
[95] Karam S. Chatha,et al. Dynamic scheduling of stream programs on embedded multi-core processors , 2012, CODES+ISSS '12.
[96] Natalie D. Enright Jerger,et al. Outstanding Research Problems in NoC Design: System, Microarchitecture, and Circuit Perspectives , 2009, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.
[97] Carl D. Offner,et al. TStreams: A Model of Parallel Computation (Preliminary Report) , .
[98] Wolfgang Schröder-Preikschat,et al. DistRM: Distributed resource management for on-chip many-core systems , 2011, 2011 Proceedings of the Ninth IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS).
[99] Luca Benini,et al. An efficient and complete approach for throughput-maximal SDF allocation and scheduling on multi-core platforms , 2010, 2010 Design, Automation & Test in Europe Conference & Exhibition (DATE 2010).
[100] Sander Stuijk,et al. Minimising buffer requirements of synchronous dataflow graphs with model checking , 2005, Proceedings. 42nd Design Automation Conference, 2005..
[101] Onur Mutlu,et al. Bottleneck identification and scheduling in multithreaded applications , 2012, ASPLOS XVII.
[102] Asser N. Tantawi,et al. Approximate Analysis of Fork/Join Synchronization in Parallel Queues , 1988, IEEE Trans. Computers.