Automated software synthesis for streaming applications on embedded manycore processors
[1] William Thies,et al. Phased scheduling of stream programs , 2003, LCTES '03.
[2] Giovanni De Micheli,et al. Synthesis and Optimization of Digital Circuits , 1994 .
[3] Rajeev Barua,et al. Memory allocation for embedded systems with a compile-time-unknown scratch-pad size , 2005, CASES '05.
[4] Edward A. Lee,et al. Dataflow process networks , 1995, Proc. IEEE.
[5] Michael I. Gordon,et al. Exploiting coarse-grained task, data, and pipeline parallelism in stream programs , 2006, ASPLOS XII.
[6] Edward A. Lee,et al. A causality interface for deadlock analysis in dataflow , 2006, EMSOFT '06.
[7] E.A. Lee,et al. Synchronous data flow , 1987, Proceedings of the IEEE.
[8] Vivek Sarkar,et al. Determining average program execution times and their variance , 1989, PLDI '89.
[9] Massoud Pedram,et al. Architectures for silicon nanoelectronics and beyond , 2007, Computer.
[10] William Thies,et al. StreamIt: A Language for Streaming Applications , 2002, CC.
[11] Jaejin Lee,et al. FaCSim: a fast and cycle-accurate architecture simulator for embedded systems , 2008, LCTES '08.
[12] Shuvra S. Bhattacharyya,et al. Functional DIF for Rapid Prototyping , 2008, 2008 The 19th IEEE/IFIP International Symposium on Rapid System Prototyping.
[13] Sander Stuijk,et al. Liveness and Boundedness of Synchronous Data Flow Graphs , 2006, 2006 Formal Methods in Computer Aided Design.
[14] Edward A. Lee. The problem with threads , 2006, Computer.
[15] Emery D. Berger,et al. Grace: safe multithreaded programming for C/C++ , 2009, OOPSLA '09.
[16] Srivaths Ravi,et al. Energy-optimizing source code transformations for operating system-driven embedded software , 2007, TECS.
[17] Soheil Ghiasi,et al. Throughput-driven synthesis of embedded software for pipelined execution on multicore architectures , 2009, TECS.
[18] Edward A. Lee,et al. Synthesis of Embedded Software from Synchronous Dataflow Specifications , 1999, J. VLSI Signal Process.
[19] Sander Stuijk,et al. Throughput Analysis of Synchronous Data Flow Graphs , 2006, Sixth International Conference on Application of Concurrency to System Design (ACSD'06).
[20] Kevin Skadron,et al. Scalable parallel programming , 2008, 2008 IEEE Hot Chips 20 Symposium (HCS).
[21] Guang R. Gao,et al. Software pipelining showdown: optimal vs. heuristic methods in a production compiler , 1996, PLDI '96.
[22] William Thies,et al. Teleport messaging for distributed stream programs , 2005, PPoPP.
[23] Krzysztof Kuchcinski,et al. Partial task assignment of task graphs under heterogeneous resource constraints , 2003, Proceedings 2003. Design Automation Conference (IEEE Cat. No.03CH37451).
[24] Praveen K. Murthy,et al. Buffer merging—a powerful technique for reducing memory requirements of synchronous dataflow specifications , 2004, TODE.
[25] Tinoosh Mohsenin,et al. Algorithms and architectures for efficient low density parity check (LDPC) decoder hardware , 2010 .
[26] Soheil Ghiasi,et al. Exact and Approximate Task Assignment Algorithms for Pipelined Software Synthesis , 2008, 2008 Design, Automation and Test in Europe.
[27] E. A. de Kock. Multiprocessor mapping of process networks: a JPEG decoding case study , 2002 .
[28] David Wentzlaff,et al. TILE64 Processor: A 64-Core SoC with Mesh Interconnect , 2010 .
[29] Praveen K. Murthy,et al. Beyond single-appearance schedules: Efficient DSP software synthesis using nested procedure calls , 2007, TECS.
[30] Twan Basten,et al. Simultaneous budget and buffer size computation for throughput-constrained task graphs , 2010, 2010 Design, Automation & Test in Europe Conference & Exhibition (DATE 2010).
[31] Coniferous softwood. GENERAL TERMS , 2003 .
[32] Edward A. Lee,et al. Software Synthesis from Dataflow Graphs , 1996 .
[33] Walid Taha,et al. A Gentle Introduction to Multi-stage Programming , 2003, Domain-Specific Program Generation.
[34] Bevan M. Baas,et al. A high-performance parallel CAVLC encoder on a fine-grained many-core system , 2008, 2008 IEEE International Conference on Computer Design.
[35] Sander Stuijk,et al. Throughput-Buffering Trade-Off Exploration for Cyclo-Static and Synchronous Dataflow Graphs , 2008, IEEE Transactions on Computers.
[36] Alan Gray,et al. Deterministic Parallel Processing , 2006, International Journal of Parallel Programming.
[37] Yao Zhang,et al. Parallel Computing Experiences with CUDA , 2008, IEEE Micro.
[38] Stefan Rusu,et al. A 45nm 8-core enterprise Xeon® processor , 2009 .
[39] B. Ramakrishna Rau,et al. Iterative modulo scheduling: an algorithm for software pipelining loops , 1994, MICRO 27.
[40] Scott A. Mahlke,et al. MacroSS: macro-SIMDization of streaming applications , 2010, ASPLOS XV.
[41] Anant Agarwal,et al. The KILL Rule for Multicore , 2007, 2007 44th ACM/IEEE Design Automation Conference.
[42] Soheil Ghiasi,et al. Look into details: the benefits of fine-grain streaming buffer analysis , 2010, LCTES '10.
[43] Marc Pouzet,et al. Towards a higher-order synchronous data-flow language , 2004, EMSOFT '04.
[44] Jürgen Teich,et al. Multidimensional Exploration of Software Implementations for DSP Algorithms , 2000, J. VLSI Signal Process.
[45] Radu Marculescu,et al. Energy- and performance-aware mapping for regular NoC architectures , 2005, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.
[46] Koushik Sen,et al. A randomized dynamic program analysis technique for detecting real deadlocks , 2009, PLDI '09.
[47] Lokesh Sharma,et al. A 32nm Westmere-EX Xeon® enterprise processor , 2011, 2011 IEEE International Solid-State Circuits Conference.
[48] Soheil Ghiasi,et al. Versatile Task Assignment for Heterogeneous Soft Dual-Processor Platforms , 2010, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.
[49] Soheil Ghiasi,et al. Joint throughput and energy optimization for pipelined execution of embedded streaming applications , 2007, LCTES '07.
[50] Bart Kienhuis,et al. Automatic partitioning and mapping of stream-based applications onto the Intel IXP Network processor , 2007, SCOPES '07.
[51] Lothar Thiele,et al. Performance analysis of distributed embedded systems , 2007, EMSOFT '07.
[52] Jakob Engblom,et al. The worst-case execution-time problem—overview of methods and survey of tools , 2008, TECS.
[53] George Karypis,et al. Architecture Aware Partitioning Algorithms , 2008, ICA3PP.
[54] Edward A. Lee. Building Unreliable Systems out of Reliable Components : The Real Time Story , 2005 .
[55] Luca Benini,et al. Dynamic frequency scaling with buffer insertion for mixed workloads , 2002, IEEE Trans. Comput. Aided Des. Integr. Circuits Syst.
[56] George Karypis,et al. Unstructured Graph Partitioning and Sparse Matrix Ordering System Version 2.0 , 1995 .
[57] Thomas A. Henzinger,et al. The Embedded Systems Design Challenge , 2006, FM.
[58] Edward A. Lee,et al. Static Scheduling of Synchronous Data Flow Programs for Digital Signal Processing , 1989, IEEE Transactions on Computers.
[59] Zhiyi Yu,et al. A 167-Processor Computational Platform in 65 nm CMOS , 2009, IEEE Journal of Solid-State Circuits.
[60] Alberto L. Sangiovanni-Vincentelli,et al. Benefits and challenges for platform-based design , 2004, Proceedings. 41st Design Automation Conference, 2004.
[61] H. Peter Hofstee,et al. Introduction to the Cell multiprocessor , 2005, IBM J. Res. Dev.
[62] William Thies,et al. An empirical characterization of stream programs and its implications for language and compiler design , 2010, 2010 19th International Conference on Parallel Architectures and Compilation Techniques (PACT).
[63] Samuel Williams,et al. The Landscape of Parallel Computing Research: A View from Berkeley , 2006 .
[64] Twan Basten,et al. Reactive process networks , 2004, EMSOFT '04.
[65] T. Mohsenin,et al. A 167-processor 65 nm computational platform with per-processor dynamic supply voltage and dynamic clock frequency scaling , 2008, 2008 IEEE Symposium on VLSI Circuits.
[66] Soheil Ghiasi,et al. System-Level Performance Estimation for Application-Specific MPSoC Interconnect Synthesis , 2008, 2008 Symposium on Application Specific Processors.
[67] Yu Wang,et al. An efficient technique for analysis of minimal buffer requirements of synchronous dataflow graphs with model checking , 2009, CODES+ISSS '09.
[68] Rudy Lauwereins,et al. Data memory minimisation for synchronous data flow graphs emulated on DSP-FPGA targets , 1997, DAC.
[69] R. Passerone,et al. System level design paradigms: Platform-based design and communication synthesis , 2004 .
[70] Scott A. Mahlke,et al. Orchestrating the execution of stream programs on multicore platforms , 2008, PLDI '08.
[71] Alvise Bonivento,et al. System level design paradigms: Platform-based design and communication synthesis , 2006, ACM Trans. Design Autom. Electr. Syst..
[72] James R. Larus,et al. Software and the Concurrency Revolution , 2005, ACM Queue.
[73] Luciano Lavagno,et al. Metropolis: An Integrated Electronic System Design Environment , 2003, Computer.
[74] Scott A. Mahlke,et al. Flextream: Adaptive Compilation of Streaming Applications for Heterogeneous Architectures , 2009, 2009 18th International Conference on Parallel Architectures and Compilation Techniques.
[75] Soonhoi Ha,et al. Dynamic voltage scheduling with buffers in low-power multimedia applications , 2004, TECS.
[76] Jason Cong,et al. Synthesis of an application-specific soft multiprocessor system , 2007, FPGA '07.
[77] Michael I. Gordon. Compiler techniques for scalable performance of stream programs on multicore architectures , 2010 .
[78] Rajeev Barua,et al. Dynamic allocation for scratch-pad memory using compile-time decisions , 2006, TECS.
[79] N. Ranganathan,et al. A learning automata based framework for task assignment in heterogeneous computing systems , 1999, SAC '99.
[80] Tinoosh Mohsenin,et al. Multi-Split-Row Threshold decoding implementations for LDPC codes , 2009, 2009 IEEE International Symposium on Circuits and Systems.
[81] Shuvra S. Bhattacharyya,et al. Parameterized dataflow modeling of DSP systems , 2000, 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100).
[82] Henry Hoffmann,et al. A stream compiler for communication-exposed architectures , 2002, ASPLOS X.
[83] Edward A. Lee,et al. Software synthesis for DSP using Ptolemy , 1995, J. VLSI Signal Process.
[84] T. Mohsenin,et al. An asynchronous array of simple processors for DSP applications , 2006, 2006 IEEE International Solid State Circuits Conference - Digest of Technical Papers.
[85] Bevan M. Baas,et al. Massively parallel processor array for mid-/back-end ultrasound signal processing , 2010, 2010 Biomedical Circuits and Systems Conference (BioCAS).
[86] David F. Bacon,et al. Compiler transformations for high-performance computing , 1994, CSUR.
[87] Sander Stuijk,et al. Multiprocessor Resource Allocation for Throughput-Constrained Synchronous Dataflow Graphs , 2007, 2007 44th ACM/IEEE Design Automation Conference.
[88] Andy D. Pimentel,et al. Multiobjective optimization and evolutionary algorithms for the application mapping problem in multiprocessor system-on-chip design , 2006, IEEE Transactions on Evolutionary Computation.
[89] David S. Johnson,et al. Computers and Intractability: A Guide to the Theory of NP-Completeness , 1978 .
[90] Doug Pulley. Multi-core DSP for base stations: Large and small , 2008, 2008 Asia and South Pacific Design Automation Conference.
[91] William J. Dally,et al. Tradeoff between data-, instruction-, and thread-level parallelism in stream processors , 2007, ICS '07.
[92] William J. Dally,et al. Buffer-space efficient and deadlock-free scheduling of stream applications on multi-core architectures , 2010, SPAA '10.