On Efficient Data Exchange in Multicore Architectures
[1] David C. Cann,et al. A Report on the Sisal Language Project , 1990, J. Parallel Distributed Comput..
[2] Pierre-Louis Curien,et al. Sequential Algorithms on Concrete Data Structures , 1982, Theor. Comput. Sci..
[3] Vikram S. Adve,et al. LLVM: a compilation framework for lifelong program analysis & transformation , 2004, International Symposium on Code Generation and Optimization, 2004. CGO 2004..
[4] Edward A. Lee,et al. Dataflow process networks , 2001 .
[5] George Bosilca,et al. Open MPI: Goals, Concept, and Design of a Next Generation MPI Implementation , 2004, PVM/MPI.
[6] Ghislain Roquier,et al. Scheduling of dynamic dataflow programs based on state space analysis , 2012, 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[7] Cédric Augonnet,et al. StarPU: a unified platform for task scheduling on heterogeneous multicore architectures , 2011, Concurr. Comput. Pract. Exp..
[8] Lothar Thiele,et al. Scenario-based design flow for mapping streaming applications onto on-chip many-core systems , 2012, CASES '12.
[9] Edward G. Coffman,et al. A Study of Interleaved Memory Systems , 1899 .
[10] Albert Benveniste,et al. Compositionality in Dataflow Synchronous Languages: Specification and Code Generation , 1997, COMPOS.
[11] Andreas Olofsson,et al. A 1024-core 70 GFLOP/W Floating Point Manycore Microprocessor , 2011 .
[12] Heechul Yun,et al. MEDUSA: A Predictable and High-Performance DRAM Controller for Multicore Based Embedded Systems , 2015, 2015 IEEE 3rd International Conference on Cyber-Physical Systems, Networks, and Applications.
[13] B. Ramakrishna Rau,et al. Interleaved Memory Bandwidth in a Model of a Multiprocessor Computer System , 1979, IEEE Transactions on Computers.
[14] Somayeh Sardashti,et al. The gem5 simulator , 2011, CARN.
[15] Kevin Fu,et al. Mementos: system support for long-running computation on RFID-scale devices , 2011, ASPLOS XVI.
[16] Thomas Nolte,et al. Contention-Free Execution of Automotive Applications on a Clustered Many-Core Platform , 2016, 2016 28th Euromicro Conference on Real-Time Systems (ECRTS).
[17] Yen-Chen Liu,et al. Knights Landing: Second-Generation Intel Xeon Phi Product , 2016, IEEE Micro.
[18] R. Karp,et al. Properties of a model for parallel computations: determinacy , 1966 .
[19] Alan Jay Smith,et al. Interference in multiprocessor computer systems with interleaved memory , 1976, CACM.
[20] Rodolfo Pellizzoni,et al. PALLOC: DRAM bank-aware memory allocator for performance isolation on multicore platforms , 2014, 2014 IEEE 19th Real-Time and Embedded Technology and Applications Symposium (RTAS).
[21] Viera Sipková,et al. Efficient Variable Allocation to Dual Memory Banks of DSPs , 2003, SCOPES.
[22] Mickaël Raulet,et al. Classification of Dataflow Actors with Satisfiability and Abstract Interpretation , 2012, Int. J. Embed. Real Time Commun. Syst..
[23] Mickaël Raulet,et al. Orcc: multimedia development made easy , 2013, MM '13.
[24] Sébastien Lafond,et al. Quasi-Static Scheduling of CAL Actor Networks for Reconfigurable Video Coding , 2011, J. Signal Process. Syst..
[25] Edward A. Lee,et al. PRET DRAM controller: Bank privatization for predictability and temporal isolation , 2011, 2011 Proceedings of the Ninth IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS).
[26] Björn Franke,et al. Fast source-level data assignment to dual memory banks , 2008, SCOPES '08.
[27] R. Govindarajan,et al. An Array Allocation Scheme for Energy Reduction in Partitioned Memory Architectures , 2007, CC.
[28] Gilles Kahn,et al. The Semantics of a Simple Language for Parallel Programming , 1974, IFIP Congress.
[29] Luca Benini,et al. Brain-Inspired Classroom Occupancy Monitoring on a Low-Power Mobile Platform , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition Workshops.
[30] Paul Chow,et al. Exploiting dual data-memory banks in digital signal processors , 1996, ASPLOS VII.
[31] André Rossi,et al. Memory Allocation Problems in Embedded Systems: Optimization Methods , 2012 .
[32] Benoît Dupont de Dinechin,et al. Time-critical computing on a single-chip massively parallel processor , 2014, 2014 Design, Automation & Test in Europe Conference & Exhibition (DATE).
[33] Twan Basten,et al. Efficient Execution of Process Networks , 2001 .
[34] S. K. Nandy,et al. A complexity effective communication model for behavioral modeling of signal processing applications , 2003, DAC '03.
[35] Luca Benini,et al. HERO: Heterogeneous Embedded Research Platform for Exploring RISC-V Manycore Accelerators on FPGA , 2017, ArXiv.
[36] Rodolfo Pellizzoni,et al. Worst Case Analysis of DRAM Latency in Multi-requestor Systems , 2013, 2013 IEEE 34th Real-Time Systems Symposium.
[37] Xiaobing Feng,et al. Software-Hardware Cooperative DRAM Bank Partitioning for Chip Multiprocessors , 2010, NPC.
[38] Luca Benini,et al. P2012: Building an ecosystem for a scalable, modular and high-efficiency embedded computing accelerator , 2012, 2012 Design, Automation & Test in Europe Conference & Exhibition (DATE).
[39] Rainer Leupers,et al. MPSoC programming using the MAPS compiler , 2010, 2010 15th Asia and South Pacific Design Automation Conference (ASP-DAC).
[40] Pascal Sainrat,et al. Temporal Isolation of Hard Real-Time Applications on Many-Core Processors , 2016, 2016 IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS).
[41] K. N. Dollman,et al. - 1 , 1743 .
[42] Gregory J. Chaitin,et al. Register allocation & spilling via graph coloring , 1982, SIGPLAN '82.
[43] Edsger W. Dijkstra,et al. A note on two problems in connexion with graphs , 1959, Numerische Mathematik.
[44] Michele Magno,et al. Dynamic energy burst scaling for transiently powered systems , 2016, 2016 Design, Automation & Test in Europe Conference & Exhibition (DATE).
[45] Soonhoi Ha,et al. Extended Synchronous Dataflow for Efficient DSP System Prototyping , 2002, Des. Autom. Embed. Syst..
[46] Lei Liu,et al. A software memory partition approach for eliminating bank-level interference in multicore systems , 2012, 2012 21st International Conference on Parallel Architectures and Compilation Techniques (PACT).
[47] Rainer Leupers,et al. An optimal allocation of memory buffers for complex multicore platforms , 2016, J. Syst. Archit..
[48] Dileep Bhandarkar,et al. Analysis of Memory Interference in Multiprocessors , 1975, IEEE Transactions on Computers.
[49] Luca Mottola,et al. Efficient State Retention for Transiently-powered Embedded Sensing , 2016, EWSN.
[50] Nikolaj Bjørner,et al. Z3: An Efficient SMT Solver , 2008, TACAS.
[51] L. Dries,et al. University of California at Berkeley Berkeley, CA, USA March 24–27, 2011 , 2012 .
[52] Rezaur Rahman,et al. Intel Xeon Phi Coprocessor Architecture and Tools: The Guide for Application Developers , 2013 .
[53] Trevor Mudge,et al. MiBench: A free, commercially representative embedded benchmark suite , 2001 .
[54] Rainer Leupers,et al. Variable partitioning for dual memory bank DSPs , 2001, 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221).
[55] Dam Sunwoo,et al. Balancing DRAM locality and parallelism in shared memory CMP systems , 2012, IEEE International Symposium on High-Performance Comp Architecture.
[56] Yunheung Paek,et al. Efficient register and memory assignment for non-orthogonal architectures via graph coloring and MST algorithms , 2002, LCTES/SCOPES '02.
[57] Xing Pan,et al. TintMalloc: Reducing Memory Access Divergence via Controller-Aware Coloring , 2016, 2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS).
[58] Edward G. Coffman,et al. A Combinatorial Problem Related to Interleaved Memory Systems , 1973, JACM.
[59] Robert I. Davis,et al. Response Time Analysis of Synchronous Data Flow Programs on a Many-Core Processor , 2016, RTNS '16.
[60] Lothar Thiele,et al. Windowed FIFOs for FPGA-based Multiprocessor Systems , 2007, 2007 IEEE International Conf. on Application-specific Systems, Architectures and Processors (ASAP).
[61] Charles E. Skinner,et al. Effects of Storage Contention on System Performance , 1969, IBM Syst. J..
[62] William Daniel Strecker. An analysis of the instruction execution rate in certain computer structures , 1970 .
[63] Robert de Simone,et al. Static Mapping of Real-Time Applications onto Massively Parallel Processor Arrays , 2014, 2014 14th International Conference on Application of Concurrency to System Design.
[64] Kees G. W. Goossens. A protocol and memory manager for on-chip communication , 2001, ISCAS 2001. The 2001 IEEE International Symposium on Circuits and Systems (Cat. No.01CH37196).
[65] Samuel H. Fuller,et al. Markov chain models for analyzing memory interference in multiprocessor computer systems , 1973, ISCA '73.
[66] Samuel Kotz,et al. Urn Models and Their Application: An Approach to Modern Discrete Probability Theory , 1978 .
[67] Luca Benini,et al. PULP: A Ultra-Low Power Parallel Accelerator for Energy-Efficient and Flexible Embedded Vision , 2015, Journal of Signal Processing Systems.
[68] Joseph R. Cavallaro,et al. Low power implementation of digital predistortion filter on a heterogeneous application specific multiprocessor , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[69] Patrick Meumeu Yomsi,et al. The Variability of Application Execution Times on a Multi-Core Platform , 2016, WCET.
[70] Rainer Leupers,et al. Throughput driven transformations of Synchronous Data Flows for mapping to heterogeneous MPSoCs , 2012, 2012 International Conference on Embedded Computer Systems (SAMOS).
[71] Shuvra S. Bhattacharyya,et al. Partitioning for DSP Software Synthesis , 2003, SCOPES.
[72] Shuvra S. Bhattacharyya,et al. A generalized scheduling approach for dynamic dataflow applications , 2009, 2009 Design, Automation & Test in Europe Conference & Exhibition.
[73] Eric Cheung,et al. Automatic buffer sizing for rate-constrained KPN applications on multiprocessor system-on-chip , 2007, 2007 IEEE International High Level Design Validation and Test Workshop.
[74] Christian Haubelt,et al. Classification of General Data Flow Actors into Known Models of Computation , 2008, 2008 6th ACM/IEEE International Conference on Formal Methods and Models for Co-Design.
[75] L. Dagum,et al. OpenMP: an industry standard API for shared-memory programming , 1998 .
[76] Mickaël Raulet,et al. Automatic Hierarchical Discovery of Quasi-Static Schedules of RVC-CAL Dataflow Programs , 2013, J. Signal Process. Syst..
[77] Soonhoi Ha,et al. Fractional Rate Dataflow Model for Efficient Code Synthesis , 2004, J. VLSI Signal Process..
[78] Mickaël Raulet,et al. The Reconfigurable Video Coding Standard [Standards in a Nutshell] , 2010, IEEE Signal Processing Magazine.
[79] Kazuki Sakamoto,et al. Grand Central Dispatch , 2012 .
[80] Soonhoi Ha,et al. Data memory minimization by sharing large size buffers , 2000, ASP-DAC.
[81] Lothar Thiele,et al. Mapping mixed-criticality applications on multi-core architectures , 2014, 2014 Design, Automation & Test in Europe Conference & Exhibition (DATE).
[82] Nacer-Eddine Zergainoh,et al. Buffer Size Reduction through Control-Flow Decomposition , 2007, 13th IEEE International Conference on Embedded and Real-Time Computing Systems and Applications (RTCSA 2007).
[83] Meikang Qiu,et al. Variable assignment and instruction scheduling for processor with multi-module memory , 2011, Microprocess. Microsystems.
[84] Lothar Thiele,et al. Mapping Applications to Tiled Multiprocessor Embedded Systems , 2007, Seventh International Conference on Application of Concurrency to System Design (ACSD 2007).
[85] Yu Wang,et al. 4.7 A 65nm ReRAM-enabled nonvolatile processor with 6× reduction in restore time and 4× higher clock frequency using adaptive data retention and self-write-termination nonvolatile logic , 2016, 2016 IEEE International Solid-State Circuits Conference (ISSCC).
[86] Taewhan Kim,et al. Integration of Code Scheduling, Memory Allocation, and Array Binding for Memory-Access Optimization , 2007, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.
[87] Frank Mueller,et al. Reducing NoC and Memory Contention for Manycores , 2016, ARCS.
[88] Todor Stefanov,et al. pn: A Tool for Improved Derivation of Process Networks , 2007, EURASIP J. Embed. Syst..
[89] Aaftab Munshi,et al. The OpenCL specification , 2009, 2009 IEEE Hot Chips 21 Symposium (HCS).
[90] Jean A. Peperstraete,et al. Cycle-static dataflow , 1996, IEEE Trans. Signal Process..
[91] Steven S. Muchnick,et al. Advanced Compiler Design and Implementation , 1997 .
[92] Thomas Martyn Parks,et al. Bounded scheduling of process networks , 1996 .
[93] Sanjay Ghemawat,et al. MapReduce: Simplified Data Processing on Large Clusters , 2004, OSDI.
[94] C. J. Date. A Guide to the SQL Standard , 1987 .
[95] Reinhold Heckmann,et al. Worst case execution time prediction by static program analysis , 2004, 18th International Parallel and Distributed Processing Symposium, 2004. Proceedings..
[96] Edward A. Lee,et al. Scheduling dynamic dataflow graphs with bounded memory using the token flow model , 1993, 1993 IEEE International Conference on Acoustics, Speech, and Signal Processing.
[97] Sven-Bodo Scholz. Single Assignment C - Functional Programming Using Imperative Style , 1994 .
[98] E.A. Lee,et al. Synchronous data flow , 1987, Proceedings of the IEEE.
[99] Aviral Shrivastava,et al. Operation and data mapping for CGRAs with multi-bank memory , 2010, LCTES '10.
[100] Edward A. Lee. Consistency in dataflow graphs , 1991, Proceedings of the International Conference on Application Specific Array Processors.