Hardware design of task superscalar architecture
[1] Walid A. Najjar,et al. A quantitative analysis of locality in dataflow programs , 1991, MICRO 24.
[2] David E. Culler,et al. Monsoon: an explicit token-store architecture , 1990, [1990] Proceedings. The 17th Annual International Symposium on Computer Architecture.
[3] David E. Culler,et al. The Explicit Token Store , 1990, J. Parallel Distributed Comput..
[4] Hiroshi Yasuhara,et al. DDDP-a Distributed Data Driven Processor , 1983, ISCA '83.
[5] Karthikeyan Sankaralingam,et al. Dynamically Specialized Datapaths for energy efficient computing , 2011, 2011 IEEE 17th International Symposium on High Performance Computer Architecture.
[6] Alejandro Duran,et al. OmpSs: a Proposal for Programming Heterogeneous Multi-Core Architectures , 2011, Parallel Process. Lett..
[7] Dean M. Tullsen,et al. Simultaneous multithreading: Maximizing on-chip parallelism , 1995, Proceedings 22nd Annual International Symposium on Computer Architecture.
[8] Ali R. Hurson,et al. Dataflow architectures and multithreading , 1994, Computer.
[9] Andrei Sergeevich Terechko,et al. A Multithreaded Multicore System for Embedded Media Processing , 2011, Trans. High Perform. Embed. Archit. Compil..
[10] Monica S. Lam,et al. Jade: a high-level, machine-independent language for parallel programming , 1993, Computer.
[11] D. Marr,et al. Hyper-Threading Technology Architecture and Microarchitecture , 2002 .
[12] Mauricio J. Serrano,et al. Performance estimation of multistreamed, superscalar processors , 1994, 1994 Proceedings of the Twenty-Seventh Hawaii International Conference on System Sciences.
[13] Alejandro Duran,et al. Productive Cluster Programming with OmpSs , 2011, Euro-Par.
[14] Maurice Herlihy,et al. Transactional Memory: Architectural Support For Lock-free Data Structures , 1993, Proceedings of the 20th Annual International Symposium on Computer Architecture.
[15] Burton J. Smith. Architecture And Applications Of The HEP Multiprocessor Computer System , 1982, Optics & Photonics.
[16] Rainer Leupers,et al. Task management in MPSoCs: An ASIP approach , 2009, 2009 IEEE/ACM International Conference on Computer-Aided Design - Digest of Technical Papers.
[17] Jesús Labarta,et al. A dependency-aware task-based programming environment for multi-core architectures , 2008, 2008 IEEE International Conference on Cluster Computing.
[18] Jesús Labarta,et al. ClusterSs: a task-based programming model for clusters , 2011, HPDC '11.
[19] J B Dennis. The varieties of data flow computers , 1986 .
[20] Guang R. Gao,et al. Measurement and modeling of EARTH-MANNA multithreaded architecture , 1996, Proceedings of MASCOTS '96 - 4th International Workshop on Modeling, Analysis and Simulation of Computer and Telecommunication Systems.
[21] Keshav Pingali,et al. I-structures: Data structures for parallel computing , 1986, Graph Reduction.
[22] Brian Demsky,et al. OoOJava: an out-of-order approach to parallel programming , 2010 .
[23] Walid A. Najjar,et al. An evaluation of coarse grain dataflow code generation strategies , 1993, Proceedings of Workshop on Programming Models for Massively Parallel Computers.
[24] T. Sherwood,et al. Predictor-directed stream buffers , 2000, Proceedings 33rd Annual IEEE/ACM International Symposium on Microarchitecture. MICRO-33 2000.
[25] Toshitsugu Yuba,et al. The SIGMA-1 dataflow computer , 1987, FJCC.
[26] Andrei Sergeevich Terechko,et al. A Hardware Task Scheduler for Embedded Video Processing , 2008, HiPEAC.
[27] Juanjo Noguera,et al. System-level power-performance trade-offs in task scheduling for dynamically reconfigurable architectures , 2003, CASES '03.
[28] William J. Dally,et al. Sequoia: Programming the Memory Hierarchy , 2006, International Conference on Software Composition.
[29] Peter K. Pearson,et al. Fast hashing of variable-length text strings , 1990, CACM.
[30] Magnus Själander,et al. A Look-Ahead Task Management Unit for Embedded Multi-Core Architectures , 2008, 2008 11th EUROMICRO Conference on Digital System Design Architectures, Methods and Tools.
[31] E.A. Lee,et al. Synchronous data flow , 1987, Proceedings of the IEEE.
[32] Theo Ungerer,et al. The ASTOR Architecture , 1987, ICDCS.
[33] Karthikeyan Sankaralingam,et al. DySER: Unifying Functionality and Parallelism Specialization for Energy-Efficient Computing , 2012, IEEE Micro.
[34] Ian Watson,et al. The Manchester prototype dataflow computer , 1985, CACM.
[35] Mitsuhisa Sato,et al. The EM-X parallel computer: architecture and basic performance , 1995, ISCA.
[36] Erik Lindholm,et al. NVIDIA Tesla: A Unified Graphics and Computing Architecture , 2008, IEEE Micro.
[37] Rosa M. Badia,et al. CellSs: a Programming Model for the Cell BE Architecture , 2006, ACM/IEEE SC 2006 Conference (SC'06).
[38] Harry F. Jordan. Performance measurements on HEP - a pipelined MIMD computer , 1983, ISCA '83.
[39] Jean-Luc Gaudiot,et al. Data-Flow and Multithreaded Architectures , 1999 .
[40] L. Rauchwerger,et al. The LRPD Test: Speculative Run-Time Parallelization of Loops with Privatization and Reduction Parallelization , 1999, IEEE Trans. Parallel Distributed Syst..
[41] David Chaiken,et al. Latency Tolerance through Multithreading in Large-Scale Multiprocessors , 1991 .
[42] Eduard Ayguadé,et al. Overlapping communication and computation by using a hybrid MPI/SMPSs approach , 2010, ICS '10.
[43] Kevin Skadron,et al. Scalable parallel programming , 2008, 2008 IEEE Hot Chips 20 Symposium (HCS).
[44] Soo-Ik Chae,et al. A hardware operating system kernel for multi-processor systems , 2008, IEICE Electron. Express.
[45] Joseph E. Requa. The Piecewise Data Flow architecture control flow and register management , 1983, ISCA '83.
[46] Dr. Jurij Šilc,et al. Processor Architecture , 1999, Springer Berlin Heidelberg.
[47] Kenneth R. Traub,et al. Multithreading: a revisionist view of dataflow architectures , 1991, ISCA '91.
[48] Guang R. Gao,et al. EARTH: an efficient architecture for running threads , 1999 .
[49] Christopher J. Hughes,et al. Carbon: architectural support for fine-grained parallelism on chip multiprocessors , 2007, ISCA '07.
[50] Robert H. Halstead,et al. Multithreaded Computer Architecture , 1994, The Kluwer International Series in Engineering and Computer Science.
[51] Eduard Ayguadé,et al. Nanos mercurium: A research compiler for OpenMP , 2004 .
[52] Paraskevas Evripidou,et al. Data-Driven Multithreading Using Conventional Microprocessors , 2006, IEEE Transactions on Parallel and Distributed Systems.
[53] John von Neumann,et al. First draft of a report on the EDVAC , 1993, IEEE Annals of the History of Computing.
[54] Paraskevas Evripidou,et al. Data Driven Network of Workstations (D2NOW) , 2000, J. Univers. Comput. Sci..
[55] Monica S. Lam,et al. Coarse-grain parallel programming in Jade , 1991, PPOPP '91.
[56] Angelos Bilas,et al. Tagged Procedure Calls (TPC): Efficient Runtime Support for Task-Based Parallelism on the Cell Processor , 2010, HiPEAC.
[57] Seth Copen Goldstein,et al. Tartan: evaluating spatial computation for whole program execution , 2006, ASPLOS XII.
[58] W. Daniel Hillis,et al. The connection machine , 1985 .
[59] Juanjo Noguera,et al. Multitasking on reconfigurable architectures: microarchitecture support and dynamic scheduling , 2004, TECS.
[60] Wooyoung Kim,et al. Multicore Desktop Programming with Intel Threading Building Blocks , 2011, IEEE Software.
[61] Jaehyuk Huh,et al. TRIPS: A polymorphous architecture for exploiting ILP, TLP, and DLP , 2004, TACO.
[62] David E. Culler,et al. Dataflow architectures , 1986 .
[63] Ben H. H. Juurlink,et al. Nexus: Hardware Support for Task-Based Programming , 2011, 2011 14th Euromicro Conference on Digital System Design.
[64] R. Karp,et al. Properties of a model for parallel computations: determinacy , 1966 .
[65] Steven Swanson,et al. Conservation cores: reducing the energy of mature computations , 2010, ASPLOS XV.
[66] Roman L. Lysecky,et al. Configuration Locking and Schedulability Estimation for Reduced Reconfiguration Overheads of Reconfigurable Systems , 2010, IEEE Transactions on Very Large Scale Integration (VLSI) Systems.
[67] Ute Schürfeld,et al. The Stollmann Data Flow Machine , 1989, PARLE.
[68] Paraskevas Evripidou. D3-Machine: A decoupled data-driven multithreaded architecture with variable resolution support , 2001, Parallel Comput..
[69] Vason P. Srini,et al. An Architectural Comparison of Dataflow Systems , 1986, Computer.
[70] K. Waldschmidt,et al. ADARC: a fine grain dataflow architecture with associative communication network , 1994, Proceedings of Twentieth Euromicro Conference. System Architecture and Integration.
[71] A. L. Davis,et al. The architecture and system method of DDM1: A recursively structured Data Driven Machine , 1978, ISCA '78.
[72] John R. Ellis,et al. Bulldog: a compiler for VLIW architectures (parallel computing, reduced-instruction-set, trace scheduling, scientific) , 1985 .
[73] Richard P. Hopkins,et al. Combining Data Flow and Control Flow Computing , 1982, Comput. J..
[74] Monica S. Lam,et al. Heterogeneous parallel programming in Jade , 1992, Proceedings Supercomputing '92.
[75] Paraskevas Evripidou,et al. A Decoupled Graph/Computation Data-Driven Architecture with Variable-Resolution Actors , 1990, International Conference on Parallel Processing.
[76] Robert A. Iannucci,et al. A dataflow/von Neumann hybrid architecture , 1988 .
[77] Jack B. Dennis,et al. A preliminary architecture for a basic data-flow processor , 1974, ISCA '98.
[78] Allan Porterfield,et al. The Tera computer system , 1990 .
[79] Ben H. H. Juurlink,et al. A Case for Hardware Task Management Support for the StarSS Programming Model , 2010, 2010 13th Euromicro Conference on Digital System Design: Architectures, Methods and Tools.
[80] Erich Bloch,et al. The engineering design of the stretch computer , 1959, IRE-AIEE-ACM '59 (Eastern).
[81] Mike Lee,et al. Design and Implementation of the POWER5™ Microprocessor , 2004 .
[82] Francisco J. Cazorla,et al. Kilo-instruction processors: overcoming the memory wall , 2005, IEEE Micro.
[83] A. Crespo,et al. A hardware scheduler for complex real-time systems , 1999, ISIE '99. Proceedings of the IEEE International Symposium on Industrial Electronics (Cat. No.99TH8465).
[84] Yoav Etsion,et al. FPGA-Based Prototype of the Task Superscalar Architecture , 2013 .
[85] Eduard Ayguadé,et al. Task Superscalar: An Out-of-Order Task Pipeline , 2010, 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture.
[86] V. Gerald Grafe,et al. The Epsilon-2 Multiprocessor System , 1990, J. Parallel Distributed Comput..
[87] Guang R. Gao,et al. A design study of the EARTH multiprocessor , 1995, PACT.
[88] Arvind,et al. Two Fundamental Issues in Multiprocessing , 1987, Parallel Computing in Science and Engineering.
[89] Yoav Etsion,et al. Hybrid Dataflow/von-Neumann Architectures , 2014, IEEE Transactions on Parallel and Distributed Systems.
[90] Jesús Labarta,et al. CellSs: Scheduling techniques to better exploit memory hierarchy , 2009, Sci. Program..
[91] Brian Demsky,et al. OoOJava: software out-of-order execution , 2011, PPoPP '11.
[92] Edward A. Lee,et al. Advances in the dataflow computational model , 1999, Parallel Comput..
[93] Jack B. Dennis,et al. First version of a data flow procedure language , 1974, Symposium on Programming.
[94] Monica S. Lam,et al. The design, implementation, and evaluation of Jade , 1998, TOPL.
[95] Anant Agarwal,et al. APRIL: a processor architecture for multiprocessing , 1990, [1990] Proceedings. The 17th Annual International Symposium on Computer Architecture.
[96] Guang R. Gao,et al. Quantitive studies of data-locality sensitivity on the EARTH multithreaded architecture: preliminary results , 1996, Proceedings of 3rd International Conference on High Performance Computing (HiPC).
[97] Guang R. Gao,et al. A Study of the EARTH-MANNA Multithreaded System , 1996, International Journal of Parallel Programming.
[98] Jan-Philipp Weiss,et al. Facing the Multicore-Challenge - Aspects of New Paradigms and Technologies in Parallel Computing [Proceedings of a conference held at Stuttgart, Germany, September 19-21, 2012] , 2013, Facing the Multicore-Challenge.
[99] Theo Ungerer,et al. Asynchrony in Parallel Computing: From Dataflow to Multithreading , 2001, Scalable Comput. Pract. Exp..
[100] Jean-Luc Gaudiot,et al. The Sisal model of functional programming and its implementation , 1997, Proceedings of IEEE International Symposium on Parallel Algorithms Architecture Synthesis.
[101] Quinn Jacobson,et al. Trace processors , 1997, Proceedings of 30th Annual International Symposium on Microarchitecture.
[102] Jaehyuk Huh,et al. Exploiting ILP, TLP, and DLP with the Polymorphous TRIPS Architecture , 2003, IEEE Micro.
[103] Robert M. Keller,et al. Data Flow Program Graphs , 1982, Computer.
[104] Rosa M. Badia. Top down programming methodology and tools with StarSs - enabling scalable programming paradigms: extended abstract , 2011, ScalA '11.
[105] Jesús Labarta,et al. A high‐productivity task‐based programming model for clusters , 2012, Concurr. Comput. Pract. Exp..
[106] A. Gupta,et al. Exploring the benefits of multiple hardware contexts in a multiprocessor architecture: preliminary results , 1989, ISCA '89.
[107] Kattamuri Ekanadham,et al. Incorporating Data Flow Ideas into von Neumann Processors for Parallel Execution , 1987, IEEE Transactions on Computers.
[108] Eiji Kuno,et al. The Architecture and Preliminary Evaluation Results of the Experimental Parallel Inference Machine PIM-D , 1986, ISCA.
[109] Krishna M. Kavi,et al. Scheduled Dataflow: Execution Paradigm, Architecture, and Performance Evaluation , 2001, IEEE Trans. Computers.
[110] Yale N. Patt,et al. HPS, a new microarchitecture: rationale and introduction , 1985, MICRO 18.
[111] Gilles Kahn,et al. The Semantics of a Simple Language for Parallel Programming , 1974, IFIP Congress.
[112] V. G. Grafe,et al. The Epsilon dataflow processor , 1989, ISCA '89.
[113] Francesco Regazzoni,et al. Hardware Scheduling Support in SMP Architectures , 2007, 2007 Design, Automation & Test in Europe Conference & Exhibition.
[114] A. Veidenbaum,et al. The cedar system and an initial performance study , 1993, ISCA '93.
[115] Rex W. Vedder,et al. The Hughes Data Flow Multiprocessor: architecture for efficient signal and data processing , 1985, ISCA 1985.
[116] Arvind,et al. *T: A Multithreaded Massively Parallel Architecture , 1992, [1992] Proceedings the 19th Annual International Symposium on Computer Architecture.
[117] Michael D. McCool,et al. Performance evaluation of GPUs using the RapidMind development platform , 2006, SC.
[118] David E. Culler,et al. Two Fundamental Limits on Dataflow Multiprocessing , 1993, Architectures and Compilation Techniques for Fine and Medium Grain Parallelism.
[119] Steven Swanson,et al. The WaveScalar architecture , 2007, TOCS.
[120] Derek Chiou,et al. Performance Studies of Id on the Monsoon Dataflow System , 1993, J. Parallel Distributed Comput..
[121] Eduard Ayguadé,et al. Task superscalar: using processors as functional units , 2010 .
[122] Michael D. McCool,et al. Programming using RapidMind on the Cell BE , 2006, SC.
[123] Seth Copen Goldstein,et al. TAM - A Compiler Controlled Threaded Abstract Machine , 1993, J. Parallel Distributed Comput..
[124] R. S. Nikhil. Can dataflow subsume von Neumann computing? , 1989, ISCA '89.
[125] John E. Stone,et al. OpenCL: A Parallel Programming Standard for Heterogeneous Computing Systems , 2010, Computing in Science & Engineering.
[126] Krishna M. Kavi,et al. A Formal Definition of Data Flow Graph Models , 1986, IEEE Transactions on Computers.