GP-SIMD Processing-in-Memory
[1] Eby G. Friedman,et al. AC-DIMM: associative computing with STT-MRAM , 2013, ISCA.
[2] James Demmel,et al. Benchmarking GPUs to tune dense linear algebra , 2008, 2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis.
[3] Tomer Y. Morad,et al. Optimization of Asymmetric and Heterogeneous MultiCore , 2013 .
[4] Kevin Skadron,et al. Studying Thermal Management for Graphics-Processor Architectures , 2005, IEEE International Symposium on Performance Analysis of Systems and Software, 2005. ISPASS 2005.
[5] Brian Rogers,et al. Scaling the bandwidth wall: challenges in and avenues for CMP scaling , 2009, ISCA '09.
[6] Bill Lynch,et al. Smart memory , 2010, 2010 IEEE Hot Chips 22 Symposium (HCS).
[7] Mike Ignatowski,et al. TOP-PIM: throughput-oriented programmable processing in memory , 2014, HPDC '14.
[8] Patrice Y. Simard,et al. Using GPUs for machine learning algorithms , 2005, Eighth International Conference on Document Analysis and Recognition (ICDAR'05).
[9] Chun Chen,et al. The architecture of the DIVA processing-in-memory chip , 2002, ICS '02.
[10] Uri C. Weiser,et al. Performance, power efficiency and scalability of asymmetric cluster chip multiprocessors , 2006, IEEE Computer Architecture Letters.
[11] G. Jack Lipovski,et al. The dynamic associative access memory chip and its application to SIMD processing and full-text database retrieval , 1999, Records of the 1999 IEEE International Workshop on Memory Technology, Design and Testing.
[12] Somayeh Sardashti,et al. The gem5 simulator , 2011, CARN.
[13] Ran Ginosar,et al. Generalized MultiAmdahl: Optimization of Heterogeneous Multi-Accelerator SoC , 2014, IEEE Computer Architecture Letters.
[14] Coniferous softwood. GENERAL TERMS , 2003 .
[15] Ran Ginosar,et al. Thermal analysis of 3D associative processor , 2013, ArXiv.
[16] David A. Patterson,et al. Computer architecture (2nd ed.): a quantitative approach , 1996 .
[17] Fred J. Pollack. New microarchitecture challenges in the coming generations of CMOS process technologies (keynote address) (abstract only) , 1999, MICRO.
[18] John D. Owens,et al. GPU Computing , 2008, Proceedings of the IEEE.
[19] F. Black,et al. The Pricing of Options and Corporate Liabilities , 1973, Journal of Political Economy.
[20] Ran Ginosar,et al. The effect of communication and synchronization on Amdahl's law in multicore systems , 2013, Parallel Comput..
[21] Sheng-Chih Lin,et al. A self-consistent junction temperature estimation methodology for nanometer scale ICs with implications for performance and thermal management , 2003, IEEE International Electron Devices Meeting 2003.
[22] L. W. Tucker,et al. Architecture and applications of the Connection Machine , 1988, Computer.
[23] Gabriel H. Loh,et al. The Cost of Uncore in Throughput-Oriented Many-Core Processors , 2008 .
[24] Yao Zhang,et al. A quantitative performance analysis model for GPU architectures , 2011, 2011 IEEE 17th International Symposium on High Performance Computer Architecture.
[25] Michael J. Quinn,et al. Designing Efficient Algorithms for Parallel Computers , 1987 .
[26] Gordon E. Sayre. STARAN: An associative approach to multiprocessor architecture , 1975, Computer Architecture.
[27] Feifei Li,et al. Comparing Implementations of Near-Data Computing with In-Memory MapReduce Workloads , 2014, IEEE Micro.
[28] Kenneth E. Batcher. STARAN parallel processor system hardware , 1974, AFIPS '74.
[29] William J. Dally,et al. GPUs and the Future of Parallel Computing , 2011, IEEE Micro.
[30] Doug Burger,et al. The SimpleScalar tool set, version 2.0 , 1997 .
[31] Ardavan Pedram,et al. Algorithm/Architecture Codesign of Low Power and High Performance Linear Algebra Compute Fabrics , 2013, 2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum.
[32] W. C. Meilander,et al. Array processor supercomputers , 1989, Proc. IEEE.
[33] Karthikeyan Sankaralingam,et al. Power challenges may end the multicore era , 2013, CACM.
[34] Jens H. Krüger,et al. GPGPU: general purpose computation on graphics hardware , 2004, SIGGRAPH '04.
[35] Andrew S. Cassidy,et al. Beyond Amdahl's Law: An Objective Function That Links Multiprocessor Performance Gains to Delay and Energy , 2012, IEEE Transactions on Computers.
[36] Ran Ginosar,et al. Efficient Dense and Sparse Matrix Multiplication on GP-SIMD , 2014, 2014 24th International Workshop on Power and Timing Modeling, Optimization and Simulation (PATMOS).
[37] Jaewook Shin,et al. Mapping Irregular Applications to DIVA, a PIM-based Data-Intensive Architecture , 1999, ACM/IEEE SC 1999 Conference (SC'99).
[38] Noah Treuhaft,et al. Scalable Processors in the Billion-Transistor Era: IRAM , 1997, Computer.
[39] Isaac D. Scherson,et al. Bit-Parallel Arithmetic in a Massively-Parallel Associative Processor , 1992, IEEE Trans. Computers.
[40] Peter M. Kogge,et al. PIM architectures to support petaflops level computation in the HTMT machine , 1999, Innovative Architecture for Future Generation High-Performance Processors and Systems (Cat. No.PR00650).
[41] Peter M. Kogge,et al. A low cost, multithreaded processing-in-memory system , 2004, WMPI '04.
[42] José E. Moreira,et al. Dissecting Cyclops: a detailed analysis of a multithreaded architecture , 2003, CARN.
[43] Richard M. Russell,et al. The CRAY-1 computer system , 1978, CACM.
[44] Robert Parker,et al. A PIM-based multiprocessor system , 2001, Proceedings 15th International Parallel and Distributed Processing Symposium. IPDPS 2001.
[45] E. L. Cloud,et al. The geometric arithmetic parallel processor , 1988, Proceedings., 2nd Symposium on the Frontiers of Massively Parallel Computation.
[46] Erdal Oruklu,et al. Performance evaluation of SRAM cells in 22nm predictive CMOS technology , 2009, 2009 IEEE International Conference on Electro/Information Technology.
[47] S. F. Reddaway. DAP—a distributed array processor , 1973, ISCA '73.
[48] Anant Agarwal,et al. Core Count vs Cache Size for Manycore Architectures in the Cloud , 2010 .
[49] Babak Falsafi,et al. Toward Dark Silicon in Servers , 2011, IEEE Micro.
[50] Thomas L. Sterling,et al. Gilgamesh: A Multithreaded Processor-In-Memory Architecture for Petaflops Computing , 2002, ACM/IEEE SC 2002 Conference (SC'02).
[51] Neil J. Gunther,et al. A Methodology for Optimizing Multithreaded System Scalability on Multi-cores , 2011, ArXiv.
[52] Mark D. Hill,et al. Amdahl's Law in the Multicore Era , 2008, Computer.
[53] Avidan J. Akerib,et al. Associative approach to real time color, motion and stereo vision , 1995, 1995 International Conference on Acoustics, Speech, and Signal Processing.
[54] Hyesoon Kim,et al. An analytical model for a GPU architecture with memory-level and thread-level parallelism awareness , 2009, ISCA '09.
[55] C. Auth,et al. A 22nm high performance and low-power CMOS technology featuring fully-depleted tri-gate transistors, self-aligned contacts and high density MIM capacitors , 2012, 2012 Symposium on VLSI Technology (VLSIT).
[56] Dave Brown,et al. Supplementary Material for An Efficient and Scalable Semiconductor Architecture for Parallel Automata Processing , 2013 .
[57] Todd M. Austin,et al. The SimpleScalar tool set, version 2.0 , 1997, CARN.
[58] R. Ginosar,et al. Convex Optimization of Resource Allocation in Asymmetric and Heterogeneous MultiCores , 2014 .
[59] Ken Kennedy,et al. Performance of parallel processors , 1989, Parallel Comput..
[60] David A. Patterson,et al. Computer Architecture: A Quantitative Approach , 1969 .
[61] John D. Owens,et al. General Purpose Computation on Graphics Hardware , 2005, IEEE Visualization.
[62] B. Parhami,et al. Content addressable parallel processors , 1978, Proceedings of the IEEE.
[63] Maya Gokhale,et al. Processing in Memory: The Terasys Massively Parallel PIM Array , 1995, Computer.
[64] Ran Ginosar,et al. Convex optimization of resource allocation in asymmetric and heterogeneous SoC , 2014, 2014 24th International Workshop on Power and Timing Modeling, Optimization and Simulation (PATMOS).
[65] Stephen L. Scott,et al. ASC: an associative-computing paradigm , 1994, Computer.
[66] Ran Ginosar,et al. Computer Architecture with Associative Processor Replacing Last-Level Cache and SIMD Accelerator , 2013, IEEE Transactions on Computers.
[67] Martin Hopkins,et al. Synergistic Processing in Cell's Multicore Architecture , 2006, IEEE Micro.
[68] P. A. Ivey,et al. Architectural considerations of a wafer scale processor , 1988 .