Library for matrix multiplication-based data manipulation on a “mesh-of-tori” architecture
[1] Toshiaki Miyazaki, et al. Rapid*Closure: Algebraic Extensions of a Scalar Multiply-add Operation, 2010, CATA.
[2] Stanislav G. Sedukhin, et al. An O(n) Time-Complexity Matrix Transpose on Torus Array Processor, 2011, 2011 Second International Conference on Networking and Computing.
[3] Shorin Kyo, et al. An integrated memory array processor architecture for embedded image recognition systems, 2005, 32nd International Symposium on Computer Architecture (ISCA'05).
[4] D. Burger, et al. Memory Bandwidth Limitations of Future Microprocessors, 1996, 23rd Annual International Symposium on Computer Architecture (ISCA'96).
[5] Marcin Paprzycki, et al. Parallel Gaussian Elimination Algorithms on a Cray Y-MP, 1995, Informatica.
[6] Marcin Paprzycki, et al. Generalizing Matrix Multiplication for Efficient Computations on Modern Computers, 2011, PPAM.
[7] Ruud van der Pas, et al. Memory Hierarchy in Cache-Based Systems, 2002.
[8] Apostolos Dollas, et al. Predicting and precluding problems with memory latency, 1994, IEEE Micro.
[9] Shorin Kyo, et al. An Integrated Memory Array Processor Architecture for Embedded Image Recognition Systems, 2005, ISCA 2005.
[10] Aamir Zia, et al. Mitigating Memory Wall Effects in High-Clock-Rate and Multicore CMOS 3-D Processor Memory Stacks, 2009, Proceedings of the IEEE.
[11] A. Ravankar, et al. Image scrambling based on a new linear transform, 2011, 2011 International Conference on Multimedia Technology.
[12] Erik H.M. Heijne. Gigasensors for an Attoscope: Catching Quanta in CMOS, 2008, IEEE Solid-State Circuits Newsletter.
[13] D. Scott Wills, et al. Systolic Opportunities for Multidimensional Data Streams, 2002, IEEE Trans. Parallel Distributed Syst.
[14] Fred G. Gustavson, et al. Cache Blocking for Linear Algebra Algorithms, 2011, PPAM.
[15] Dietmar Fey, et al. Marching-pixels: a new organic computing paradigm for smart sensor processor arrays, 2005, CF '05.
[16] Stanislav G. Sedukhin, et al. Mesh-of-Tori: A Novel Interconnection Network for Frontal Plane Cellular Processors, 2010, 2010 First International Conference on Networking and Computing.
[17] P. Machanick. Approaches to Addressing the Memory Wall, 2002.
[18] Andreas Linke, et al. Parallelization of the Two-Dimensional Ising Model on a Cluster of IBM RISC System/6000 Workstations, 1993, Parallel Comput.
[19] Jack Dongarra, et al. MPI - The Complete Reference: Volume 1, The MPI Core, 1998.
[20] Francesc Alted, et al. Why Modern CPUs Are Starving and What Can Be Done about It, 2010, Computing in Science & Engineering.
[21] Ákos Zarándy, et al. Focal-Plane Sensor-Processor Chips, 2014.
[22] Toshiaki Miyazaki, et al. Orbital Algorithms and Unified Array Processor for Computing 2D Separable Transforms, 2010, 2010 39th International Conference on Parallel Processing Workshops.