Library for matrix multiplication-based data manipulation on a “mesh-of-tori” architecture

Recent developments in the computational sciences, in both hardware and software, invite reflection on how computers of the future will be assembled and how software for them will be written. In this contribution we combine recent results on possible designs of future processors, on the ways such processors will be combined to build scalable (super)computers, and on generalized matrix multiplication. As a result, we propose a novel library of routines, based on generalized matrix multiplication, that facilitates (matrix/image) manipulations.
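To make the central idea concrete, the following is a minimal sketch of what "generalized matrix multiplication" can mean. It assumes the generalization replaces the scalar add and multiply of the ordinary product with another pair of operations. The function name `generalized_matmul` and its parameters are hypothetical and are not taken from the proposed library. The first use shows a data manipulation (reversing the rows of a small image) expressed as a product with a permutation matrix; the second swaps in (min, +) to relax path lengths.

```python
import operator
from functools import reduce
from math import inf


def generalized_matmul(a, b, add, mul, zero):
    """Multiply a (n x k) by b (k x m) using a caller-supplied add/mul pair.

    Hypothetical helper for illustration only: `add` and `mul` stand in for
    the scalar operations, and `zero` is the identity element of `add`.
    """
    k = len(b)
    if any(len(row) != k for row in a):
        raise ValueError("inner dimensions do not match")
    m = len(b[0])
    return [
        [
            reduce(add, (mul(row[p], b[p][j]) for p in range(k)), zero)
            for j in range(m)
        ]
        for row in a
    ]


# Ordinary (+, *) product with a permutation matrix: reverses the image rows.
image = [[1, 2], [3, 4], [5, 6]]
reverse_rows = [[0, 0, 1], [0, 1, 0], [1, 0, 0]]
flipped = generalized_matmul(reverse_rows, image, operator.add, operator.mul, 0)

# (min, +) product: one relaxation step over a weighted adjacency matrix.
dist = [[0, 3, inf], [inf, 0, 1], [inf, inf, 0]]
two_hop = generalized_matmul(dist, dist, min, operator.add, inf)
```

Here `flipped` holds the image with its rows in reverse order, and `two_hop` holds the shortest path lengths that use at most two edges. Both results come from the same routine, which is the appeal of building a manipulation library on a single multiply-add kernel.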
