Simultac Fonton: A Fine-Grain Architecture for Extreme Performance beyond Moore's Law

With nano-scale technology and Moore's Law end, architecture advance serves as the principal means of achieving enhanced efficiency and scalability into the exascale era. Ironically, the field that has demonstrated the greatest leaps of technology in the history of humankind, has retained its roots in its earliest strategy, the von Neumann architecture model which has imposed tradeoffs no longer valid for today's semiconductor technologies, although they were suitable through the 1980s. Essentially all commercial computers, including HPC, have been and are von Neumann derivatives. The bottlenecks imposed by this heritage are the emphasis on ALU/FPU utilization, single instruction issue and sequential consistency, and the separation of memory and processing logic ("von Neumann bottleneck"). Here the authors explore the possibility and implications of one class of non von Neumann architecture based on cellular structures, asynchronous multi-tasking, distributed shared memory, and message-driven computation. "Continuum Computer Architecture" is introduced as a genus of ultra-fine-grained architectures where complexity of operation is an emergent behavior of simplicity of design combined with highly replicated elements. An exemplar species of CCA, "Simultac" is considered comprising billions of simple elements, "fontons", of merged properties of data storage and movement combined with logical transformations. Employing the ParalleX execution model and a variation of the HPX+ runtime system software, the Simultac may provide the path to cost effective data analytics and machine learning as well as dynamic adaptive simulations in the trans-exaOPS performance regime.

[1]  Jack B. Dennis,et al.  Data Flow Supercomputers , 1980, Computer.

[2]  Shekhar Y. Borkar,et al.  Supporting systolic and memory communication in iWarp , 1990, [1990] Proceedings. The 17th Annual International Symposium on Computer Architecture.

[3]  Henry G. Baker,et al.  Actors and Continuous Functionals , 1978, Formal Description of Programming Concepts.

[4]  J. Tukey,et al.  An algorithm for the machine calculation of complex Fourier series , 1965 .

[5]  A.P. Chandrakasan,et al.  A 256 kb 65 nm 8T Subthreshold SRAM Employing Sense-Amplifier Redundancy , 2008, IEEE Journal of Solid-State Circuits.

[6]  Jason Liu,et al.  A High-Density Subthreshold SRAM with Data-Independent Bitline Leakage and Virtual Ground Replica Scheme , 2007, 2007 IEEE International Solid-State Circuits Conference. Digest of Technical Papers.

[7]  Thomas L. Sterling,et al.  An Application Driven Analysis of the ParalleX Execution Model , 2011, ArXiv.

[8]  Leslie G. Valiant,et al.  A bridging model for parallel computation , 1990, CACM.

[9]  Lei Jiang,et al.  Die Stacking (3D) Microarchitecture , 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).

[10]  Jeremiah J. Wilke,et al.  DARMA 0.3.0-alpha Specification , 2016 .

[11]  Thomas L. Sterling,et al.  ParalleX An Advanced Parallel Execution Model for Scaling-Impaired Applications , 2009, 2009 International Conference on Parallel Processing Workshops.

[12]  D. W. Scharpf,et al.  Finite element method — the natural approach , 1979 .

[13]  Thomas Sterling,et al.  SLOWER: A performance model for Exascale computing , 2014, Supercomput. Front. Innov..

[14]  K. Steinhubl Design of Ion-Implanted MOSFET'S with Very Small Physical Dimensions , 1974 .

[15]  A Syrbu,et al.  10 Gbps VCSELs with High Single Mode Output in 1310nm and 1550 nm Wavelength Bands , 2008, OFC/NFOEC 2008 - 2008 Conference on Optical Fiber Communication/National Fiber Optic Engineers Conference.