MATTER: Modular Adaptive Technology Targeting Efficient Reasoning

Abstract: The objective of this effort was to investigate novel computer architectures to support machine learning, based on reconfigurable hardware and nanowire growth. The scope of this effort was to bring revolutionary architectural ideas together with application drivers that embody cognitive processing dimensions such as machine learning, large knowledge bases, information security and integrity, real-world reasoning, sensor integration, and real-time embedded systems. Conventional processing architectures are ill-suited to processing the large, sparse graph data structures needed to represent cognitive information and computations efficiently. Today's silicon hardware can support a large number of parallel operations and high bandwidth and low latency from small, distributed memories. However, traditional von Neumann architectures employ a single-memory, single-instruction-stream model that prevents them from fully exploiting these hardware capabilities. This mismatch presents an opportunity to design new hardware architectures that perform parallel operations over large data structures and thereby deliver substantially better performance on graph-intensive information processing tasks. To support these tasks while exploiting the silicon, the MATTER architecture described in this report distributes the data structure over a large number of small, fast memories and associates active logic with each fragment so that it can perform the necessary operations on its local data. In addition, this report describes an exploration of nanowire technology, focusing on the growth of new connections. This is a unique capability of nanowire implementations, which could provide a mechanism for adaptation over time.
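To make the distributed-fragment idea concrete, the sketch below is a minimal software analogue (not the report's actual hardware design): a graph's adjacency lists are partitioned across hypothetical "tiles", each modeling a small local memory paired with its own logic, and the tiles cooperate by message passing to run a parallel breadth-first search. All names (`Tile`, `parallel_bfs`, `owner`) are illustrative assumptions.

```python
from collections import defaultdict

class Tile:
    """Models one small, fast memory fragment with its own local logic."""
    def __init__(self, tid):
        self.tid = tid
        self.adj = defaultdict(list)  # local slice of the graph's adjacency lists
        self.dist = {}                # BFS distances for vertices this tile owns
        self.inbox = []               # (vertex, distance) messages from other tiles

    def add_edge(self, u, v):
        self.adj[u].append(v)

    def step(self, tiles, owner):
        """Drain the inbox, relax local vertices, and forward work to neighbor tiles."""
        out = []
        for v, d in self.inbox:
            if v not in self.dist or d < self.dist[v]:
                self.dist[v] = d
                for w in self.adj[v]:
                    out.append((w, d + 1))
        self.inbox = []
        for w, d in out:
            tiles[owner(w)].inbox.append((w, d))
        return bool(out)  # True while this tile generated new work

def parallel_bfs(edges, source, n_tiles=4):
    """Distribute the graph over n_tiles fragments and BFS from source."""
    owner = lambda v: v % n_tiles          # simple static vertex-to-tile mapping
    tiles = [Tile(i) for i in range(n_tiles)]
    for u, v in edges:
        tiles[owner(u)].add_edge(u, v)
    tiles[owner(source)].inbox.append((source, 0))
    # Sweep all tiles until no tile has pending messages or produces new work.
    while any(t.step(tiles, owner) or t.inbox for t in tiles):
        pass
    dist = {}
    for t in tiles:
        dist.update(t.dist)
    return dist
```

In a hardware realization each tile's `step` would run concurrently, so the operations on a fragment stay adjacent to the memory that holds it; the sequential sweep here only stands in for that parallelism.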
