Paper information - The Raw Processor: A Composeable 32-Bit Fabric for Embedded and General Purpose Computing

The Raw project is attempting to create a scalable processor architecture that is suitable for both general purpose and embedded computations. Current general purpose processors differ from embedded devices in that they provide large amounts of hardware support to discover and manipulate instruction-level parallelism and unstructured memory accesses. Because the parallelism in embedded computations is much more predictable, embedded devices such as DSPs do not offer a rich set of such mechanisms; rather, they devote their area to computational resources such as pipelined floating point, thereby achieving significantly better area and energy efficiency. However, their best performance is achieved for regular data access patterns such as streams, and they often require assembly code manipulation. Embedded FPGAs and ASICs go one step further and can offer even better results for many classes of computations, but they require a hardware design step in mapping their applications into silicon. Raw will support many classes of computations that traditionally have run on microprocessors, DSPs, FPGAs and ASICs. Raw implements a simple, highly parallel, tiled architecture, and exposes its interconnect, I/O, memory and computational elements to the compiler [5]. This exposure allows the software system to allocate resources and coordinate data flow within the chip in an application-specific manner. Furthermore, the tiled, replicated architecture of Raw allows it to scale with increasing silicon densities. As depicted in Figure 1, the Raw processor is a single chip containing 16 identical processor-sized tiles connected in a 4-by-4 mesh configuration by four nearest neighbor point-to-point pipelined high-speed
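As a rough illustration of the 4-by-4 tiled mesh described above, the following Python sketch models only which tiles are nearest neighbors. It is not taken from the paper. The names `neighbors`, `build_mesh` and `hop_count` are hypothetical, and the Manhattan-distance hop count assumes minimal routing over nearest-neighbor links, which the text above does not state.

```python
from typing import Dict, List, Tuple

# From the abstract: 16 identical tiles in a 4-by-4 mesh.
MESH_ROWS = 4
MESH_COLS = 4

Tile = Tuple[int, int]  # (row, column) coordinate of a tile


def neighbors(tile: Tile, rows: int = MESH_ROWS, cols: int = MESH_COLS) -> List[Tile]:
    """Return the nearest-neighbor tiles (north, south, west, east) inside the mesh."""
    r, c = tile
    candidates = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
    return [(nr, nc) for nr, nc in candidates if 0 <= nr < rows and 0 <= nc < cols]


def build_mesh(rows: int = MESH_ROWS, cols: int = MESH_COLS) -> Dict[Tile, List[Tile]]:
    """Build an adjacency map from each tile to its nearest neighbors."""
    return {(r, c): neighbors((r, c), rows, cols) for r in range(rows) for c in range(cols)}


def hop_count(src: Tile, dst: Tile) -> int:
    """Number of nearest-neighbor hops between two tiles.

    Assumption (not from the source): messages take a minimal path,
    so the hop count is the Manhattan distance.
    """
    return abs(src[0] - dst[0]) + abs(src[1] - dst[1])


if __name__ == "__main__":
    mesh = build_mesh()
    print(len(mesh), "tiles")
    print("corner (0, 0) neighbors:", mesh[(0, 0)])
    print("hops from (0, 0) to (3, 3):", hop_count((0, 0), (3, 3)))
```

In this model, corner tiles have two neighbors, edge tiles three, and interior tiles four. Because the tiles are identical and only connect to adjacent tiles, changing `rows` and `cols` grows the mesh without changing any per-tile logic, which is one way to picture the scaling argument made in the abstract.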

[1] Vivek Sarkar, et al. The Raw Compiler Project, 1999.

[2] Vivek Sarkar, et al. Baring It All to Software: Raw Machines, 1997, Computer.

[3] Rajeev Barua, et al. Memory bank disambiguation using modulo unrolling for Raw machines, 1998, Proceedings. Fifth International Conference on High Performance Computing (Cat. No. 98EX238).

[4] Vivek Sarkar, et al. Space-time scheduling of instruction-level parallelism on a raw machine, 1998, ASPLOS VIII.

[5] Rajeev Barua, et al. Maps: a compiler-managed memory system for raw machines, 1999, ISCA.