A single-chip 1.6 billion 16-b MAC/s multiprocessor DSP

A MIMD multiprocessor DSP chip containing four 64-b processing elements (PEs) interconnected by a 128-b pipelined split transaction bus (STBus) is presented. Each PE contains a 32-b RISC core with DSP enhancements and a 64-b SIMD vector co-processor with four 16-b MACs and a vector reduction unit. PEs are connected to the STBus through re-configurable dual-ported snooping L1 cache memories that support shared memory multiprocessing using a modified-MESI data coherency protocol. High-bandwidth data transfers between system memory and on-chip caches are managed by a pipelined memory controller that supports multiple outstanding transactions. An embedded RTOS dynamically schedules multiple tasks onto the PEs. Process synchronization is achieved using cached semaphores. The 120 mm² 0.25 µm CMOS chip operates at 100 MHz and dissipates 4 W from a 3.3 V supply.
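
As a rough consistency check on the headline figure, the peak rate can be recomputed from the numbers given above. The sketch below assumes that every 16-b MAC unit retires one multiply-accumulate per clock cycle, which the abstract does not state explicitly; the function and parameter names are illustrative only.

```python
def peak_mac_rate(num_pes: int, macs_per_pe: int, clock_hz: int) -> int:
    """Peak MAC/s, assuming one MAC per unit per cycle (an assumption)."""
    return num_pes * macs_per_pe * clock_hz


# Figures taken from the abstract: 4 PEs, 4 MACs per PE, 100 MHz clock.
rate = peak_mac_rate(num_pes=4, macs_per_pe=4, clock_hz=100_000_000)
print(rate)  # 1600000000
```

Under that assumption the product is 1.6 billion 16-b MAC/s, which matches the figure in the title.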