A 20ns CMOS DSP Core for Video-Signal Processing

THIS PAPER WILL DESCRIBE a programmable 8b digital signal processor (DSP) with an instruction cycle time of 20ns for video-signal processing. A 37.4mm² chip was fabricated using an advanced 1.0µm double-level-metal CMOS technology (Figure 1). The DSP has a reconfigurable high-speed data path that supports several multiply/accumulate functions, including 16-tap linear-phase transversal filtering, high-speed adaptive filtering and 8-tap discrete cosine transformation (DCT). Its flexible architecture allows the DSP to serve as a building block for a wide range of digital video-signal processing applications.
A block diagram of the DSP core is shown in Figure 2. The multiply/accumulate data path consists of an 8 x 8 full-precision two’s complement parallel multiplier (MLT), two 12b arithmetic logic units (ALUA and ALUB) and two sets of sixteen 12b accumulators (ACCA and ACCB). To achieve a 20ns cycle time while maintaining versatility, the direct interconnections between these units, the data memories and the external data interfaces are made configurable. These interconnections are preset prior to signal processing. Besides the high-speed data path, a conventional data bus handles the data transfers that require lower speed: presetting the data path, downloading the program to the memories, and testing.
Three data-path configurations implemented in the DSP (Figure 3) were designed for (a) single-chip linear-phase transversal filtering, (b) cascaded linear-phase transversal filtering, and (c) discrete cosine transformation. For single-chip filtering, the multiplier output is fed to both ALUs. The configuration in Figure 3a implements the linear-phase transversal filter algorithm. In this algorithm, the transversal-filter coefficients are arranged symmetrically, so that the product of an input sample and a given coefficient is used in two taps. With this configuration, the DSP can process two taps in a single instruction cycle. This type of filter is widely used in video-signal processing.
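The two-taps-per-cycle idea can be sketched in Python. This is a minimal model, not the chip's microcode: it assumes a transposed-form filter in which accumulator `acc[j]` holds the partial sum for the output due `j` samples later, and the names `linear_phase_fir` and `h_half` (the first half of the symmetric coefficient set) are hypothetical. Arithmetic is idealized integer arithmetic; the 12b accumulator width and any scaling of the 16b product are ignored.

```python
def linear_phase_fir(x, h_half):
    """Symmetric (linear-phase) FIR filter in transposed form.

    h_half holds h[0..N/2-1]; the full N-tap response is assumed to satisfy
    h[k] == h[N-1-k], so each product is accumulated into two taps.
    """
    n_taps = 2 * len(h_half)
    acc = [0] * n_taps  # acc[j]: partial sum for the output j samples ahead
    out = []
    for sample in x:
        for k, c in enumerate(h_half):
            p = c * sample             # one multiplication ...
            acc[k] += p                # ... accumulated at tap k (first ALU)
            acc[n_taps - 1 - k] += p   # ... and at mirror tap N-1-k (second ALU)
        out.append(acc[0])             # this output is now complete
        acc = acc[1:] + [0]            # every partial sum moves one step closer
    return out
```

For example, `linear_phase_fir([1, 0, 0, 0], [1, 2])` returns the impulse response `[1, 2, 2, 1]`. With eight stored coefficients (a 16-tap filter), the model performs 8 multiplications per input sample instead of 16, which is the saving the symmetric arrangement provides.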
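For the DCT configuration, the text gives only the transform size. As a reference for what the multiply/accumulate hardware must compute, the following sketch evaluates the orthonormal 8-point DCT-II directly, reading "8-tap" as an 8-point transform (an assumption). Each output coefficient is one 8-term multiply/accumulate sequence. The function name `dct8` is hypothetical, and floating-point cosines stand in for whatever fixed-point coefficients the chip actually stores.

```python
import math


def dct8(x):
    """Orthonormal 8-point DCT-II computed as 64 multiply/accumulate steps."""
    assert len(x) == 8
    out = []
    for k in range(8):
        scale = math.sqrt(1 / 8) if k == 0 else math.sqrt(2 / 8)
        acc = 0.0  # one accumulator per output coefficient
        for n in range(8):
            c = math.cos(math.pi * (2 * n + 1) * k / 16)  # coefficient C[k][n]
            acc += c * x[n]                               # multiply/accumulate step
        out.append(scale * acc)
    return out
```

A constant input maps entirely onto the first (DC) coefficient, and because the transform is orthonormal, the sum of squares of the outputs equals that of the inputs.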
In cascaded filtering, direct data paths to and from the two ALUs are preset. The program controls whether the input registers of each ALU are fed from the external data pins or from the internal accumulators.
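One way partial sums could cross chip boundaries in the cascaded case is sketched below in Python. This is a simplified model under stated assumptions. Each chip holds one 16-tap segment of a longer transposed-form filter. The coefficient folding of the single-chip case is omitted. The last accumulator of each segment is loaded from "external pins", meaning the leading partial sum of the next chip in the chain. The class `Segment` and the function `cascaded_fir` are hypothetical.

```python
class Segment:
    """One chip's share of a long transposed-form FIR filter (hypothetical model)."""

    def __init__(self, coeffs):
        self.h = list(coeffs)
        self.acc = [0] * len(self.h)  # on-chip partial-sum accumulators

    def accumulate(self, sample):
        for k, c in enumerate(self.h):
            self.acc[k] += c * sample

    def head(self):
        return self.acc[0]  # partial sum leaving this chip

    def shift(self, external_in):
        # The tail accumulator is loaded from the external input (the next
        # chip's head, or 0 for the last chip); the rest shift internally.
        self.acc = self.acc[1:] + [external_in]


def cascaded_fir(x, h, seg_len=16):
    segs = [Segment(h[i:i + seg_len]) for i in range(0, len(h), seg_len)]
    out = []
    for sample in x:
        for s in segs:
            s.accumulate(sample)           # every chip sees the same input sample
        heads = [s.head() for s in segs]   # read all heads before any shift
        out.append(heads[0])               # the first chip produces the filter output
        for j, s in enumerate(segs):
            s.shift(heads[j + 1] if j + 1 < len(segs) else 0)
    return out
```

In this model, only the accumulator at each segment boundary needs an off-chip source; all others are fed internally. That split mirrors the selection between external data pins and internal accumulators described above. The cascaded result is identical to a single long filter.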