Memory reference reduction and exploit parallelism for dsp and communication algorithms and systems implementations on digital signal processor
An embedded system can be defined as an information processing system embedded in a larger system. Digital signal processing and communication algorithms and systems are used in digital communication systems to process information in digital form. Digital signal processors (DSPs) are optimized to implement these algorithms and systems in software. In this dissertation, we study methods that reduce memory references and exploit parallelism when implementing digital signal processing and communication algorithms and systems on DSPs, in order to lower power consumption and decrease execution time, respectively.
The FFT is a digital signal processing algorithm widely used in digital communication systems to transform signals between the time and frequency domains. We propose memory reference reduction methods that remove the duplicated memory references caused by identical twiddle factors in the FFT diagram. As a result, compared with a conventional implementation of the radix-2 FFT on a DSP, we achieve average reductions of 76.2% in the number of memory references and 53.1% in the memory space consumed by twiddle factors, and an average reduction of 28.7% in the number of clock cycles. We further extend the memory reference reduction method and apply it to the vector-radix two-dimensional FFT, an important algorithm in digital signal processing systems for image and speech processing.
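To make the idea of duplicated twiddle-factor references concrete, the following is a minimal sketch and not the dissertation's method. It assumes a textbook in-place radix-2 decimation-in-time FFT and only reorders the loops so that each distinct twiddle factor is read once per stage. The function name `fft_radix2`, the flag `share_twiddles`, and the `loads` counter are hypothetical names introduced for illustration.

```python
import cmath

def bit_reverse(i, bits):
    """Reverse the lowest `bits` bits of the index i."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r

def fft_radix2(x, share_twiddles):
    """Radix-2 DIT FFT of a power-of-two-length list.

    Returns (spectrum, loads), where `loads` counts reads from the
    twiddle table (an illustrative stand-in for memory references).
    """
    n = len(x)
    bits = n.bit_length() - 1
    table = [cmath.exp(-2j * cmath.pi * k / n) for k in range(n // 2)]
    a = [0j] * n
    for i in range(n):
        a[bit_reverse(i, bits)] = x[i]

    loads = 0
    half = 1
    while half < n:
        stride = n // (2 * half)
        if share_twiddles:
            # Twiddle index in the outer loop: one table read serves
            # every butterfly in the stage that uses the same factor.
            for j in range(half):
                w = table[j * stride]
                loads += 1
                for top in range(j, n, 2 * half):
                    t = w * a[top + half]
                    a[top + half] = a[top] - t
                    a[top] = a[top] + t
        else:
            # Conventional ordering: the table is read per butterfly.
            for group in range(0, n, 2 * half):
                for j in range(half):
                    w = table[j * stride]
                    loads += 1
                    t = w * a[group + j + half]
                    a[group + j + half] = a[group + j] - t
                    a[group + j] = a[group + j] + t
        half *= 2
    return a, loads
```

For an 8-point transform this sketch performs 12 table reads in the conventional ordering and 7 in the shared ordering, with identical outputs. The figures reported in the abstract come from the author's own method on real DSP hardware and are not reproduced by this toy counter.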
Another important building block of digital communication systems is channel coding, which provides reliable communication over a noisy channel. The most widely used channel coding algorithms include the Viterbi and turbo algorithms. For the Viterbi algorithm, we propose a register-exchange based decoder implementation that runs 28% faster than a conventional implementation based on the trace-back method. For the turbo algorithm, we optimize the implementation of the turbo codec to achieve better instruction-level parallelism.
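The register-exchange approach can be illustrated with a small sketch. This is an assumption-laden toy, not the dissertation's decoder: it assumes a rate-1/2, constraint-length-3 convolutional code with generators 7 and 5 (octal) and hard-decision Hamming metrics, and the names `encode` and `decode_register_exchange` are hypothetical. Each state carries a register holding its survivor's decoded bits, so no trace-back pass is needed at the end.

```python
K = 3                      # constraint length (assumed for illustration)
G = (0b111, 0b101)         # generator polynomials 7 and 5 (octal)
NSTATES = 1 << (K - 1)
INF = float("inf")

def parity(v):
    return bin(v).count("1") & 1

def encode(bits):
    """Rate-1/2 convolutional encoder starting in the all-zero state."""
    state, out = 0, []
    for b in bits:
        reg = (b << (K - 1)) | state
        out.extend(parity(reg & g) for g in G)
        state = reg >> 1
    return out

def decode_register_exchange(received):
    """Hard-decision Viterbi decoding with register exchange.

    registers[s] always holds the decoded bits of the survivor path
    that ends in state s; survivors are copied (exchanged) at every
    trellis step instead of storing decisions for a later trace-back.
    """
    metrics = [0] + [INF] * (NSTATES - 1)
    registers = [[] for _ in range(NSTATES)]
    for t in range(0, len(received), len(G)):
        symbol = received[t:t + len(G)]
        new_metrics = [INF] * NSTATES
        new_registers = [None] * NSTATES
        for state in range(NSTATES):
            if metrics[state] == INF:
                continue  # state not reachable yet
            for b in (0, 1):
                reg = (b << (K - 1)) | state
                expected = [parity(reg & g) for g in G]
                cost = metrics[state] + sum(
                    e != r for e, r in zip(expected, symbol))
                nxt = reg >> 1
                if cost < new_metrics[nxt]:
                    new_metrics[nxt] = cost
                    # Exchange: copy the winner's register plus the new bit.
                    new_registers[nxt] = registers[state] + [b]
        metrics, registers = new_metrics, new_registers
    best = min(range(NSTATES), key=metrics.__getitem__)
    return registers[best]
```

As a usage example, encoding a message that ends with two zero tail bits, flipping one coded bit, and passing the result to `decode_register_exchange` recovers the message. The trade-off shown here is the usual one: register exchange spends more data movement per step to avoid the sequential trace-back.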
Besides individual digital signal processing and communication algorithms, we also develop an optimized software implementation of a full-rate, IEEE 802.11a compliant digital baseband transmitter system on a DSP. The transmitter system includes the functional blocks from the scrambler to guard interval insertion, as defined in the IEEE 802.11a PHY standard. We exploit the parallelism that exists within and between individual functional blocks and achieve twice the processing speed of an existing software implementation of the transmitter on a chip with 22 processing cores.
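As an illustration of the first functional block in that chain, here is a minimal sketch of the IEEE 802.11a data scrambler, which uses the generator polynomial x^7 + x^4 + 1. It is a bit-serial reference model rather than the optimized DSP implementation described above; the function names `scrambler_sequence` and `scramble` are hypothetical.

```python
def scrambler_sequence(seed, length):
    """Generate `length` scrambling bits from a 7-bit seed.

    `seed` is a list [x1, ..., x7] of 0/1 values; each output bit is
    x4 XOR x7, which is also fed back into x1 (polynomial x^7 + x^4 + 1).
    """
    state = list(seed)
    out = []
    for _ in range(length):
        feedback = state[3] ^ state[6]   # x4 XOR x7
        out.append(feedback)
        state = [feedback] + state[:6]   # shift, feedback enters x1
    return out

def scramble(bits, seed):
    """XOR the data bits with the scrambling sequence.

    Descrambling is the same operation with the same seed.
    """
    keystream = scrambler_sequence(seed, len(bits))
    return [b ^ k for b, k in zip(bits, keystream)]
```

With the all-ones seed the sequence begins 0, 0, 0, 0, 1, 1, 1, 0 and repeats every 127 bits. Because the sequence depends only on the seed and not on the data, it could in principle be precomputed and applied with word-wide XORs; whether the dissertation uses that particular form of parallelism is not stated in the abstract.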