With the rapid development of silicon technology, chip die size and clock frequency continue to increase, and it has become very difficult to further improve the performance of silicon devices by raising the clock speed alone. We believe a more practical way to increase speed is to use the abundant transistor resources to implement several cores that execute in parallel. In this paper, we propose a processor-coprocessor architecture that speeds up the most frequently used short program segments and reduces their power consumption. Because these segments dominate the dynamic execution trace of embedded programs, the overall performance improvement and power saving are significant. A dataflow coprocessor and a RISC coprocessor are implemented for comparison. The experimental results show that the dataflow coprocessor is faster and more power-efficient than the RISC coprocessor, because the dataflow model naturally supports fine-grained instruction-level parallelism and its parallelism is self-scheduling. Apart from data dependencies, there is no imposed sequentiality, so a dataflow program can exploit all forms of instruction parallelism.
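To illustrate the self-scheduling property described above, the following is a minimal sketch (not taken from the paper) of the dataflow firing rule: each operation fires as soon as all of its operands are available, so the only ordering constraint is data dependence. The example expression y = (a + b) * (c - d) and all names are hypothetical.

```python
# Toy dataflow-graph simulator: nodes fire when their operands are ready,
# so independent operations execute in the same step with no explicit schedule.
# Hypothetical graph for y = (a + b) * (c - d): node -> (operation, input names)
GRAPH = {
    "add": (lambda x, y: x + y, ["a", "b"]),
    "sub": (lambda x, y: x - y, ["c", "d"]),
    "mul": (lambda x, y: x * y, ["add", "sub"]),
}

def run(inputs):
    """Repeatedly fire every node whose operands are present until all nodes have fired."""
    tokens = dict(inputs)          # operand values ("tokens") produced so far
    pending = set(GRAPH)
    step = 0
    while pending:
        # 'add' and 'sub' become ready together and fire in the same step;
        # 'mul' must wait only because it depends on their results.
        ready = [n for n in pending if all(s in tokens for s in GRAPH[n][1])]
        step += 1
        for n in ready:
            op, srcs = GRAPH[n]
            tokens[n] = op(*(tokens[s] for s in srcs))
            pending.discard(n)
        print(f"step {step}: fired {ready}")
    return tokens["mul"]

print(run({"a": 1, "b": 2, "c": 7, "d": 4}))   # prints 9 after 2 steps instead of 3 sequential ones
```

The point of the sketch is that no instruction pointer or compile-time schedule orders the operations; availability of operands alone drives execution, which is the property the dataflow coprocessor exploits for instruction-level parallelism.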