Circuits and Systems Society VLSI Transactions Best Paper Award, 2003

The clock distribution and generation circuitry forms a critical component of current synchronous digital systems and is known to consume around a quarter of the power budget of existing microprocessors. We propose and validate a high-level model for evaluating the energy dissipation of the clock generation and distribution circuitry, including both the dynamic and leakage power components. The validation results show that the model is reasonably accurate, with the average deviation being within 10% of SPICE simulations. Access to this model can enable further research at high-level design stages into optimizing the system clock power. To illustrate this, a few architectural modifications are considered and their effects on the clock sub-system and the total system power budget are assessed.

Summary

Low energy dissipation is of interest not only for portable devices, where maximizing battery life is a design rule, but also for non-mobile systems, where chip-level issues such as power delivery and packaging, and system-level issues such as integration, cooling and case design, are important. The task of lowering energy dissipation has been attacked at all stages of the design process (i.e., transistor, gate, logic and architectural levels), with the practical limits already being reached at the lower levels. Thus, achieving power savings is now also the responsibility of the system architect and, from this perspective, the design of the clock distribution sub-system remains one of the main challenges, since it consumes up to 40% of the total dynamic power budget of current high-performance microprocessors. The work presented here provides a complete, accurate and flexible power model that captures all components of the clock network and analyzes in more detail, on a case-by-case basis, those components that contribute strongly to the total clock power.
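As a rough illustration of the kind of high-level estimate such a model produces, the sketch below sums the effective clock-load capacitance of several contributors and applies the textbook CMOS expressions: dynamic power C·V²·f for a node that charges and discharges once per cycle, as a clock net does, and leakage power V·I_leak. The function names, the list of contributors and every numeric value are hypothetical placeholders chosen for this example. They are not expressions or figures taken from the paper.

```python
def clock_dynamic_power(c_eff_farads, vdd_volts, freq_hz):
    """Textbook dynamic power of a node that charges and discharges once per cycle."""
    return c_eff_farads * vdd_volts ** 2 * freq_hz


def leakage_power(i_leak_amps, vdd_volts):
    """Static power drawn by a leakage current from the supply."""
    return i_leak_amps * vdd_volts


def clock_power(loads_farads, vdd_volts, freq_hz, i_leak_amps):
    """Combine per-contributor effective capacitances into a total clock power estimate."""
    c_total = sum(loads_farads.values())
    dynamic = clock_dynamic_power(c_total, vdd_volts, freq_hz)
    leak = leakage_power(i_leak_amps, vdd_volts)
    return {"dynamic_w": dynamic, "leakage_w": leak, "total_w": dynamic + leak}


# Hypothetical effective capacitances (farads) for illustration only.
example_loads = {
    "distribution_wires": 200e-12,
    "clock_buffers": 150e-12,
    "pipeline_registers": 100e-12,
    "memory_structures": 50e-12,
}

# Assumed operating point: 1.0 V supply, 1 GHz clock, 10 mA total leakage.
result = clock_power(example_loads, vdd_volts=1.0, freq_hz=1e9, i_leak_amps=0.01)
print(result)
```

With these assumed values the loads add up to 500 pF, so the dynamic term comes to about 0.5 W and the leakage term to 0.01 W. Keeping the contributors in a dictionary makes it easy to see how changing one of them, for example shrinking a memory structure, shifts the total.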
Once effective capacitance expressions were obtained for all the different clock network load contributors, they were validated by direct comparison against SPICE simulation results. The memory structures include units such as data and instruction caches, data and instruction TLBs, the register file, the branch history table in the branch predictor, the instruction issue window, the load/store queue and other similar constructs. The expression obtained for estimating the clock load of a general-purpose memory array yielded an average (maximum) error of 5% (12%) with respect to the simulated values.

A Clock Power Model to Evaluate Impact of Architectural and Technology Optimizations: A Summary
by David E. Duarte, N. Vijaykrishnan, and Mary Jane Irwin
IEEE Circuits and Systems Magazine, Third Quarter 2003, p. 36. 1540-7977/03/$17.00 ©2003 IEEE

The expression for the pipeline registers’ clock