ArchExplorer.org: joint compiler/hardware exploration for fair comparison of architectures

While reproducing the experimental results of research articles is standard practice in mature domains of science, such as physics or biology, it has not yet become mainstream in computer architecture. However, recent research shows that the lack of a fair and broad comparison of research ideas can be significantly detrimental to the progress, and thus the productivity, of research. The complexity of architecture simulators, and the fact that simulators are not systematically disseminated along with novel ideas, are largely responsible for this situation. While this methodological problem has a fundamental impact on research, it is in essence a practical issue. In this article, we present and put to the test an atypical approach that aims to overcome this practical methodology issue, and which takes the form of an open and continuous exploration through a server-side web infrastructure. First, rather than requiring researchers to engage in the daunting task of finding, installing and running the simulators of many alternative mechanisms, we propose that researchers upload their simulator to the infrastructure, where the corresponding mechanism is automatically compared against all ideas known so far. Second, the comparison takes the form of a broad compiler/hardware exploration, so that a new mechanism is deemed superior only if it can outperform a tuned baseline and all known tuned mechanisms, for a given area and/or power budget. These two principles considerably facilitate a fair and quantitative comparison of research ideas. The web infrastructure is now publicly open, and we put the overall approach to the test with a set of data cache mechanisms. We explain how the tool-related and methodological issues of contributed simulators can be overcome, and we show that this broad exploration can challenge some earlier assessments about data cache research.
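The superiority criterion described above can be illustrated with a minimal Python sketch. It is not the ArchExplorer implementation: the `DesignPoint` record, the function names, the use of speedup as the performance metric, and the mechanism labels are all assumptions made for illustration. Each design point stands for one tuned compiler/hardware configuration of a mechanism, and a candidate is judged only on its best configuration that fits within the budget.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class DesignPoint:
    """One explored configuration (hypothetical record layout)."""
    mechanism: str   # illustrative label, e.g. "baseline" or "mech_a"
    area: float      # area cost, arbitrary units
    power: float     # power cost, arbitrary units
    speedup: float   # performance metric, higher is better


def best_within_budget(points: Iterable[DesignPoint],
                       max_area: float,
                       max_power: float) -> Dict[str, float]:
    """Return the best speedup per mechanism among points that fit the budget."""
    best: Dict[str, float] = {}
    for p in points:
        if p.area <= max_area and p.power <= max_power:
            if p.speedup > best.get(p.mechanism, float("-inf")):
                best[p.mechanism] = p.speedup
    return best


def is_superior(candidate: str,
                points: Iterable[DesignPoint],
                max_area: float,
                max_power: float) -> bool:
    """True only if the tuned candidate beats every other tuned mechanism,
    the baseline included, under the same area and power budget."""
    best = best_within_budget(points, max_area, max_power)
    if candidate not in best:
        return False  # no configuration of the candidate fits the budget
    return all(best[candidate] > score
               for name, score in best.items() if name != candidate)
```

Under these assumptions, a mechanism that wins with a generous budget can still lose under a tighter one, because each competitor is represented by its own best in-budget configuration rather than by a single fixed setup.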
