"Your favorite simulator here" Considered Harmful

ion errors than an unknown black box. Reviewers and the community need to change their mindset as well: having blind faith in “standard tools” while completely discounting other tools is not appropriate. We revisit the issue of open versus in-house tools in Section 4.

3.2. Pitfall 2: False confidence from validation overgeneralization in simulator papers, or tool misuses

Simulator writers typically make narrow and factually accurate statements about validation; some examples follow. However, users often misunderstand the nature of validation and put these tools to uses they were not intended for, including making quantitative generalizations.

gem5’s OOO model is widely used, but as noted in a recent paper [13] and in our own observations above, it has several specification errors. Although the gem5 authors themselves make no such claim, others do describe it as a “validated simulator.” Clearly, this cannot be taken to mean that all effects are modeled. For instance, a technique that targets the instruction frontend must pay attention to gem5’s baseline and first fix the specification error described in [13].

For McPAT, according to its own documentation and code comments, constants are sometimes chosen to match the validation targets. We agree this is a reasonable decision in some cases, especially when highly customized logic is employed (e.g., functional unit implementations). The danger arises when researchers attempt to generalize the results beyond the validated processors, where these constants will likely not be appropriate.

For GPUWattch, it might be tempting for researchers to perform sensitivity studies by varying McPAT parameters. The path of least resistance would be to reuse the same scaling factors, instead of measuring the power of a known GPU and deriving new scaling factors using the GPUWattch methodology.
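Why reused scaling factors are unsafe can be made concrete with a toy calibration. The following is a minimal sketch, not GPUWattch's or McPAT's code. It assumes that scaling factors are per-component multipliers on the simulator's unscaled dynamic power, fit by ordinary least squares to power measured on real hardware for a set of microbenchmarks. The function names (`fit_scaling_factors`, `solve_linear`, `predict`), the two-component model, and all numbers are hypothetical.

```python
"""Hypothetical sketch: re-deriving per-component power scaling factors.

Assumed setup (illustrative only, not GPUWattch or McPAT code): for each
microbenchmark i, model[i][j] is the simulator's unscaled dynamic power for
component j, and measured[i] is the dynamic power measured on real hardware.
We fit factors s minimizing sum_i (sum_j model[i][j] * s[j] - measured[i])**2.
"""
from typing import List

Matrix = List[List[float]]


def solve_linear(a: Matrix, b: List[float]) -> List[float]:
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: benchmarks do not isolate every component")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_scaling_factors(model: Matrix, measured: List[float]) -> List[float]:
    """Ordinary least squares via the normal equations (M^T M) s = M^T y."""
    k = len(model[0])
    mtm = [[sum(row[i] * row[j] for row in model) for j in range(k)] for i in range(k)]
    mty = [sum(row[i] * y for row, y in zip(model, measured)) for i in range(k)]
    return solve_linear(mtm, mty)


def predict(model_row: List[float], scales: List[float]) -> float:
    """Scaled power estimate for one workload."""
    return sum(p * s for p, s in zip(model_row, scales))


# Hypothetical data: 3 microbenchmarks x 2 components (e.g. ALU, register file).
MODEL = [[10.0, 0.0], [0.0, 20.0], [10.0, 20.0]]
MEASURED_GPU_A = [15.0, 16.0, 31.0]  # the "validated" target
MEASURED_GPU_B = [11.0, 26.0, 37.0]  # a different GPU / configuration

scales_a = fit_scaling_factors(MODEL, MEASURED_GPU_A)
scales_b = fit_scaling_factors(MODEL, MEASURED_GPU_B)

if __name__ == "__main__":
    print("factors fit on GPU A:", [round(s, 3) for s in scales_a])
    print("factors fit on GPU B:", [round(s, 3) for s in scales_b])
    # Reusing GPU A's factors for GPU B mispredicts the combined benchmark.
    print("stale estimate:", round(predict(MODEL[2], scales_a), 3),
          "measured:", MEASURED_GPU_B[2])
```

Refitting on GPU B recovers its own factors. Reusing GPU A's factors underestimates the combined benchmark instead (31 vs. 37 in these made-up numbers). The sketch uses the normal equations for brevity. A real calibration flow might use a constrained or regularized solver instead. It would also need enough distinct microbenchmarks to isolate every modeled component.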
For reasons described in the previous section, we argue that without obtaining new scaling factors, this type of sensitivity analysis would be inappropriate.

Suggestions: Use validated simulators with caution. Look for details on the simulator’s design and factor those decisions

[1] Karthikeyan Sankaralingam et al. Challenge benchmarks that must be conquered to sustain the GPU revolution. 2011.

[2] Mahmut T. Kandemir et al. OWL: cooperative thread array aware scheduling techniques for improving GPGPU performance. 2013, ASPLOS '13.

[3] Somayeh Sardashti et al. Decoupled compressed cache: Exploiting spatial locality for energy-optimized compressed caching. 2013, 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[4] Mahmut T. Kandemir et al. Orchestrated scheduling and prefetching for GPGPUs. 2013, ISCA.

[5] Steven Swanson et al. Conservation cores: reducing the energy of mature computations. 2010, ASPLOS XV.

[6] David Black-Schaffer et al. Efficient Embedded Computing. 2008, Computer.

[7] Jung Ho Ahn et al. McPAT 1.0: An Integrated Power, Area, and Timing Modeling Framework for Multicore Architectures. 2010.

[8] Thomas F. Wenisch et al. Practical off-chip meta-data for temporal memory streaming. 2009, IEEE 15th International Symposium on High Performance Computer Architecture.

[9] Nam Sung Kim et al. GPUWattch: enabling energy optimizations in GPGPUs. 2013, ISCA.

[10] Christoforos E. Kozyrakis et al. ZSim: fast and accurate microarchitectural simulation of thousand-core systems. 2013, ISCA.

[11] Edsger W. Dijkstra et al. Letters to the editor: go to statement considered harmful. 1968, CACM.

[12] Luis Ceze et al. Neural Acceleration for General-Purpose Approximate Programs. 2014, IEEE Micro.

[13] David Black-Schaffer et al. An Energy-Efficient Processor Architecture for Embedded Systems. 2008, IEEE Computer Architecture Letters.

[14] Karthikeyan Sankaralingam et al. Sampling + DMR: Practical and low-overhead permanent fault detection. 2011, 38th Annual International Symposium on Computer Architecture (ISCA).

[15] Avi Mendelson et al. Threads vs. caches: Modeling the behavior of parallel workloads. 2010, IEEE International Conference on Computer Design.

[16] Avi Mendelson et al. Many-Core vs. Many-Thread Machines: Stay Away From the Valley. 2009, IEEE Computer Architecture Letters.

[17] Todd M. Austin et al. The SimpleScalar tool set, version 2.0. 1997, CARN.

[18] Henry Wong et al. Analyzing CUDA workloads using a detailed GPU simulator. 2009, IEEE International Symposium on Performance Analysis of Systems and Software.

[19] Karthikeyan Sankaralingam et al. Dynamically Specialized Datapaths for energy efficient computing. 2011, IEEE 17th International Symposium on High Performance Computer Architecture.