Supporting experiments in computer systems research

Systems research is an experimental science. Most research in computer systems follows a cycle of innovation (e.g., building a novel garbage collector) followed by evaluation (e.g., determining whether it significantly speeds up our programs). Researchers use experiments to drive their work: first to identify bottlenecks, and then to determine whether their innovations for addressing those bottlenecks are effective. If these experiments are not carried out properly, researchers may draw incorrect conclusions; they may waste time on something that is not actually a problem, or conclude that their innovations are beneficial when they are not. A complicating factor in computer systems experiments is that computer systems are nonlinear dynamical systems, capable of complex and even chaotic behavior. A hallmark of chaos is sensitive dependence on initial conditions: small changes to the system can have a large effect on its overall behavior. This sensitivity complicates both our observations of systems and our evaluations of innovations. It complicates observations because measurement tools perturb the very system they observe. It complicates evaluations because small changes to the environment in which we carry out our experiments can cause large and dramatic changes in system behavior. In this dissertation, we argue that the systems community needs to support experiments with tools that allow researchers to accurately observe their systems and with methodologies that allow them to accurately evaluate the impact of their innovations. To support this argument, we introduce two tools that allow researchers to accurately observe their applications' behavior and one methodology that allows them to accurately evaluate the impact of their innovations.
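To make the notion of sensitive dependence concrete, the short sketch below (an illustration of our own, not taken from the dissertation) uses the logistic map in its chaotic regime as a stand-in for a real computer system: two trajectories whose initial conditions differ by one part in a billion become completely uncorrelated within a few dozen iterations.

# A toy illustration of sensitive dependence on initial conditions, using the
# logistic map x -> r*x*(1 - x) with r = 4.0 (its chaotic regime) as a
# stand-in for a complex system. Illustrative only; not part of the
# dissertation's tooling.

def logistic_trajectory(x0, r=4.0, steps=60):
    """Iterate the logistic map from x0 and return the full trajectory."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs

a = logistic_trajectory(0.200000000)   # baseline initial condition
b = logistic_trajectory(0.200000001)   # perturbed by one part in a billion

for step in (0, 10, 20, 30, 40, 50, 60):
    print(f"step {step:2d}: |a - b| = {abs(a[step] - b[step]):.9f}")

The same qualitative behavior is what allows small differences in the experimental environment, such as the presence of a measurement tool or a slightly altered setup, to produce large swings in observed performance.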
