A literate experimentation manifesto

This paper proposes a new approach to experimental computer systems research, which we call Literate Experimentation. Conventionally, experimental procedure and write-up are divided into distinct phases: setup (the method), data collection (the results), and analysis (the evaluation of the results). Our concept of a literate experiment is a single, rich, human-generated, text-based description of a particular experiment, from which can be automatically derived: (1) a summary of the experimental setup for inclusion in the paper; (2) a sequence of executable commands to set up a computer platform ready to perform the actual experiment; (3) the experiment itself, executed on this appropriately configured platform; and (4) a means of generating results tables and graphs from the experimental output, ready for inclusion in the paper. Our literate experimentation style is largely inspired by Knuth's Literate Programming philosophy. In effect, a literate experiment is a small step towards the ideal of the executable paper. In this work, we argue that a literate experimentation approach makes it easier to produce rigorous experimental evaluation papers. We suggest that such papers are more likely to be accepted for publication, due to (a) the imposed uniformity of structure, and (b) the assurance that experimental results are easily reproducible. We present a case study of a prototype literate experiment involving memory management in Jikes RVM.
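The derivation pipeline described above can be sketched in miniature: one text source interleaving prose with tagged code chunks, from which the setup, experiment, and analysis are extracted and run in order. This is a hypothetical illustration only; the noweb-style `<<name>>=` / `@` chunk markers and the toy chunks are assumptions, not the paper's actual format.

```python
import re

# A hypothetical literate-experiment source: prose interleaved with
# named code chunks, in the tangle/weave spirit of Literate Programming.
DOC = """\
We evaluate a toy memory configuration.

<<setup>>=
platform = {"heap_mb": 256}
@

<<experiment>>=
result = platform["heap_mb"] * 2
@

<<analysis>>=
table_row = f"| heap | {result} |"
@
"""

def tangle(doc, tag):
    """Extract the body of the named chunk (noweb-style markers assumed)."""
    m = re.search(rf"<<{tag}>>=\n(.*?)\n@", doc, re.S)
    return m.group(1) if m else None

# Run the chunks in order, sharing one namespace, so the setup feeds
# the experiment and the experiment feeds the analysis.
env = {}
for tag in ("setup", "experiment", "analysis"):
    exec(tangle(DOC, tag), env)

print(env["table_row"])  # a results-table fragment ready for the paper
```

A real literate experiment would tangle shell commands for platform configuration and a harness invocation rather than Python assignments, but the flow is the same: the single human-written document is the sole source of the setup summary, the executable steps, and the results tables.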
