The Role of Deliberate Artificial Design Elements in Software Engineering Experiments

Increased realism in software engineering experiments is often promoted as an important means of increasing generalizability and industrial relevance. In this context, artificiality, e.g., the use of constructed rather than realistic tasks, is seen as a threat. In this paper, we examine the opposite view: that deliberately introduced artificial design elements may increase knowledge gain and enhance both generalizability and relevance. In the first part of the paper, we identify and evaluate arguments and examples for and against deliberately introducing artificiality into software engineering experiments. We find good arguments in favor of deliberately introducing artificial design elements to 1) isolate basic mechanisms, 2) establish the existence of phenomena, 3) enable generalization from particularly unfavorable to more favorable conditions (persistence of phenomena), and 4) relate experiments to theory. In the second part of the paper, we summarize a content analysis of articles reporting software engineering experiments published over the 10-year period 1993 to 2002. The analysis reveals a striving for realism and external validity, but little awareness of when and for what purposes various degrees of artificiality and realism are appropriate. Furthermore, much of the focus on realism seems to rest on a narrow understanding of the nature of generalization. We conclude that increased awareness and deliberation as to when and for what purposes artificial and realistic design elements are applied would improve knowledge gain and quality in empirical software engineering experiments. We also conclude that time spent on studies with obvious validity threats due to artificiality might be better spent on studies that investigate research questions for which artificiality is a strength rather than a weakness. However, arguments in favor of artificial design elements should not be used to justify studies that are badly designed or that address research questions of low relevance.
