The Mind Is a Powerful Place: How Showing Code Comprehensibility Metrics Influences Code Understanding

Static code analysis tools and integrated development environments present developers with quality-related software metrics, some of which describe the understandability of source code. Software metrics influence overarching strategic decisions that impact the future of companies and the prioritization of everyday software development tasks. Many software metrics, however, lack validation: we simply choose to trust that they reflect what they are supposed to measure. Some have even been shown not to measure the quality aspects they are intended to measure. Yet they influence us through biases in our cognitively driven actions. In particular, they might anchor us in our decisions. Whether the anchoring effect occurs with software metrics has not yet been studied. We conducted a randomized and double-blind experiment to investigate the extent to which a displayed metric value for source code comprehensibility anchors developers in their subjective rating of source code comprehensibility, whether performance is affected by the anchoring effect when working on comprehension tasks, and which individual characteristics might play a role in the anchoring effect. We found that the displayed value of a comprehensibility metric has a significant and large anchoring effect on a developer's code comprehensibility rating. The effect does not seem to affect the time or correctness when working on comprehension questions related to the code snippets under study. Since the anchoring effect is one of the most robust cognitive biases, and since we have limited understanding of the consequences of the demonstrated manipulation of developers by non-validated metrics, we call for increased awareness of the responsibility in code quality reporting and for corresponding tools to be based on scientific evidence.
