Using (Bio)Metrics to Predict Code Quality Online

Finding and fixing code quality concerns, such as defects or poorly understandable code, decreases software development and evolution costs. A common industrial practice for identifying code quality concerns early on is the code review. While code reviews help to identify problems early, they also impose costs on development and take place only after a code change has been completed. The goal of our research is to automatically identify code quality concerns while a developer is making a change to the code. By using biometrics, such as heart rate variability, we aim to determine the difficulty a developer experiences while working on a part of the code, and to identify and help fix code quality concerns before they are even committed to the repository. In a field study with ten professional developers over a two-week period, we investigated the use of biometrics to determine code quality concerns. Our results show that biometrics are indeed able to predict quality concerns in parts of the code while a developer is working on them, improving upon a naive classifier by more than 26% and outperforming classifiers based on more traditional metrics. In a second study with five professional developers from a different country and company, we found evidence that some of the findings from our initial study can be replicated. Overall, the results of the presented studies suggest that biometrics have the potential to predict code quality concerns online and thus lower development and evolution costs.
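To make the notion of heart rate variability concrete, the following minimal sketch computes RMSSD (root mean square of successive differences), one widely used time-domain HRV measure. This is an illustration only: the abstract does not state which HRV features the studies used, and the function name `rmssd` and the sample RR intervals are hypothetical.

```python
import math


def rmssd(rr_intervals_ms):
    """Root mean square of successive differences between RR intervals (ms).

    RMSSD is a standard time-domain HRV measure; its use here is an
    assumption, not necessarily the feature used in the studies.
    """
    if len(rr_intervals_ms) < 2:
        raise ValueError("need at least two RR intervals")
    # Differences between each pair of consecutive beat-to-beat intervals.
    diffs = [b - a for a, b in zip(rr_intervals_ms, rr_intervals_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


# Hypothetical RR intervals in milliseconds (not data from the studies).
steady = [800, 810, 790, 805, 795]
varied = [800, 900, 700, 850, 750]

print(rmssd(steady) < rmssd(varied))
```

A feature like this would typically be computed over a time window aligned with the code element a developer is working on and then passed to a classifier together with other measurements.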
