Predicting Bugs by Monitoring Developers During Task Execution

Knowing which parts of the source code are likely to be defective allows practitioners to allocate testing resources more effectively, and many approaches have been proposed for this purpose. Most state-of-the-art predictive models rely on product and process metrics, i.e., they predict the defectiveness of a component by considering what developers did. However, there is still limited evidence of the benefits of monitoring how developers complete a development task. In this paper, we present an empirical study that investigates whether measuring human aspects of developers while they write code can help predict the introduction of defects. First, we introduce a new developer-based model that relies on behavioral, psychophysical, and control factors that can be measured during the execution of development tasks. Then, we run a controlled experiment involving 20 software developers to determine whether our developer-based model can predict the introduction of bugs. Our results show that a developer-based model achieves accuracy similar to that of a state-of-the-art code-based model, i.e., a model that uses only features measured from the source code. We also observe that combining the two models yields the best results (84% accuracy).
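The abstract does not say how the developer-based and code-based models are combined. One common option is late fusion by soft voting, which averages each model's predicted defect probability for a task before thresholding. The sketch below illustrates that idea under several assumptions. Each model is assumed to output a probability per task, with label 1 meaning a bug was introduced. The equal weighting and the 0.5 threshold are choices made for illustration. The function names and toy numbers are hypothetical and are not taken from the study.

```python
from typing import Sequence


def combine_probabilities(dev_probs: Sequence[float],
                          code_probs: Sequence[float],
                          weight: float = 0.5) -> list[float]:
    """Soft voting: weighted average of per-task defect probabilities.

    `weight` is the share given to the (hypothetical) developer-based model;
    the remainder goes to the code-based model.
    """
    if len(dev_probs) != len(code_probs):
        raise ValueError("both models must score the same tasks")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be in [0, 1]")
    return [weight * d + (1.0 - weight) * c for d, c in zip(dev_probs, code_probs)]


def accuracy(probs: Sequence[float], labels: Sequence[int],
             threshold: float = 0.5) -> float:
    """Fraction of tasks whose thresholded prediction matches the label (1 = bug introduced)."""
    if len(probs) != len(labels) or not labels:
        raise ValueError("need one non-empty label per prediction")
    correct = sum(int(p >= threshold) == y for p, y in zip(probs, labels))
    return correct / len(labels)


if __name__ == "__main__":
    # Made-up scores for four tasks; NOT data from the paper.
    dev = [0.9, 0.4, 0.2, 0.3]    # hypothetical developer-based model
    code = [0.4, 0.8, 0.3, 0.2]   # hypothetical code-based model
    labels = [1, 1, 0, 0]
    print(accuracy(dev, labels))                               # 0.75
    print(accuracy(code, labels))                              # 0.75
    print(accuracy(combine_probabilities(dev, code), labels))  # 1.0
```

In this toy case, each model misses a different buggy task, so averaging their scores recovers both. Complementary errors of this kind are one plausible reason a combined model can outperform either model alone. Feature-level fusion, which concatenates both feature sets into a single classifier, is an equally valid alternative.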
