Tests for consistent measurement of external subjective software quality attributes

One reason researchers may wish to demonstrate that an external software quality attribute can be measured consistently is so that they can validate a prediction system for that attribute. However, attempts to validate prediction systems for external subjective quality attributes have tended to rely on experts indicating, informally, that the values produced by the prediction system agree with their intuition about the attribute. These attempts are undertaken without a pre-defined scale on which the attribute is known to be measured consistently. Consequently, a valid, unbiased estimate of the predictive capability of the prediction system cannot be given, because the experts’ measurement process is not independent of the prediction system’s values. Usually, no justification is given for not checking whether the experts can measure the attribute consistently. It seems to be assumed that subjective measurement is not proper measurement, that subjective measurement cannot be quantified, or that no one knows the true values of the attributes anyway and they cannot be estimated. However, even though the classification of the quality attributes of software systems or software artefacts is subjective, experts’ measurements can be quantified in terms of conditional probabilities. It is then possible, using a statistical approach, to assess formally whether the experts’ measurements can be considered consistent. If the measurements are consistent, it is also possible to obtain estimates of the true values that are independent of the prediction system. These estimates can then be used to assess the predictive capability of the prediction system. In this paper we use Bayesian inference, Markov chain Monte Carlo simulation and missing data imputation to develop statistical tests for consistent measurement of subjective ordinal scale attributes.
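As an illustration of what quantifying experts’ measurements in terms of conditional probabilities can mean, the following minimal sketch applies Bayes’ rule to a single software artefact rated on a three-level ordinal scale. It is not the model developed in the paper. The function name `posterior_true_category`, the expert labels, the uniform prior and every probability value are hypothetical, and the sketch assumes that the experts’ ratings are conditionally independent given the true category.

```python
def posterior_true_category(prior, error_rates, ratings):
    """Posterior probability of each true category given the experts' ratings.

    prior[k]                  : P(true category = k)
    error_rates[expert][k][r] : P(expert assigns r | true category = k)
    ratings                   : list of (expert, assigned category) pairs

    Assumption: ratings are conditionally independent given the true category.
    """
    unnormalised = []
    for k, prior_k in enumerate(prior):
        weight = prior_k
        for expert, rating in ratings:
            weight *= error_rates[expert][k][rating]
        unnormalised.append(weight)
    total = sum(unnormalised)
    if total == 0:
        raise ValueError("the ratings have zero probability under the model")
    return [w / total for w in unnormalised]


# Hypothetical values: categories 0, 1, 2 stand for low, medium, high.
PRIOR = [1 / 3, 1 / 3, 1 / 3]
ERROR_RATES = {
    # Row k holds P(rating | true category = k); each row sums to 1.
    "expert_a": [[0.80, 0.15, 0.05], [0.10, 0.80, 0.10], [0.05, 0.15, 0.80]],
    "expert_b": [[0.70, 0.20, 0.10], [0.20, 0.60, 0.20], [0.10, 0.20, 0.70]],
}

# Both hypothetical experts assign the middle category to the same artefact.
posterior = posterior_true_category(
    PRIOR, ERROR_RATES, [("expert_a", 1), ("expert_b", 1)]
)
print(posterior)
```

In practice the conditional probabilities are not known in advance and would themselves have to be estimated from the experts’ ratings, which is where Bayesian inference and Markov chain Monte Carlo simulation come in. The sketch only shows how such probabilities, once available, yield an estimate of the true value that does not depend on any prediction system.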
