Benchmarking Kappa: Interrater Agreement in Software Process Assessments
[1] A. Ehrenberg,et al. The Design of Replicated Studies , 1993 .
[2] Hoi K. Suen,et al. Effects of the use of percentage agreement on behavioral observation reliabilities: A reassessment , 1985 .
[3] Jacob Cohen. Statistical Power Analysis for the Behavioral Sciences , 1969, The SAGE Encyclopedia of Research Design.
[4] R. Alpert,et al. Communications Through Limited-Response Questioning , 1954 .
[5] Jacob Cohen. A Coefficient of Agreement for Nominal Scales , 1960 .
[6] J. Fleiss. Measuring nominal scale agreement among many raters. , 1971 .
[7] Khaled El Emam,et al. Spice: The Theory and Practice of Software Process Improvement and Capability Determination , 1997 .
[8] Lionel C. Briand,et al. Assessor agreement in rating SPICE processes , 1996, Softw. Process. Improv. Pract..
[9] Khaled El Emam,et al. The repeatability of code defect classifications , 1998, Proceedings Ninth International Symposium on Software Reliability Engineering (Cat. No.98TB100257).
[10] B. Kotkov,et al. Test scores and what they mean , 1963 .
[11] D. P. Hartmann,et al. Considerations in the choice of interobserver reliability estimates. , 1977, Journal of applied behavior analysis.
[12] Bob Smith,et al. Evaluating the interrater agreement of process capability ratings , 1997, Proceedings Fourth International Software Metrics Symposium.
[13] Khaled El Emam,et al. The Internal Consistency of the ISO/IEC 15504 Software Process Capability Scale , 1998, IEEE METRICS.
[14] Pasi Kuvaja,et al. Bootstrap 3.0 — Software Process Assessment Methodology , 1998 .
[15] B. P. Squires,et al. Statistics in biomedical manuscripts: what editors want from authors and peer reviewers , 1989 .
[16] S. Jaggi. Tests of Significance , 2003 .
[17] Khaled El Emam,et al. Cost implications of interrater agreement for software process assessments , 1998, Proceedings Fifth International Software Metrics Symposium. Metrics (Cat. No.98TB100262).
[18] James Joseph Biundo,et al. Analysis of Contingency Tables , 1969 .
[19] Khaled El Emam,et al. The reliability of ISO/IEC PDTR 15504 assessments , 1997, Softw. Process. Improv. Pract..
[20] Dennis R. Goldenson,et al. SPICE: an empiricist's perspective , 1995, Proceedings of Software Engineering Standards Symposium.
[21] Robert C. Camp,et al. Benchmarking: The Search for Industry Best Practices That Lead to Superior Performance , 1989 .
[22] Douglas G. Altman,et al. Practical statistics for medical research , 1990 .
[23] Dennis R. Goldenson,et al. Interrater agreement in SPICE-based assessments: some preliminary results , 1996, Proceedings of Software Process 1996.
[24] R. Zwick,et al. Another look at interrater agreement. , 1988, Psychological bulletin.
[25] H. Lyman. Test Scores and What They Mean , 1971 .
[26] W. A. Scott,et al. Reliability of Content Analysis: The Case of Nominal Scale Coding , 1955 .
[27] B. Everitt,et al. Statistical methods for rates and proportions , 1973 .
[28] Jacob Cohen,et al. The Equivalence of Weighted Kappa and the Intraclass Correlation Coefficient as Measures of Reliability , 1973 .
[29] Khaled El Emam,et al. An Empirical Evaluation of the Prospective International SPICE Standard , 1996, Softw. Process. Improv. Pract..
[30] Jacob Cohen,et al. Weighted kappa: Nominal scale agreement provision for scaled disagreement or partial credit. , 1968 .
[31] B. Squires. Statistics in biomedical manuscripts: what editors want from authors and peer reviewers. , 1990, CMAJ : Canadian Medical Association journal = journal de l'Association medicale canadienne.
[32] Bob Smith,et al. The Internal Consistencies of the 1987 SEI Maturity Questionnaire and the SPICE Capability Dimension , 1998, Empirical Software Engineering.
[33] Lionel C. Briand,et al. Using simulation to build inspection efficiency benchmarks for development projects , 1998, Proceedings of the 20th International Conference on Software Engineering.
[34] Anne Lohrli. Chapman and Hall , 1985 .
[35] Domenic V. Cicchetti. A new measure of agreement between rank ordered variables. , 1972 .
[36] J. R. Landis,et al. The measurement of observer agreement for categorical data. , 1977, Biometrics.
[37] R. Peterson,et al. Interjudge Agreement and the Maximum Value of Kappa , 1989 .
[38] Bob Smith,et al. Modelling the reliability of SPICE based assessments , 1997, Proceedings of IEEE International Symposium on Software Engineering Standards.
[39] B. Everitt,et al. Large sample standard errors of kappa and weighted kappa. , 1969 .
[40] Hans Zeisel. The Significance of Insignificant Differences , 1955 .