Classification accuracy comparison: hypothesis tests and the use of confidence intervals in evaluations of difference, equivalence and non-inferiority

[1]  J. Fleiss,et al.  Statistical methods for rates and proportions , 1973 .

[2]  R. G. Oderwald,et al.  Assessing Landsat classification accuracy using discrete multivariate analysis statistical techniques. , 1983 .

[3]  G. H. Rosenfield,et al.  A coefficient of agreement as a measure of thematic classification accuracy. , 1986 .

[4]  S. A. Briggs,et al.  Fast maximum likelihood classification of remotely-sensed imagery , 1987 .

[5]  Chris Lloyd.  Confidence Intervals for the Difference Between Two Correlated Proportions , 2021 .

[6]  L E Daly,et al.  Confidence intervals and sample sizes: don't throw out all your old sample size tables. , 1991, BMJ.

[7]  D G Altman,et al.  Statistics notes: Absence of evidence is not evidence of absence , 1995 .

[8]  D M Clarke,et al.  Comparing correlated kappas by resampling: is one level of agreement significantly different from another? , 1996, Journal of psychiatric research.

[9]  Len Thomas,et al.  Retrospective Power Analysis , 1997 .

[10]  J M Nam,et al.  Establishing equivalence of two treatments and sample size requirements in matched-pairs design. , 1997, Biometrics.

[11]  Stephen V. Stehman,et al.  Selecting and interpreting measures of thematic classification accuracy , 1997 .

[12]  R G Newcombe,et al.  Improved confidence intervals for the difference between binomial proportions based on paired data. , 1998, Statistics in medicine.

[13]  Sample size and power calculations for comparing two independent proportions in a `negative' trial , 1998, Psychiatry Research.

[14]  T Tango,et al.  Equivalence test and confidence interval for the difference in proportions for the paired-sample design. , 1998, Statistics in medicine.

[15]  S. Goodman.  Toward Evidence-Based Medical Statistics. 1: The P Value Fallacy , 1999, Annals of Internal Medicine.

[16]  T Tango,et al.  Re: Improved confidence intervals for the difference between binomial proportions based on paired data by Robert G. Newcombe, Statistics in Medicine, 17, 2635-2650 (1998) , 1999, Statistics in medicine.

[17]  A Donner,et al.  Testing the equality of two dependent kappa statistics. , 2000, Statistics in medicine.

[18]  Stephen V. Stehman,et al.  Practical Implications of Design-Based Sampling Inference for Thematic Map Accuracy Assessment , 2000 .

[19]  C. Woodcock,et al.  Classification and Change Detection Using Landsat TM Data: When and How to Correct Atmospheric Effects? , 2001 .

[20]  D. Heisey,et al.  The Abuse of Power , 2001 .

[21]  L. Barker,et al.  Assessing equivalence: an alternative to the use of difference tests for measuring disparities in vaccination coverage. , 2002, American journal of epidemiology.

[22]  Chris Aberson.  Interpreting Null Results: Improving Presentation and Conclusions with Confidence Intervals , 2002 .

[23]  John B. Carlin,et al.  Statistics for clinicians: 7: Sample size , 2002 .

[24]  G. Ruxton,et al.  Confidence intervals are a more useful complement to nonsignificant tests than are power calculations , 2003 .

[25]  G. Foody.  Thematic map comparison: Evaluating the statistical significance of differences in classification accuracy , 2004 .

[26]  Giles M. Foody,et al.  Toward intelligent training of supervised image classifications: directing training data acquisition for SVM classification , 2004 .

[27]  Julian Di Stefano,et al.  A confidence interval approach to data analysis , 2004 .

[28]  Understanding and Evaluating Research in Applied and Clinical Settings , 2005 .

[29]  A. Skidmore,et al.  Comparing accuracy assessments to infer superiority of image classification methods , 2006 .

[30]  David F Kallmes,et al.  No significant difference ... Says who? , 2007, AJNR. American journal of neuroradiology.

[31]  Statistical Thinking for Non-Statisticians in Drug Regulation , 2007 .

[32]  Alejandro Martínez-Abraín,et al.  Are there any differences? A non-sensical question in ecology , 2007 .

[33]  Giles M. Foody,et al.  Harshness in image classification accuracy assessment , 2008 .

[34]  Giles M. Foody,et al.  Crop classification by support vector machine with intelligently selected training data for an operational application , 2008 .

[35]  John P. Kerekes,et al.  Receiver Operating Characteristic Curve Confidence Intervals and Regions , 2008, IEEE Geoscience and Remote Sensing Letters.

[36]  L. Zhang,et al.  Using a hybrid fuzzy classifier (HFC) to map typical grassland vegetation in Xilin River Basin, Inner Mongolia, China , 2008 .

[37]  Jay Gao,et al.  Mapping of land degradation from space: a comparative study of Landsat ETM+ and ASTER data , 2008 .

[38]  Yang Wang,et al.  Feature‐selection ability of the decision‐tree algorithm and the impact of feature‐selection/extraction on decision‐tree results based on hyperspectral data , 2008, International Journal of Remote Sensing.

[39]  S. Ertürk,et al.  Phase correlation based redundancy removal in feature weighting band selection for hyperspectral images , 2008 .

[40]  Thomas Alexandridis,et al.  A novel self‐organizing neuro‐fuzzy multilayered classifier for land cover classification of a VHR image , 2008 .

[41]  Michael A. Wulder,et al.  Landsat continuity: Issues and opportunities for land cover monitoring , 2008 .

[42]  Ryutaro Tateishi,et al.  Comparison of a new classifier, the Mix–Unmix Classifier, with conventional hard and soft classifiers , 2008 .

[43]  Giles M. Foody,et al.  Sample size determination for image classification accuracy assessment and comparison , 2009 .

[44]  S. Bahna,et al.  Statistics for Clinicians , 2009 .