The confusion matrix is the standard way to report the thematic accuracy of geographic data (spatial databases, topographic maps, thematic maps, classified images, remote sensing products, etc.). Two widely adopted indices for assessing thematic quality are derived from the confusion matrix: overall accuracy (OA) and the Kappa coefficient (κ), both of which have drawn criticism from several authors. Either index can be used to test the similarity of two independent classifications by means of a simple statistical hypothesis test, and this is the usual practice. Nevertheless, it is not recommended, because the aggregation of data required to compute these indices means that different combinations of cell values in the matrix can yield the same value of OA or κ. Thus, failing to reject a test of equality between two index values does not necessarily mean that the two matrices are similar. We therefore present a new statistical tool to evaluate the similarity between two confusion matrices. It exploits the fact that the numbers of sample units correctly and incorrectly classified can be modeled by a multinomial distribution, and it uses the individual cell values of the matrices rather than aggregated information such as the OA or κ values. For this purpose, we consider a test statistic based on the discrete squared Hellinger distance, a measure of similarity between probability distributions. Because the asymptotic approximation of the null distribution of the test statistic is rather poor for small and moderate sample sizes, we use a bootstrap estimator. To explore how the p-value evolves, we apply the proposed method to several predefined matrices that are perturbed over a specified range. Finally, a complete numerical example of the comparison of two matrices is presented.
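The general approach described above can be sketched in code. The following is a minimal illustration, not the authors' exact test function: it normalizes each confusion matrix to a multinomial probability vector, computes the discrete squared Hellinger distance H² = 1 − Σᵢ √(pᵢqᵢ) between the two vectors, and approximates the null distribution of the statistic by bootstrap resampling from the pooled proportions (the homogeneity hypothesis). The function names and the choice of resampling scheme are assumptions for illustration.

```python
import numpy as np

def hellinger_sq(p, q):
    """Discrete squared Hellinger distance: H^2(p, q) = 1 - sum_i sqrt(p_i * q_i)."""
    return 1.0 - np.sum(np.sqrt(p * q))

def bootstrap_similarity_test(m1, m2, n_boot=2000, seed=0):
    """Bootstrap test of similarity between two confusion matrices.

    Each matrix is treated as one draw from a multinomial over its cells.
    Under H0 both matrices share a common cell-probability vector, estimated
    here by pooling the counts. Returns the observed statistic and a p-value.
    """
    rng = np.random.default_rng(seed)
    c1 = np.asarray(m1, dtype=float).ravel()
    c2 = np.asarray(m2, dtype=float).ravel()
    n1, n2 = c1.sum(), c2.sum()
    t_obs = hellinger_sq(c1 / n1, c2 / n2)

    # Pooled cell proportions estimate the common distribution under H0.
    pooled = (c1 + c2) / (n1 + n2)
    t_boot = np.empty(n_boot)
    for b in range(n_boot):
        b1 = rng.multinomial(int(n1), pooled) / n1
        b2 = rng.multinomial(int(n2), pooled) / n2
        t_boot[b] = hellinger_sq(b1, b2)

    # p-value: proportion of bootstrap statistics at least as large as observed.
    return t_obs, float(np.mean(t_boot >= t_obs))
```

Note that the statistic operates on all individual cell values, so two matrices with identical OA or κ but different error structures can still produce a large Hellinger distance and a small p-value.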