On extending F-measure and G-mean metrics to multi-class problems

Evaluating classifiers is not an easy task. There are many ways to test them and many measures to estimate their performance. The great majority of these measures were defined for two-class problems, and there is no consensus on how to generalize them to multi-class problems. This paper proposes extending the F-measure and G-mean in the same fashion as has been done for the AUC. Several datasets with diverse characteristics are used to build fuzzy classifiers and C4.5 trees. The most common evaluation metrics are implemented and compared in terms of their output values: the greater the value, the more optimistic the measure. The results suggest that two measures behave consistently in opposite roles: one is always optimistic and the other always pessimistic.
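
To make the two-class starting point concrete, the following is a minimal sketch of the standard binary F-measure and G-mean, together with one possible multi-class generalization. The averaging scheme shown (one-vs-rest, unweighted mean over classes) is an assumption chosen for illustration; the abstract does not specify the exact extension, and the paper's definition may differ. All function names are hypothetical.

```python
from math import sqrt


def binary_counts(y_true, y_pred, positive):
    """Confusion counts (tp, fp, fn, tn) treating `positive` as the positive class."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if t == positive and p == positive:
            tp += 1
        elif t != positive and p == positive:
            fp += 1
        elif t == positive and p != positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def f_measure(tp, fp, fn, tn):
    """Binary F-measure (beta = 1): harmonic mean of precision and recall."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def g_mean(tp, fp, fn, tn):
    """Binary G-mean: geometric mean of sensitivity and specificity."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sqrt(sensitivity * specificity)


def macro_one_vs_rest(metric, y_true, y_pred):
    """ASSUMPTION: extend a binary metric by averaging it over one-vs-rest splits.

    This is only one of several possible generalizations and is not
    claimed to be the scheme proposed in the paper.
    """
    classes = sorted(set(y_true))
    scores = [metric(*binary_counts(y_true, y_pred, c)) for c in classes]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy labels, invented purely for illustration.
    y_true = ["a", "a", "b", "b", "c", "c"]
    y_pred = ["a", "b", "b", "b", "c", "a"]
    print("macro F-measure:", macro_one_vs_rest(f_measure, y_true, y_pred))
    print("macro G-mean:   ", macro_one_vs_rest(g_mean, y_true, y_pred))
```

Passing the binary metric as a parameter keeps the averaging scheme separate from the metric itself, so a different generalization (for example, averaging over class pairs) can be swapped in without touching `f_measure` or `g_mean`.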