Classification of unbalanced data with transparent kernels

Two important issues in data-driven classification are addressed here: model interpretation and unbalanced data. The aim is to build data-driven classifiers that provide good predictive performance on unbalanced data and that improve understanding of the model by allowing the input/output dependencies it captures to be visualized. The classification method is demonstrated on an unbalanced data set that describes fatigue crack initiation in automotive camshafts. To generate interpretable models, the support vector parsimonious analysis of variance technique is extended to the classification domain. The technique recovers an additive decomposition of the kernel model into low-dimensional components, which can consequently be visualized. The standard averaging technique used to assess model performance is inappropriate for unbalanced data, so the geometric mean is used instead.
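The sketch below shows why a plain average is misleading on unbalanced data and how a geometric-mean score behaves instead. The abstract does not say which quantities are averaged. This sketch therefore assumes the common two-class definition, the geometric mean of sensitivity (accuracy on the positive class) and specificity (accuracy on the negative class). The function names and the 95/5 label split are hypothetical illustrations, not the camshaft data.

```python
import math
from typing import Sequence, Tuple


def class_accuracies(y_true: Sequence[int], y_pred: Sequence[int],
                     positive: int = 1) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for binary labels (hypothetical helper)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    # Guard against an empty class so the ratios are always defined.
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return sensitivity, specificity


def overall_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of all samples classified correctly (dominated by the majority class)."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def geometric_mean_score(y_true: Sequence[int], y_pred: Sequence[int],
                         positive: int = 1) -> float:
    """Assumed metric: sqrt(sensitivity * specificity); zero if either class is missed."""
    sensitivity, specificity = class_accuracies(y_true, y_pred, positive)
    return math.sqrt(sensitivity * specificity)


# Hypothetical unbalanced labels: 95 negatives (0) and 5 positives (1).
y_true = [0] * 95 + [1] * 5

# A trivial classifier that always predicts the majority class.
y_trivial = [0] * 100

# A classifier with 5 false positives that finds 4 of the 5 positives.
y_better = [0] * 90 + [1] * 5 + [1] * 4 + [0] * 1

if __name__ == "__main__":
    for name, y_pred in [("trivial", y_trivial), ("better", y_better)]:
        print(name,
              round(overall_accuracy(y_true, y_pred), 4),
              round(geometric_mean_score(y_true, y_pred), 4))
```

The trivial classifier scores 0.95 on overall accuracy but 0 on the geometric mean, because it never detects the minority class. The second classifier has slightly lower accuracy (0.94) yet a much higher geometric mean (about 0.87). This is the behaviour that makes a class-balanced score preferable when one outcome, such as crack initiation, is rare.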