Paper information - Classification of unbalanced data with transparent kernels

Classification of unbalanced data with transparent kernels

Two important issues in data-driven classification are addressed here: model interpretation and unbalanced data. The aim is to build data-driven classifiers that give good predictive performance on unbalanced data and that improve understanding of the model by allowing its input/output dependencies to be visualised. The classification method is demonstrated on an unbalanced data set that describes fatigue crack initiation in automotive camshafts. To generate interpretable models, the support vector parsimonious analysis of variance technique is extended to the classification domain. The technique recovers an additive decomposition into low-dimensional kernel models, which enhances model visualisation. Because the standard averaging technique used to assess model performance is inappropriate for unbalanced data, the geometric mean is used instead. The resulting components have low dimension and can consequently be visualised.
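The point about performance assessment can be illustrated with a small sketch. The abstract does not state which quantities the geometric mean is taken over; the code below assumes it is the geometric mean of the two per-class accuracies (true positive rate and true negative rate), a common choice for unbalanced data. The function names and the toy labels are hypothetical and are not taken from the paper.

```python
import math


def class_rates(y_true, y_pred, positive=1):
    """Return (true positive rate, true negative rate) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    return tp / (tp + fn), tn / (tn + fp)


def accuracy(y_true, y_pred):
    """Plain average over all examples, dominated by the majority class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def geometric_mean_score(y_true, y_pred, positive=1):
    """Assumed measure: sqrt(TPR * TNR), zero if either class is ignored."""
    tpr, tnr = class_rates(y_true, y_pred, positive)
    return math.sqrt(tpr * tnr)


# Hypothetical toy data: 95 negatives, 5 positives, and a classifier
# that always predicts the majority (negative) class.
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100

acc = accuracy(y_true, y_pred)
gmean = geometric_mean_score(y_true, y_pred)
print(f"accuracy = {acc:.2f}, geometric mean = {gmean:.2f}")
```

On this toy example the plain average rewards a classifier that never detects the minority class, while the geometric mean drops to zero, which is the behaviour that motivates using it on unbalanced data.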

C.J. Harris | K.K. Lee | S.R. Gunn | P.A.S. Reed
