The Class Imbalance Problem in TLC Image Classification

This paper presents a methodology for addressing the class imbalance problem that arises in the classification of Thin-Layer Chromatography (TLC) images. The methodology is based on re-sampling: the majority class (the normal class) is undersampled, while the minority classes, which contain Lysosomal Storage Disorders (LSD) samples, are oversampled by generating synthetic samples. Two approaches to image classification are presented, one based on a hierarchical classifier and the other on a multiclassifier system; both are trained and tested on the balanced data sets. The results show better performance for the multiclassifier system using the balanced sets.
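The re-sampling idea can be illustrated with a minimal sketch. It rests on several assumptions that are not stated in the abstract: samples are already feature vectors (tuples of floats), undersampling is uniform random selection, and synthetic samples are produced by SMOTE-style interpolation between a minority sample and one of its nearest minority neighbours. The function names, the class labels, and the parameters `target_size` and `k` are hypothetical, and the paper's actual generation procedure may differ.

```python
import math
import random


def undersample(samples, target_size, rng):
    """Randomly keep target_size samples of an over-represented class."""
    return rng.sample(samples, target_size)


def oversample_interpolate(samples, target_size, k, rng):
    """Grow an under-represented class to target_size with synthetic samples.

    Each synthetic sample lies on the segment between a randomly chosen
    sample and one of its k nearest neighbours from the same class
    (SMOTE-style interpolation, assumed here for illustration only).
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples to interpolate")
    result = [tuple(s) for s in samples]
    while len(result) < target_size:
        i = rng.randrange(len(samples))
        base = samples[i]
        # Same-class neighbours sorted by Euclidean distance, base excluded.
        others = [s for j, s in enumerate(samples) if j != i]
        others.sort(key=lambda s: math.dist(base, s))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment, in [0, 1)
        result.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return result


def balance(classes, target_size, k=3, seed=0):
    """Return a copy of classes in which every class has target_size samples."""
    rng = random.Random(seed)
    balanced = {}
    for label, samples in classes.items():
        if len(samples) > target_size:
            balanced[label] = undersample(samples, target_size, rng)
        else:
            balanced[label] = oversample_interpolate(samples, target_size, k, rng)
    return balanced


# Toy data: one large "normal" class and two small, hypothetical LSD classes.
data = {
    "normal": [(float(i), float(i % 5)) for i in range(40)],
    "lsd_a": [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)],
    "lsd_b": [(5.0, 5.0), (6.0, 5.0)],
}
balanced = balance(data, target_size=10)
```

In this sketch every class ends up with the same number of samples, and each synthetic minority sample stays inside the region spanned by real samples of its own class. Interpolation is only one option; adding noise to replicated samples is another strategy that would fit the same interface.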
