An Evaluation of the Robustness of MTS for Imbalanced Data

In classification problems, class imbalance biases the training of classifiers and lowers their sensitivity in detecting minority-class examples. The Mahalanobis-Taguchi System (MTS) is a diagnostic and forecasting technique for multivariate data. Rather than learning directly from the training set, MTS builds a classifier by constructing a continuous measurement scale. The construction of an MTS model is therefore expected to be insensitive to the class distribution of the data, and this property should help overcome the class imbalance problem. To verify the robustness of MTS on imbalanced data, this study compares MTS with several popular classification techniques. The results indicate that MTS is the most robust of the compared techniques for classifying imbalanced data. In addition, this study develops a "probabilistic thresholding method" to determine the classification threshold for MTS, and the method performs well. Finally, MTS is applied to the radio frequency (RF) inspection process of mobile phone manufacturing, whose inspection data are typically imbalanced. Implementation results show that the number of inspection attributes is significantly reduced while the RF inspection process maintains high inspection accuracy.
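
The claim that the MTS scale does not depend on the class distribution can be illustrated with a minimal sketch. It assumes the textbook MTS construction: standardize each attribute using only the normal (reference) group, form that group's correlation matrix R, and score a sample as the scaled Mahalanobis distance MD = zᵀR⁻¹z / k, where k is the number of attributes. Because only normal-group samples enter the scale, the number of abnormal samples never affects it. This is not the study's code. The fixed threshold in `classify` is a hypothetical stand-in for the study's probabilistic thresholding method, which is not described here. The data values are illustrative only.

```python
from statistics import mean, stdev


def invert(matrix):
    """Invert a square matrix with Gauss-Jordan elimination and partial pivoting."""
    n = len(matrix)
    # Augment [A | I] and reduce the left half to the identity.
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("correlation matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


class MahalanobisScale:
    """Mahalanobis distance scale built from the normal (reference) group only."""

    def __init__(self, normal_group):
        self.k = len(normal_group[0])            # number of attributes
        cols = list(zip(*normal_group))
        self.means = [mean(c) for c in cols]
        self.stds = [stdev(c) for c in cols]     # sample standard deviation (n - 1)
        z = [self._standardize(x) for x in normal_group]
        n = len(z)
        # Correlation matrix of the standardized normal group.
        corr = [[sum(row[i] * row[j] for row in z) / (n - 1)
                 for j in range(self.k)] for i in range(self.k)]
        self.corr_inv = invert(corr)

    def _standardize(self, x):
        return [(v - m) / s for v, m, s in zip(x, self.means, self.stds)]

    def distance(self, x):
        """Scaled MD = z^T R^{-1} z / k; it averages (n - 1) / n on the normal group."""
        z = self._standardize(x)
        quad = sum(z[i] * self.corr_inv[i][j] * z[j]
                   for i in range(self.k) for j in range(self.k))
        return quad / self.k


def classify(scale, x, threshold):
    """Hypothetical rule: flag x as abnormal when its MD exceeds a fixed threshold."""
    return "abnormal" if scale.distance(x) > threshold else "normal"


# Illustrative normal-group data (two correlated attributes), not data from the study.
normal = [
    [10.0, 20.0],
    [11.0, 21.5],
    [9.5, 19.0],
    [10.5, 20.5],
    [9.0, 18.5],
    [10.2, 20.8],
]
scale = MahalanobisScale(normal)
print(round(mean(scale.distance(x) for x in normal), 4))  # 0.8333, i.e. (6 - 1) / 6
print(classify(scale, [10.0, 24.0], threshold=4.0))      # abnormal
```

The second attribute of the point `[10.0, 24.0]` lies within a few standard deviations of the normal mean. Its MD is still large because the point breaks the correlation between the two attributes. This kind of multivariate structure is what the Mahalanobis scale is designed to capture.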
