Extending Attribute Information for Small Data Set Classification

The central difficulty in small data set learning is data quantity: insufficient data usually fail to yield robust classification performance. How to extract more effective information from a small data set is therefore of considerable interest. This paper proposes a new attribute construction approach that maps the original data attributes into a higher-dimensional feature space, extracting more attribute information through a similarity-based algorithm that uses a classification-oriented fuzzy membership function. Seven data sets with different numbers of attributes are used to examine the performance of the proposed method. The results show that the proposed method achieves better classification performance than principal component analysis (PCA), kernel principal component analysis (KPCA), and kernel independent component analysis (KICA) with a Gaussian kernel in the support vector machine (SVM) classifier.
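The abstract does not specify the membership function or the similarity computation, so the following is only a minimal sketch of the general idea of extending attributes with class-oriented fuzzy memberships, not the authors' algorithm. It assumes a triangular membership function built per class and per attribute from the minimum, mean, and maximum of the training values; the function names `fit_triangles`, `triangular`, and `extend` and the toy data are hypothetical.

```python
from statistics import mean


def fit_triangles(X, y):
    """For each (class, attribute) pair, store (min, mean, max) of the training values.

    Assumption: a triangular membership function; the paper's actual function may differ.
    """
    params = {}
    for label in sorted(set(y)):
        rows = [x for x, t in zip(X, y) if t == label]
        params[label] = [(min(col), mean(col), max(col)) for col in zip(*rows)]
    return params


def triangular(v, lo, mid, hi):
    """Triangular membership: 1 at mid, falling linearly to 0 at lo and hi."""
    if v <= lo or v >= hi:
        # Covers degenerate triangles where the peak sits on an endpoint.
        return 1.0 if v == mid else 0.0
    if v <= mid:
        return (v - lo) / (mid - lo)
    return (hi - v) / (hi - mid)


def extend(X, params):
    """Append one membership value per (class, attribute) pair to each sample."""
    out = []
    for x in X:
        extra = [
            triangular(v, *tri)
            for label in sorted(params)
            for v, tri in zip(x, params[label])
        ]
        out.append(list(x) + extra)
    return out


# Toy data (hypothetical): 6 samples, 2 attributes, 2 classes.
X_train = [[1.0, 10.0], [2.0, 12.0], [3.0, 14.0],
           [6.0, 20.0], [7.0, 22.0], [8.0, 24.0]]
y_train = [0, 0, 0, 1, 1, 1]

params = fit_triangles(X_train, y_train)
X_ext = extend(X_train, params)  # 2 original + 2 classes * 2 attributes = 6 columns
```

With d attributes and k classes, each sample grows from d to d + d·k columns under this assumption, and the extended matrix could then be passed to any classifier, such as an SVM.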
