An attribute extending method to improve learning performance for small datasets

Abstract: Small datasets often make it difficult to build a reliable learning model. Some researchers have therefore proposed virtual sample generation (VSG) methods, which add artificial samples to a small dataset to increase its size. However, for some datasets the distributional assumptions underlying VSG methods may not clearly hold, and when the data have only a few attributes such approaches may not work effectively. Other researchers have therefore proposed attribute extension methods, which generate new attributes to map the data into a higher-dimensional space. Unfortunately, the resulting dataset may become sparse, with many null or zero values in the extended attributes, and a large number of such attributes reduces how well the instances represent the data for the learning model. Based on fuzzy theories, this paper therefore proposes a novel sample attribute extending (SEA) method that extends a suitable number of attributes to improve small-dataset learning. To verify the validity of the SEA method, this paper trains predictive models with support vector regression (SVR) and back-propagation neural networks (BPNN) on two real cases and two public datasets, and uses the paired t-test to examine whether the improvement is statistically significant. The experimental results show that the proposed SEA method can effectively improve learning accuracy on small datasets.
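The paired t-test mentioned in the abstract can be illustrated with a minimal sketch. It assumes that each pair holds the prediction error of the same experimental run without and with extended attributes. The function name `paired_t_statistic` and the error values are hypothetical and chosen only for illustration; they are not results from the paper.

```python
import math
import statistics


def paired_t_statistic(errors_before, errors_after):
    """Return (t, degrees_of_freedom) for two paired samples.

    t = mean(d) / (stdev(d) / sqrt(n)), where d holds the pairwise
    differences errors_before[i] - errors_after[i].
    """
    if len(errors_before) != len(errors_after):
        raise ValueError("paired samples must have the same length")
    diffs = [b - a for b, a in zip(errors_before, errors_after)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two pairs")
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd_diff == 0:
        raise ValueError("differences have zero variance")
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1


# Hypothetical per-run errors (made up for illustration, not from the paper).
baseline_errors = [0.30, 0.28, 0.35, 0.31, 0.29]  # original attributes only
extended_errors = [0.25, 0.26, 0.30, 0.27, 0.28]  # with extended attributes

t_stat, dof = paired_t_statistic(baseline_errors, extended_errors)
print(f"t = {t_stat:.3f}, degrees of freedom = {dof}")
```

A positive t statistic here means the errors were lower on average after extension. Whether the difference is significant depends on comparing t against the critical value of the t distribution with the reported degrees of freedom at the chosen significance level.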
