A novel virtual sample generation method based on Gaussian distribution

Traditional machine learning algorithms do not generalize satisfactorily when trained on noisy, imbalanced, or small training sets. In this work, a novel virtual sample generation (VSG) method based on the Gaussian distribution is proposed. First, the method determines the mean and the standard error of the Gaussian distribution. Then, virtual samples are drawn from this Gaussian distribution. Finally, a new training set is constructed by adding the virtual samples to the original training set. This work shows that training on the new training set is equivalent to a form of regularization for small-sample problems, and to cost-sensitive learning for imbalanced-sample problems. Experiments show that, given a suitable number of virtual sample replicates, classifiers trained on the new training sets can generalize better than those trained on the original training sets.
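The abstract does not say how the mean and spread are chosen, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes each virtual sample is drawn from a Gaussian centred on an original sample, with a per-feature spread equal to a hypothetical `scale` factor times that feature's sample standard deviation, and that each virtual sample inherits the label of the sample it came from. The function names `gaussian_virtual_samples` and `mse` are illustrative. The script also checks two well-known properties that match the abstract's claims: for a linear least-squares model, the expected loss on the virtual samples equals the original loss plus the ridge-like penalty $\sum_j w_j^2 \sigma_j^2$, and giving the minority class more replicates multiplies its weight in the loss, as cost-sensitive learning does.

```python
import random
import statistics


def gaussian_virtual_samples(X, y, n_replicates, scale=0.1, seed=0):
    """Augment (X, y) with Gaussian virtual samples.

    Assumptions (hypothetical, not taken from the paper): each virtual sample
    is drawn from N(x_i, diag(sigma^2)) centred on an original sample x_i,
    with sigma_j = scale * (sample standard deviation of feature j), and it
    inherits x_i's label. n_replicates is an int, or a dict {label: count}
    so that minority classes can receive more replicates.
    Returns (X_new, y_new, sigmas); the original samples come first in X_new.
    """
    rng = random.Random(seed)
    sigmas = [scale * statistics.stdev(col) for col in zip(*X)]
    X_new = [list(x) for x in X]
    y_new = list(y)
    for x, label in zip(X, y):
        k = n_replicates[label] if isinstance(n_replicates, dict) else n_replicates
        for _ in range(k):
            X_new.append([rng.gauss(m, s) for m, s in zip(x, sigmas)])
            y_new.append(label)
    return X_new, y_new, sigmas


def mse(w, X, t):
    """Mean squared error of the linear model f(x) = w . x."""
    return statistics.fmean(
        (sum(wj * xj for wj, xj in zip(w, x)) - tj) ** 2 for x, tj in zip(X, t)
    )


# Regularization view: for a linear model with independent zero-mean noise,
# E[(w.(x + eps) - t)^2] = (w.x - t)^2 + sum_j w_j^2 * sigma_j^2.
X = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0]]
t = [1.0, 0.0, 2.0]
w = [0.5, 0.25]
X_aug, t_aug, sigmas = gaussian_virtual_samples(X, t, n_replicates=20000, scale=0.5)
virtual_loss = mse(w, X_aug[len(X):], t_aug[len(X):])  # loss on virtual samples only
predicted = mse(w, X, t) + sum(wj ** 2 * sj ** 2 for wj, sj in zip(w, sigmas))
print(round(virtual_loss, 3), round(predicted, 3))

# Cost-sensitive view: giving the minority class k replicates per sample
# multiplies its effective weight in the loss by (1 + k).
Xc = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [1.0, 1.0]]
yc = [0, 0, 0, 1]
_, yc_aug, _ = gaussian_virtual_samples(Xc, yc, n_replicates={0: 1, 1: 4})
print(yc_aug.count(0), yc_aug.count(1))  # 6 5
```

The two printed losses agree up to sampling noise, which shrinks as `n_replicates` grows. Centring the noise on each original sample keeps the virtual samples close to observed data; whether the paper instead uses a class-level mean, or a different rule for the spread, cannot be determined from the abstract.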
