Feature selection focused within error clusters

We propose a feature selection method that constructs each new feature by analyzing tight error clusters. It is a greedy, time-efficient forward-selection algorithm that constructs one feature at a time until a desired error rate is reached. At each step, the algorithm finds error clusters in the current feature space; a tight error cluster indicates that the current features cannot discriminate those samples. It then projects one tight cluster into the null space of the feature mapping, where a new feature that helps classify these errors can be discovered. The approach is strongly data-driven and restricted to linear features, but otherwise general. Large-scale experiments show a monotonically decreasing error rate on the feature-discovery set and a generally decreasing error rate on a distinct test set.
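
To make the procedure concrete, here is a minimal sketch in Python of one plausible realization of the loop described above. The helper names (null_space_projector, discover_feature, select_features), the choice of logistic regression as the working classifier, k-means for locating error clusters, and the single-class fallback are all illustrative assumptions, not the paper's implementation; binary labels are assumed for simplicity.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression


def null_space_projector(W):
    """Projector onto the orthogonal complement of the current features."""
    if W.shape[1] == 0:
        return np.eye(W.shape[0])
    Q, _ = np.linalg.qr(W)                      # orthonormal basis for span(W)
    return np.eye(W.shape[0]) - Q @ Q.T


def discover_feature(X, y, W, n_clusters=5):
    """One forward step: locate the tightest error cluster, project it into
    the null space of the current feature mapping, and return a new linear
    feature (a unit vector in input space) plus the current error rate."""
    n = X.shape[0]
    Z = X @ W if W.shape[1] else np.zeros((n, 1))
    clf = LogisticRegression(max_iter=1000).fit(Z, y)
    errors = clf.predict(Z) != y
    if not errors.any():
        return None, 0.0
    # Cluster the misclassified samples in the current feature space and
    # keep the tightest cluster (smallest mean distance to its centroid).
    idx = np.flatnonzero(errors)
    k = min(n_clusters, len(idx))
    km = KMeans(n_clusters=k, n_init=10).fit(Z[idx])
    spread = [np.linalg.norm(Z[idx][km.labels_ == c] - km.cluster_centers_[c],
                             axis=1).mean() for c in range(k)]
    members = idx[km.labels_ == int(np.argmin(spread))]
    # Project the cluster into the null space, where the existing features
    # carry no information about these samples.
    Xp = X[members] @ null_space_projector(W)
    if np.unique(y[members]).size < 2:
        # Single-class cluster: fall back to the dominant variance direction
        # of the projected samples (an assumption, not from the paper).
        w = np.linalg.svd(Xp - Xp.mean(0), full_matrices=False)[2][0]
    else:
        # A linear discriminant fit on the projected cluster; its weight
        # vector becomes the new feature direction.
        w = LogisticRegression(max_iter=1000).fit(Xp, y[members]).coef_[0]
    return w / np.linalg.norm(w), errors.mean()


def select_features(X, y, max_features=10, target_error=0.05):
    """Greedy loop: add one feature per step until the target error rate
    is reached or no errors remain."""
    W = np.empty((X.shape[1], 0))               # columns are found features
    for _ in range(max_features):
        w, err = discover_feature(X, y, W)
        if w is None or err <= target_error:
            break
        W = np.column_stack([W, w])
    return W
```

Because each new feature is constructed in the null space of the mapping found so far, it is orthogonal to the existing features and can only add discriminative information, which is consistent with the monotone error decrease reported on the feature-discovery set.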
