An Efficient Search Strategy for Feature Selection Using Chow-Liu Trees

Within the taxonomy of feature extraction methods, wrapper approaches have recently lost some popularity to embedded and filter methods because of their computational burden. The dominant cost factor is the number of adaptation cycles needed to train the black-box classifier or function approximator, e.g. a multilayer perceptron. To keep a wrapper approach feasible, the number of adaptation cycles has to be minimized without increasing the risk of missing important feature subset combinations. We propose a search strategy that exploits properties of Chow-Liu trees to significantly reduce the number of subsets considered. Our approach restricts the candidate set of possible new features in a forward selection step to the children of certain tree nodes. We compare our algorithm with some basic and well-known approaches for feature subset selection. The results obtained demonstrate the efficiency and effectiveness of our method.
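
The restricted forward selection described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which tree nodes supply candidates, so the sketch assumes that the Chow-Liu tree is rooted at the target variable and that candidates are the children of the target and of features already selected. The names `mutual_information`, `chow_liu_children`, `tree_forward_selection`, and the user-supplied wrapper callback `score` are hypothetical, and the data are assumed to be discrete.

```python
from collections import Counter
from math import log


def mutual_information(xs, ys):
    """Empirical mutual information (in nats) of two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(c / n * log(c * n / (px[x] * py[y])) for (x, y), c in pxy.items())


def chow_liu_children(columns, root):
    """Chow-Liu tree as {node: [children]}, rooted at `root`.

    The tree is the maximum-weight spanning tree over pairwise mutual
    information, grown here with Prim's algorithm starting from `root`.
    """
    children = {name: [] for name in columns}
    # best[v] = (largest MI to any node already in the tree, that node)
    best = {v: (mutual_information(columns[root], columns[v]), root)
            for v in columns if v != root}
    while best:
        v = max(best, key=lambda k: best[k][0])
        _, parent = best.pop(v)
        children[parent].append(v)
        for u in best:
            mi = mutual_information(columns[v], columns[u])
            if mi > best[u][0]:
                best[u] = (mi, v)
    return children


def tree_forward_selection(columns, target, score):
    """Greedy forward selection with tree-restricted candidates.

    Assumption (not stated in the abstract): candidates are the children of
    the target node and of every feature selected so far. `score` is the
    wrapper evaluation, e.g. validation accuracy of a trained classifier.
    """
    children = chow_liu_children(columns, target)
    selected, best_score = [], float("-inf")
    candidates = list(children[target])
    while candidates:
        top_score, top = max((score(selected + [c]), c) for c in candidates)
        if top_score <= best_score:
            break  # no candidate improves the current subset
        selected.append(top)
        best_score = top_score
        candidates.remove(top)
        candidates.extend(children[top])  # open up the new node's children
    return selected, best_score
```

Each call to `score` stands for one full training run of the wrapped model, so the saving comes from scoring only the tree-adjacent candidates in each step instead of every remaining feature.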
