Causal & non-causal feature selection for ridge regression

In this paper we investigate the use of causal and non-causal feature selection methods for linear classifiers in situations where the causal relationships between the input and response variables may differ between the training and operational data. The causal feature selection methods investigated include inference of the Markov blanket and inference of direct causes and direct effects. The non-causal feature selection method is based on logistic regression with Bayesian regularisation using a Laplace prior. A simple ridge regression model is used as the base classifier, with the ridge parameter tuned efficiently to minimise the leave-one-out error via eigen-decomposition of the data covariance matrix. For tasks with more features than patterns, linear kernel ridge regression is used for computational efficiency. Results are presented for all of the WCCI-2008 Causation and Prediction Challenge datasets, demonstrating that, somewhat surprisingly, causal feature selection procedures provide no significant benefit in terms of predictive accuracy over non-causal feature selection and/or classification using the entire feature set.
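The ridge-parameter tuning described in the abstract exploits the fact that the leave-one-out residuals of ridge regression have a closed form, so a single eigen-decomposition can be reused for every candidate value of the regularisation parameter. The sketch below illustrates the idea in Python; it is not the authors' implementation, and the function name fit_ridge_loo, the grid search over candidate values, and the assumption of centred data (no bias term) are all illustrative choices. When there are more features than patterns, the dual (linear kernel ridge regression) branch decomposes the n-by-n Gram matrix rather than the p-by-p covariance matrix.

```python
import numpy as np

def fit_ridge_loo(X, y, lambdas):
    """Minimal sketch (not the authors' code) of ridge regression with
    the ridge parameter tuned to minimise the sum of squared
    leave-one-out residuals (Allen's PRESS statistic).

    A single eigen-decomposition is reused for every candidate value in
    `lambdas`, so the search costs little more than one fit.
    Assumes X and y are centred, so no bias term is needed.
    """
    n, p = X.shape
    if p <= n:
        # Primal: eigen-decompose the covariance matrix X^T X = V diag(w) V^T.
        w_eig, V = np.linalg.eigh(X.T @ X)
        XV = X @ V                       # rotated design matrix, n x p
        Vty = XV.T @ y                   # V^T X^T y

        def press(lam):
            c = Vty / (w_eig + lam)      # ridge coefficients in the rotated basis
            y_hat = XV @ c               # fitted values
            # Diagonal of the hat matrix H = X (X^T X + lam I)^{-1} X^T.
            h = np.einsum('ij,j,ij->i', XV, 1.0 / (w_eig + lam), XV)
            return np.sum(((y - y_hat) / (1.0 - h)) ** 2)

        lam = min(lambdas, key=press)
        beta = V @ (Vty / (w_eig + lam))
    else:
        # Dual (linear kernel ridge regression): eigen-decompose the
        # n x n Gram matrix K = X X^T = Q diag(w) Q^T instead.
        w_eig, Q = np.linalg.eigh(X @ X.T)
        Qty = Q.T @ y

        def press(lam):
            alpha = Q @ (Qty / (w_eig + lam))   # dual coefficients (K + lam I)^{-1} y
            # Diagonal of (K + lam I)^{-1}; the LOO residual is alpha_i / C^{-1}_{ii}.
            cinv_diag = np.einsum('ij,j,ij->i', Q, 1.0 / (w_eig + lam), Q)
            return np.sum((alpha / cinv_diag) ** 2)

        lam = min(lambdas, key=press)
        beta = X.T @ (Q @ (Qty / (w_eig + lam)))  # recover primal weights w = X^T alpha
    return beta, lam
```

For a linear kernel the two branches give identical leave-one-out residuals: by the push-through identity X(X^T X + lam I)^{-1} X^T = X X^T (X X^T + lam I)^{-1}, the primal residual (y_i - yhat_i)/(1 - h_ii) equals the dual expression alpha_i / [(K + lam I)^{-1}]_ii, so only the cheaper of the two decompositions need ever be computed.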
