Benefitting from the Variables that Variable Selection Discards

In supervised learning, variable selection is used to find a subset of the available inputs that accurately predicts the output. This paper shows that some of the variables that variable selection discards can beneficially be used as extra outputs for inductive transfer. Using discarded input variables as extra outputs forces the model to learn mappings from the selected inputs to these extra outputs. Inductive transfer makes what is learned by these mappings available to the model being trained on the main output, often improving performance on that main output. We present three synthetic problems (two regression problems and one classification problem) where performance improves when some of the variables discarded by variable selection are used as extra outputs. We then apply variable selection to two real problems (DNA splice-junction and pneumonia risk prediction) and demonstrate the same effect: using some of the discarded input variables as extra outputs yields somewhat better performance on both problems than variable selection alone. This approach enhances variable selection by allowing the learner to extract value from variables that would otherwise have been discarded, without suffering the loss in performance that occurs when those variables are used as inputs.
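The mechanism is straightforward to sketch: train a single network whose output layer predicts the main target plus the discarded variables, so the shared hidden layer is forced to learn mappings to those variables as well, then evaluate only the main output. The code below is a minimal illustration under assumed conditions (a synthetic regression problem, hypothetical noise levels, and scikit-learn's MLPRegressor as the multi-output learner); it is not the paper's experimental setup.

```python
# Minimal sketch: discarded input variables reused as extra outputs for
# inductive transfer. All data, noise levels, and hyperparameters here are
# illustrative assumptions, not the paper's actual experiments.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
n = 2000
X_all = rng.normal(size=(n, 3))                       # candidate inputs kept by selection
main_y = X_all[:, 0] + X_all[:, 1] ** 2 + 0.1 * rng.normal(size=n)

# Hypothetical discarded variables: noisy views of the underlying function,
# too noisy to help as inputs but still informative as extra training targets.
extra = np.column_stack([
    X_all[:, 1] ** 2 + rng.normal(scale=1.0, size=n),
    X_all[:, 0] + rng.normal(scale=1.0, size=n),
])

X_tr, X_te, y_tr, y_te, ex_tr, ex_te = train_test_split(
    X_all, main_y, extra, test_size=0.5, random_state=0)

# Baseline: selected inputs -> main output only.
base = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
base.fit(X_tr, y_tr)
print("baseline MSE:", mean_squared_error(y_te, base.predict(X_te)))

# Multitask variant: the discarded variables become extra output targets,
# so the shared hidden layer must also learn mappings to them.
Y_multi = np.column_stack([y_tr, ex_tr])
multi = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
multi.fit(X_tr, Y_multi)
pred_main = multi.predict(X_te)[:, 0]                 # evaluate the main output only
print("multitask MSE:", mean_squared_error(y_te, pred_main))
```

Because the extra outputs appear only as training targets, they are never needed at prediction time, which is what distinguishes this use of the discarded variables from simply adding them back as inputs.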
