Paper information - Regularising Non-linear Models Using Feature Side-information

Regularising Non-linear Models Using Feature Side-information

Very often features come with their own vectorial descriptions which provide detailed information about their properties. We refer to these vectorial descriptions as feature side-information. In the standard learning scenario, the input is represented as a vector of features, and the feature side-information is most often ignored or used only for feature selection prior to model fitting. We believe that feature side-information, which carries information about the features' intrinsic properties, will help improve model prediction if used properly during the learning process. In this paper, we propose a framework that incorporates feature side-information during the learning of very general model families to improve prediction performance. We control the structure of the learned models so that they reflect feature similarities, as defined on the basis of the side-information. We perform experiments on a number of benchmark datasets, which show significant gains in predictive performance over a number of baselines as a result of exploiting the side-information.
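To make the idea of a model that "reflects feature similarities" concrete, here is a minimal sketch of one possible regulariser. It is an illustrative assumption, not the formulation given in the paper: the abstract does not specify the penalty, so the Gaussian similarity, the finite-difference gradient, and the names `similarity_matrix`, `input_gradient`, and `side_info_penalty` are all hypothetical. The penalty is small when a model responds in the same way to features whose side-information vectors are close.

```python
import math


def similarity_matrix(side_info, bandwidth=1.0):
    """Gaussian similarity between features, computed from their
    side-information vectors (assumed choice of similarity)."""
    d = len(side_info)
    sim = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(d):
            sq_dist = sum((a - b) ** 2 for a, b in zip(side_info[i], side_info[j]))
            sim[i][j] = math.exp(-sq_dist / (2.0 * bandwidth ** 2))
    return sim


def input_gradient(model, x, eps=1e-5):
    """Central finite-difference estimate of the model's gradient at x."""
    grad = []
    for i in range(len(x)):
        up = list(x)
        down = list(x)
        up[i] += eps
        down[i] -= eps
        grad.append((model(up) - model(down)) / (2.0 * eps))
    return grad


def side_info_penalty(model, x, sim):
    """Sum over feature pairs of similarity times the squared difference
    of the model's sensitivities to the two features (hypothetical penalty)."""
    g = input_gradient(model, x)
    d = len(g)
    return sum(
        sim[i][j] * (g[i] - g[j]) ** 2
        for i in range(d)
        for j in range(i + 1, d)
    )


# Toy data: features 0 and 1 share the same side-information; feature 2 differs.
side_info = [[0.0, 1.0], [0.0, 1.0], [5.0, -3.0]]
sim = similarity_matrix(side_info)
x = [0.3, 0.1, 0.2]


def model_a(v):
    # Treats the two similar features identically.
    return math.tanh(v[0] + v[1]) + v[2]


def model_b(v):
    # Treats the two similar features in opposite ways.
    return math.tanh(v[0] - v[1]) + v[2]


penalty_a = side_info_penalty(model_a, x, sim)
penalty_b = side_info_penalty(model_b, x, sim)
```

In this toy setup `penalty_a` is close to zero while `penalty_b` is large, so adding such a term to a training loss would favour models like `model_a`. A practical implementation would likely use automatic differentiation rather than finite differences.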

Alexandros Kalousis | Pablo Strasser | Amina Mollaysa
