Feature Selection and Kernel Design via Linear Programming

The definition of object (e.g., data point) similarity is critical to the performance of many machine learning algorithms, both in terms of accuracy and computational efficiency. However, it is often the case that a similarity function is unknown or chosen by hand. This paper introduces a formulation that given relative similarity comparisons among triples of points of the form object i is more like object j than object k, it constructs a kernel function that preserves the given relationships. Our approach is based on learning a kernel that is a combination of functions taken from a set of base functions (these could be kernels as well). The formulation is based on defining an optimization problem that can be solved using linear programming instead of a semidefinite program usually required for kernel learning. We show how to construct a convex problem from the given set of similarity comparisons and then arrive to a linear programming formulation by employing a subset of the positive definite matrices. We extend this formulation to consider representation/evaluation efficiency based on formulating a novel form of feature selection using kernels (that is not much more expensive to solve). Using publicly available data, we experimentally demonstrate how the formulation introduced in this paper shows excellent performance in practice by comparing it with a baseline method and a related state-of-the art approach, in addition of being much more efficient computationally.

[1]  Kristin P. Bennett,et al.  MARK: a boosting algorithm for heterogeneous kernel models , 2002, KDD.

[2]  Temple F. Smith Occam's razor , 1980, Nature.

[3]  Gene H. Golub,et al.  Matrix computations , 1983 .

[4]  Thorsten Joachims,et al.  Learning a Distance Metric from Relative Comparisons , 2003, NIPS.

[5]  Nello Cristianini,et al.  Learning the Kernel Matrix with Semidefinite Programming , 2002, J. Mach. Learn. Res..

[6]  Kilian Q. Weinberger,et al.  Learning a kernel matrix for nonlinear dimensionality reduction , 2004, ICML.

[7]  George Kollios,et al.  BoostMap: A method for efficient approximate similarity rankings , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[8]  Nello Cristianini,et al.  An introduction to Support Vector Machines , 2000 .

[9]  Glenn Fung,et al.  Learning sparse metrics via linear programming , 2006, KDD '06.

[10]  Thore Graepel,et al.  Kernel Matrix Completion by Semidefinite Programming , 2002, ICANN.

[11]  Claire Cardie,et al.  Constrained K-means Clustering with Background Knowledge , 2001, ICML.

[12]  Yoram Singer,et al.  Learning to Order Things , 1997, NIPS.

[13]  Murat Dundar,et al.  A fast iterative algorithm for fisher discriminant using heterogeneous kernels , 2004, ICML.

[14]  Michael I. Jordan,et al.  Distance Metric Learning with Application to Clustering with Side-Information , 2002, NIPS.

[15]  Nello Cristianini,et al.  An Introduction to Support Vector Machines and Other Kernel-based Learning Methods , 2000 .