Kernel-based learning algorithms map a data-set into a (possibly infinite-dimensional) feature space in which classification may be performed. As such, kernel methods represent a powerful approach to many non-linear problems. However, kernel methods suffer from one unfortunate drawback: the Gram matrix contains m rows and columns, where m is the number of data-points. Many operations, such as matrix inversion at O(m^3) cost, become impractical for data-sets containing more than about 10^4 points. One approach to resolving this issue is to seek sparse representations of the data-set [7, 5, 2]. A sparse representation contains a reduced number of examples; loosely speaking, we wish to extract the maximum amount of information from the minimum number of data-points. To achieve this in a principled manner, we must estimate how much information each data-point contains. In the framework presented here we use Bayesian methodology to determine how much information is gained from each data-point.
[1] David J. C. MacKay et al., "Information-Based Objective Functions for Active Data Selection", Neural Computation, 1992.
[2] Michael I. Jordan et al., Advances in Neural Information Processing Systems 30, 1995.
[3] Matthias W. Seeger et al., "Using the Nyström Method to Speed Up Kernel Machines", NIPS, 2000.
[4] Manfred Opper et al., "Sparse Representation for Gaussian Process Models", NIPS, 2000.
[5] Thomas M. Cover et al., Elements of Information Theory, 2005.
[6] Michael E. Tipping, "The Relevance Vector Machine", NIPS, 1999.
[7] Bernhard Schölkopf et al., "Sparse Greedy Matrix Approximation for Machine Learning", International Conference on Machine Learning, 2000.