It is often possible to use expert knowledge or other sources of information to obtain dissimilarity measures for pairs of objects, which serve as pseudo-distances between the objects. When dissimilarity information is available as the data, two different types of problems are of interest. The first is to estimate a full position configuration for all objects in a low-dimensional space while respecting the dissimilarity information. This is usually done to visualize the data and/or to conduct further statistical analysis, such as clustering or classification. Multidimensional Scaling (MDS), which is still an active research area, has traditionally been used to tackle this problem. In the second type of problem, the high-dimensional data points are assumed to lie on a low-dimensional manifold, and the goal is to unfold the manifold in order to recover the underlying intrinsic low-dimensional structure. We provide a novel, unified framework called Kernel Regularization to optimally solve both types of problems. Advanced optimization techniques are used to obtain the global solutions accurately and efficiently. The proposed method naturally accommodates dissimilarity information with possibly crude, noisy, incomplete, inconsistent and weighted observations. Various favorable operating characteristics and properties of the method are illustrated using both simulated and real data sets.

1 Dissimilarity Information and Regularized Kernel Estimate

Given a set of $N$ objects, suppose we have obtained a measure of dissimilarity, $d_{ij}$, for certain object pairs $(i, j)$. We introduce the class of Regularized Kernel Estimates (RKEs), which we define as solutions to optimization problems of the following form:
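As background for the dissimilarity setting described above, the first problem type (recovering a low-dimensional configuration from pairwise dissimilarities) can be made concrete with classical (Torgerson) MDS. The sketch below is that standard baseline, not the Kernel Regularization method proposed here. The function name `classical_mds`, its parameters, and the toy data are assumptions introduced only for illustration, and the sketch requires a complete dissimilarity matrix, which is precisely the restriction that a method handling incomplete and noisy observations aims to relax.

```python
import numpy as np

def classical_mds(d, dim=2):
    """Embed N objects in `dim` dimensions from a full N x N dissimilarity matrix.

    Hypothetical helper for illustration: classical (Torgerson) MDS,
    not the Regularized Kernel Estimate.
    """
    d = np.asarray(d, dtype=float)
    n = d.shape[0]
    # Centering matrix J = I - (1/n) * 1 1^T
    j = np.eye(n) - np.ones((n, n)) / n
    # Double-centered Gram (kernel) matrix built from squared dissimilarities
    b = -0.5 * j @ (d ** 2) @ j
    # eigh handles symmetric matrices and returns eigenvalues in ascending order
    vals, vecs = np.linalg.eigh(b)
    idx = np.argsort(vals)[::-1][:dim]
    # Noisy or non-Euclidean dissimilarities can give negative eigenvalues; clip them
    top = np.clip(vals[idx], 0.0, None)
    return vecs[:, idx] * np.sqrt(top)

# Toy data (an assumption): four corners of a 3 x 4 rectangle
pts = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 4.0], [3.0, 4.0]])
d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)

x = classical_mds(d, dim=2)
d_hat = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
```

The recovered coordinates `x` match the original points only up to rotation, reflection and translation, so the meaningful check is that the embedded pairwise distances `d_hat` reproduce `d`. The intermediate matrix `b` is positive semidefinite when the dissimilarities are exactly Euclidean, which is one way to see why a kernel (Gram) matrix is a natural object to estimate from dissimilarity data.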