Formulating context-dependent similarity functions

Tasks of information retrieval depend on a good distance function for measuring similarity between data instances. The most effective distance function must be formulated in a context-dependent (also application-, data-, and user-dependent) way. In this paper, we present a novel method, which learns a distance function by capturing the nonlinear relationships among contextual information provided by the application, data, or user. We show that through a process called the "kernel trick," such nonlinear relationships can be learned efficiently in a projected space. In addition to using the kernel trick, we propose two algorithms to further enhance efficiency and effectiveness of function learning. For efficiency, we propose a SMO-like solver to achieve O(N2) learning performance. For effectiveness, we propose using unsupervised learning in an innovative way to address the challenge of lack of labeled data (contextual information). Theoretically, we substantiate that our method is both sound and optimal. Empirically, we demonstrate that our method is effective and useful.

[1]  Jonathan Goldstein,et al.  When Is ''Nearest Neighbor'' Meaningful? , 1999, ICDT.

[2]  Anthony Widjaja,et al.  Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond , 2003, IEEE Transactions on Neural Networks.

[3]  Tomer Hertz,et al.  Learning Distance Functions using Equivalence Relations , 2003, ICML.

[4]  Yves Grandvalet,et al.  Adaptive Scaling for Feature Selection in SVMs , 2002, NIPS.

[5]  Hava T. Siegelmann,et al.  Support Vector Clustering , 2002, J. Mach. Learn. Res..

[6]  Shi-Min Hu,et al.  Adaptive tree similarity learning for image retrieval , 2003, Multimedia Systems.

[7]  Jitendra Malik,et al.  Normalized cuts and image segmentation , 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[8]  M. Aizerman,et al.  Theoretical Foundations of the Potential Function Method in Pattern Recognition Learning , 1964 .

[9]  Piotr Indyk,et al.  Similarity Search in High Dimensions via Hashing , 1999, VLDB.

[10]  Charu C. Aggarwal,et al.  Towards systematic design of distance functions for data mining applications , 2003, KDD '03.

[11]  Morton Nadler,et al.  Pattern recognition engineering , 1993 .

[12]  Jianbo Shi,et al.  Learning Segmentation by Random Walks , 2000, NIPS.

[13]  Alexander J. Smola,et al.  Learning with Kernels: support vector machines, regularization, optimization, and beyond , 2001, Adaptive computation and machine learning series.

[14]  Andrew R. Webb,et al.  Multidimensional scaling by iterative majorization using radial basis functions , 1995, Pattern Recognit..

[15]  Dimitri P. Bertsekas,et al.  Nonlinear Programming , 1997 .

[16]  N. Cristianini,et al.  On Kernel-Target Alignment , 2001, NIPS.

[17]  Ivor W. Tsang,et al.  Learning with Idealized Kernels , 2003, ICML.

[18]  Charu C. Aggarwal,et al.  On the Surprising Behavior of Distance Metrics in High Dimensional Spaces , 2001, ICDT.

[19]  Thomas S. Huang,et al.  Optimizing learning in image retrieval , 2000, Proceedings IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2000 (Cat. No.PR00662).

[20]  Edoardo Amaldi,et al.  On the Approximability of Minimizing Nonzero Variables or Unsatisfied Relations in Linear Systems , 1998, Theor. Comput. Sci..

[21]  Zhihua Zhang,et al.  Learning Metrics via Discriminant Kernels and Multidimensional Scaling: Toward Expected Euclidean Representation , 2003, ICML.

[22]  Michael I. Jordan,et al.  Distance Metric Learning with Application to Clustering with Side-Information , 2002, NIPS.

[23]  Michael I. Jordan,et al.  On Spectral Clustering: Analysis and an algorithm , 2001, NIPS.

[24]  Santosh S. Vempala,et al.  On clusterings-good, bad and spectral , 2000, Proceedings 41st Annual Symposium on Foundations of Computer Science.

[26]  J. Platt Sequential Minimal Optimization : A Fast Algorithm for Training Support Vector Machines , 1998 .

[27]  Edward Y. Chang,et al.  Identifying Color in Motion in Video Sensors , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[28]  Edward Y. Chang,et al.  Support vector machine active learning for image retrieval , 2001, MULTIMEDIA '01.

[29]  Ronald Fagin,et al.  Efficient similarity search and classification via rank aggregation , 2003, SIGMOD '03.

[30]  Edward Y. Chang,et al.  Formulating distance functions via the kernel trick , 2005, KDD '05.

[31]  K. Schittkowski,et al.  NONLINEAR PROGRAMMING , 2022 .

[32]  David W. Aha,et al.  A Review and Empirical Evaluation of Feature Weighting Methods for a Class of Lazy Learning Algorithms , 1997, Artificial Intelligence Review.