CoRE Kernels

The term "CoRE kernel" stands for correlation-resemblance kernel. In many real-world applications (e.g., computer vision), the data are often high-dimensional, sparse, and non-binary. We propose two types of (nonlinear) CoRE kernels for non-binary sparse data and demonstrate the effectiveness of the new kernels through a classification experiment. CoRE kernels are simple and have no tuning parameters. However, training a nonlinear kernel SVM can be costly in time and memory and may not always be suitable for truly large-scale industrial applications (e.g., search). To make the proposed CoRE kernels more practical, we develop basic probabilistic hashing algorithms that approximately transform the nonlinear kernels into linear kernels.
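The abstract gives no formulas, but the name suggests combining a correlation measure with the classical resemblance (Jaccard similarity of the nonzero supports). As a rough illustration only, and under the assumption that one such kernel is simply the product of the cosine correlation and the resemblance (the function names and the exact combination below are hypothetical, not taken from the paper), a sketch might look like:

```python
import numpy as np

def resemblance(u, v):
    # Jaccard resemblance of the nonzero supports of u and v.
    su, sv = set(np.nonzero(u)[0]), set(np.nonzero(v)[0])
    if not su and not sv:
        return 0.0
    return len(su & sv) / len(su | sv)

def correlation(u, v):
    # Cosine similarity between u and v (a simple "correlation" term).
    nu, nv = np.linalg.norm(u), np.linalg.norm(v)
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return float(np.dot(u, v) / (nu * nv))

def core_kernel(u, v):
    # Illustrative CoRE-style kernel: product of the correlation term
    # and the resemblance term (an assumption, not the paper's definition).
    return correlation(u, v) * resemblance(u, v)

# Two sparse non-binary vectors.
u = np.array([0.0, 2.0, 0.0, 1.0])
v = np.array([0.0, 1.0, 3.0, 1.0])
print(core_kernel(u, v))
```

Both factors lie in [0, 1] for nonnegative data, so such a product is again bounded in [0, 1]; the paper's actual definitions and the hashing schemes that linearize them should be taken from the paper itself.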
