Feature hashing for large scale multitask learning

Empirical evidence suggests that hashing is an effective strategy for dimensionality reduction and practical nonparametric estimation. In this paper we provide exponential tail bounds for feature hashing and show that the interaction between random subspaces is negligible with high probability. We demonstrate the feasibility of this approach with experimental results for a new use case: multitask learning with hundreds of thousands of tasks.
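The abstract does not spell out the construction, so the following is only a minimal sketch of how feature hashing is commonly set up, under stated assumptions. Each token is mapped to one of `num_buckets` coordinates by a hash function, and a second hash supplies a ±1 sign. For the multitask case, the sketch assumes that each token is hashed twice, once as a shared feature and once prefixed with a task identifier, so that all tasks share one weight vector. The names `hashed_features` and `_hash`, the use of MD5 as the hash, and the `task_token` key format are illustrative choices, not taken from the paper.

```python
import hashlib


def _hash(key: str, seed: str) -> int:
    """Deterministic 64-bit hash of a string (MD5 is an illustrative stand-in)."""
    digest = hashlib.md5((seed + ":" + key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "little")


def hashed_features(tokens, num_buckets, task=None):
    """Map a bag of tokens to a dense vector of length num_buckets.

    Assumption: when `task` is given, every token contributes both a shared
    key and a task-specific key, so per-task features land in the same
    hashed space as the global ones.
    """
    vec = [0.0] * num_buckets
    for tok in tokens:
        keys = [tok] if task is None else [tok, f"{task}_{tok}"]
        for key in keys:
            index = _hash(key, "index") % num_buckets                # bucket hash
            sign = 1.0 if _hash(key, "sign") % 2 == 0 else -1.0      # sign hash
            vec[index] += sign
    return vec


# Hypothetical usage: one shared vector space for every task.
shared = hashed_features(["cheap", "pills", "now"], num_buckets=32)
personal = hashed_features(["cheap", "pills", "now"], num_buckets=32, task="user42")
```

The signed hash is meant to make colliding features cancel in expectation rather than accumulate, and prefixing tokens with a task identifier lets a single linear model hold both global and per-task weights without allocating a separate parameter vector per task.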
