Tradeoffs for Space, Time, Data and Risk in Unsupervised Learning

Faced with massive data, is it possible to trade off statistical risk against computational space and time? This challenge lies at the heart of large-scale machine learning. Using k-means clustering as a prototypical unsupervised learning problem, we show how to strategically summarize the data (controlling space) in order to trade off risk and time when the data is generated by a probabilistic model. Our summarization is based on coreset constructions from computational geometry. We also develop an algorithm, TRAM, to navigate the space/time/data/risk tradeoff in practice. In particular, we show that for a fixed risk (resp. fixed data size), the running time of TRAM decreases as the data size increases (resp. as the risk increases). Our extensive experiments on real data sets demonstrate the existence and practical utility of such tradeoffs, not only for k-means but also for Gaussian Mixture Models.
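To make the idea of summarizing data before clustering concrete, the following is a minimal sketch and not the TRAM algorithm itself, whose details are not given here. It assumes a simple importance-sampling scheme (a mixture of uniform sampling and sampling proportional to squared distance from the data mean) as a stand-in for a coreset construction, followed by a weighted variant of Lloyd's iterations. The function names `sample_coreset`, `weighted_kmeans`, and `kmeans_cost`, and all parameter choices, are hypothetical.

```python
import numpy as np


def sample_coreset(X, m, rng):
    """Draw m weighted points from X (illustrative scheme, an assumption).

    Points are sampled with probability q, a 50/50 mixture of uniform
    sampling and sampling proportional to squared distance from the mean.
    Each sampled point gets weight 1 / (m * q), so weighted sums over the
    sample are unbiased estimates of sums over the full data set.
    """
    n = X.shape[0]
    d2 = ((X - X.mean(axis=0)) ** 2).sum(axis=1)
    total = d2.sum()
    q = np.full(n, 1.0 / n) if total == 0 else 0.5 / n + 0.5 * d2 / total
    idx = rng.choice(n, size=m, replace=True, p=q)
    return X[idx], 1.0 / (m * q[idx])


def weighted_kmeans(P, w, k, rng, iters=50):
    """Lloyd-style iterations on weighted points P with weights w."""
    # Initialize centers from the weighted points themselves.
    centers = P[rng.choice(len(P), size=k, replace=False, p=w / w.sum())]
    for _ in range(iters):
        d2 = ((P[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            mask = labels == j
            if mask.any():  # keep the old center if a cluster is empty
                centers[j] = np.average(P[mask], axis=0, weights=w[mask])
    return centers


def kmeans_cost(X, centers):
    """Mean squared distance from each point in X to its nearest center."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.min(axis=1).mean()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic data: three Gaussian blobs (purely illustrative).
    X = np.vstack([rng.normal(c, 0.5, size=(2000, 2)) for c in (-5.0, 0.0, 5.0)])
    for m in (20, 200, 2000):  # a larger summary costs more time
        P, w = sample_coreset(X, m, rng)
        centers = weighted_kmeans(P, w, k=3, rng=rng)
        print(m, kmeans_cost(X, centers))
```

The summary size `m` is the knob in this sketch: the clustering step touches only `m` points, so a smaller `m` reduces running time, while the cost evaluated on the full data typically becomes less reliable.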
