Resource Allocation for Statistical Estimation

Statistical estimation in many contemporary settings involves the acquisition, analysis, and aggregation of data sets from multiple sources, which can have significant differences in character and in value. Due to these variations, the effectiveness of employing a given resource, e.g., a sensing device or computing power, for gathering or processing data from a particular source depends on the nature of that source. As a result, the appropriate division and assignment of a collection of resources to a set of data sources can substantially impact the overall performance of an inferential strategy. In this expository article, we adopt a general view of the notion of a resource and its effect on the quality of a data source, and we describe a framework for the allocation of a given set of resources to a collection of sources in order to optimize a specified metric of statistical efficiency. We discuss several stylized examples involving inferential tasks such as parameter estimation and hypothesis testing based on heterogeneous data sources, in which optimal allocations can be computed either in closed form or via efficient numerical procedures based on convex optimization. This work is an inferential analog of the literature in information theory on allocating power across communications channels of variable quality in order to optimize for total throughput.
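
To make the flavor of such closed-form allocations concrete, the following sketch (illustrative only, not the article's own framework) considers perhaps the simplest instance: a fixed sampling budget split across K independent Gaussian sources with known noise levels, where the goal is to minimize the total variance of the resulting sample means, sum_i sigma_i^2 / n_i. The first-order condition sigma_i^2 / n_i^2 = constant yields a square-root (Neyman-style) rule with n_i proportional to sigma_i; the function name allocate_samples is hypothetical and introduced here only for illustration.

```python
# Illustrative sketch (assumed setup, not taken from the article):
# allocate a total sample budget across K Gaussian sources with known
# noise levels sigma_i, minimizing the total variance of the sample means
#     sum_i sigma_i**2 / n_i   subject to   sum_i n_i = total_budget.
# The Lagrangian condition sigma_i**2 / n_i**2 = const gives the closed-form
# square-root allocation n_i = total_budget * sigma_i / sum_j sigma_j.
import numpy as np

def allocate_samples(sigmas, total_budget):
    """Continuous sample allocation minimizing sum_i sigmas[i]**2 / n[i]."""
    sigmas = np.asarray(sigmas, dtype=float)
    return total_budget * sigmas / sigmas.sum()

if __name__ == "__main__":
    sigmas = np.array([1.0, 2.0, 4.0])              # noisier sources get more samples
    n = allocate_samples(sigmas, total_budget=700)
    print("allocation:", n)                          # [100. 200. 400.]
    print("total variance:", (sigmas**2 / n).sum())  # 0.07, vs. 0.09 for an even split
```

In practice the continuous allocation would be rounded to integer sample counts, and more general efficiency metrics typically admit no closed form, which is where the convex-optimization formulations mentioned above become relevant.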
