Restricted Strong Convexity Implies Weak Submodularity

We connect high-dimensional subset selection and submodular maximization. Our results extend the work of Das and Kempe (2011) from the setting of linear regression to arbitrary objective functions. For greedy feature selection, this connection allows us to obtain strong multiplicative performance bounds for several methods without statistical modeling assumptions. We also derive recovery guarantees of this form under standard assumptions. Our work shows that greedy algorithms perform within a constant factor of the best possible subset-selection solution for a broad class of general objective functions. Our methods allow direct control over the number of selected features, as opposed to regularization parameters, which control sparsity only implicitly. Our proof technique uses the concept of weak submodularity initially defined by Das and Kempe. We draw a connection between convex analysis and submodular set function theory, which may be of independent interest for other statistical learning applications that have combinatorial structure.
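As a rough illustration of the greedy feature selection discussed above, the following sketch adds one feature at a time, always taking the feature with the largest marginal gain in a set function f(S), until exactly k features are chosen. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names (`explained_sum_of_squares`, `greedy_forward_selection`), the synthetic data, and the choice of a least-squares objective as a stand-in for a general objective are all hypothetical and chosen only for illustration.

```python
import numpy as np


def explained_sum_of_squares(X, y, support):
    """Set function f(S): reduction in squared error from fitting y on columns S.

    Assumption: least squares stands in for a general objective here.
    """
    if not support:
        return 0.0
    cols = sorted(support)
    beta, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
    residual = y - X[:, cols] @ beta
    return float(y @ y - residual @ residual)


def greedy_forward_selection(set_function, num_features, k):
    """Pick k features, each time adding the one with the largest marginal gain."""
    selected = []
    for _ in range(k):
        current_value = set_function(selected)
        best_feature, best_gain = None, -np.inf
        for j in range(num_features):
            if j in selected:
                continue
            gain = set_function(selected + [j]) - current_value
            if gain > best_gain:
                best_feature, best_gain = j, gain
        selected.append(best_feature)
    return selected


# Hypothetical synthetic data: 3 relevant features out of 20.
rng = np.random.default_rng(0)
n, p, k = 200, 20, 3
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[[2, 7, 11]] = [3.0, -2.0, 1.5]
y = X @ beta_true + 0.1 * rng.standard_normal(n)

selected = greedy_forward_selection(
    lambda S: explained_sum_of_squares(X, y, S), p, k
)
print("selected features:", sorted(selected))
```

The cardinality k is passed directly to the routine, which reflects the point made above about controlling the number of features explicitly rather than through a regularization parameter. Swapping in another objective only requires a different `set_function`.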

[1] Francis R. Bach, et al. Learning with Submodular Functions: A Convex Optimization Perspective, 2011, Found. Trends Mach. Learn.

[2] Rishabh K. Iyer, et al. Submodularity in Data Subset Selection and Active Learning, 2015, ICML.

[3] Andreas Krause, et al. Lazier Than Lazy Greedy, 2014, AAAI.

[4] Xiao-Tong Yuan, et al. Gradient Hard Thresholding Pursuit for Sparsity-Constrained Optimization, 2013, ICML.

[5] Gérard Cornuéjols, et al. Submodular set functions, matroids and the greedy algorithm: Tight worst-case bounds and some generalizations of the Rado-Edmonds theorem, 1984, Discret. Appl. Math.

[6] Rishabh K. Iyer, et al. Polyhedral aspects of Submodularity, Convexity and Concavity, 2015, ArXiv.

[7] Tong Zhang, et al. Adaptive Forward-Backward Greedy Algorithm for Sparse Learning with Linear Models, 2008, NIPS.

[8] Maryam Fazel, et al. Designing smoothing functions for improved worst-case competitive ratio in online optimization, 2016, NIPS.

[9] Rong Jin, et al. Batch mode active learning and its application to medical image classification, 2006, ICML.

[10] Tong Zhang, et al. Sparse Recovery With Orthogonal Matching Pursuit Under RIP, 2010, IEEE Transactions on Information Theory.

[11] Jeff A. Bilmes, et al. Using Document Summarization Techniques for Speech Data Subset Selection, 2013, NAACL.

[12] B. Mallick, et al. Generalized Linear Models: A Bayesian Perspective, 2000.

[13] L. Brown. Fundamentals of statistical exponential families: with applications in statistical decision theory, 1986.

[14] Yiming Yang, et al. RCV1: A New Benchmark Collection for Text Categorization Research, 2004, J. Mach. Learn. Res.

[15] Balas K. Natarajan, et al. Sparse Approximate Solutions to Linear Systems, 1995, SIAM J. Comput.

[16] Jieping Ye, et al. Forward-Backward Greedy Algorithms for General Convex Smooth Functions over A Cardinality Constraint, 2013, ICML.

[17] T. Blumensath, et al. On the Difference Between Orthogonal Matching Pursuit and Orthogonal Least Squares, 2007.

[18] Abhimanyu Das, et al. Submodular meets Spectral: Greedy Algorithms for Subset Selection, Sparse Approximation and Dictionary Selection, 2011, ICML.

[19] Ambuj Tewari, et al. Learning Exponential Families in High-Dimensions: Strong Convexity and Sparsity, 2009, AISTATS.

[20] Joseph K. Bradley, et al. Parallel Double Greedy Submodular Maximization, 2014, NIPS.

[21] M. L. Fisher, et al. An analysis of approximations for maximizing submodular set functions—I, 1978, Math. Program.

[22] Alexandros G. Dimakis, et al. Streaming Weak Submodularity: Interpreting Neural Networks on the Fly, 2017, NIPS.

[23] Elad Hazan, et al. Online submodular minimization, 2009, J. Mach. Learn. Res.

[24] Yaron Singer, et al. Maximization of Approximately Submodular Functions, 2016, NIPS.

[25] Amin Karbasi, et al. Weakly Submodular Maximization Beyond Cardinality Constraints: Does Randomization Help Greedy?, 2017, ICML.

[26] Naoki Abe, et al. Group Orthogonal Matching Pursuit for Logistic Regression, 2011, AISTATS.

[27] Bhiksha Raj, et al. Greedy sparsity-constrained optimization, 2011, 2011 Conference Record of the Forty Fifth Asilomar Conference on Signals, Systems and Computers (ASILOMAR).

[28] Martin J. Wainwright, et al. Restricted Eigenvalue Properties for Correlated Gaussian Designs, 2010, J. Mach. Learn. Res.

[29] László Lovász, et al. Submodular functions and convexity, 1982, ISMP.

[30] A. Barron, et al. Approximation and learning by greedy algorithms, 2008, 0803.1718.

[31] A. Tsybakov, et al. Exponential Screening and optimal rates of sparse estimation, 2010, 1003.2654.

[32] S. Geer. High-Dimensional Generalized Linear Models and the Lasso, 2008, 0804.0703.

[33] Andreas Krause, et al. Submodular Function Maximization, 2014, Tractability.

[34] Po-Ling Loh, et al. Regularized M-estimators with nonconvexity: statistical and algorithmic theory for local optima, 2013, J. Mach. Learn. Res.

[35] Ali Jalali, et al. On Learning Discrete Graphical Models using Greedy Methods, 2011, NIPS.

[36] Aditya Bhaskara, et al. Greedy Column Subset Selection: New Bounds and Distributed Algorithms, 2016, ICML.

[37] Deanna Needell, et al. CoSaMP: Iterative signal recovery from incomplete and inaccurate samples, 2008, ArXiv.

[38] Yonina C. Eldar, et al. Sparse Nonlinear Regression: Parameter Estimation and Asymptotic Inference, 2015, ArXiv.

[39] Alexandros G. Dimakis, et al. On Approximation Guarantees for Greedy Low Rank Optimization, 2017, ICML.

[40] Andreas Krause, et al. Guaranteed Non-convex Optimization: Submodular Maximization over Continuous Domains, 2016, AISTATS.

[41] Martin J. Wainwright, et al. A unified framework for high-dimensional analysis of $M$-estimators with decomposable regularizers, 2009, NIPS.

[42] Huy L. Nguyen, et al. The Power of Randomization: Distributed Submodular Maximization on Massive Datasets, 2015, ICML.

[43] Amin Karbasi, et al. Gradient Methods for Submodular Maximization, 2017, NIPS.

[44] Pradeep Ravikumar, et al. Greedy Algorithms for Structurally Constrained High Dimensional Problems, 2011, NIPS.

[45] Prateek Jain, et al. On Iterative Hard Thresholding Methods for High-dimensional M-Estimation, 2014, NIPS.

[46] Yang Yu, et al. Subset Selection by Pareto Optimization, 2015, NIPS.

[47] Andreas Krause, et al. Submodular Dictionary Selection for Sparse Representation, 2010, ICML.

[48] Pushmeet Kohli, et al. Tractability: Practical Approaches to Hard Problems, 2013.