Distributed Pareto Optimization for Large-Scale Noisy Subset Selection

Subset selection, which aims to select the best subset from a ground set with respect to some objective function, is a fundamental problem with applications in many areas, including combinatorial optimization, machine learning, data mining, computer vision, and information retrieval. As data collection and storage have advanced, ground sets have grown ever larger. Furthermore, in many subset selection applications, the objective function evaluation is subject to noise. We thus study the large-scale noisy subset selection problem in this paper. The recently proposed DPOSS algorithm, based on multiobjective evolutionary optimization, is a powerful distributed solver for large-scale subset selection. Its performance, however, has been validated only in the noise-free environment. In this paper, we first prove its approximation guarantee under two common noise models, i.e., multiplicative noise and additive noise, showing that the presence of noise substantially degrades the performance of DPOSS. Next, we propose a new distributed multiobjective evolutionary algorithm called DPONSS for large-scale noisy subset selection. We prove that the approximation guarantee of DPONSS under noise is significantly better than that of DPOSS. We also conduct experiments on the application of sparse regression, where the objective is often estimated from a sample of the data, which introduces noise. The results on various real-world data sets, with sizes reaching the millions, clearly show the excellent performance of DPONSS.
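
To illustrate how estimating a sparse regression objective from a data sample introduces noise, the following is a minimal sketch and not the experimental setup of the paper. It assumes the objective is the squared multiple correlation (R^2) of the selected variables with the response; the data are synthetic, and the names r_squared and noisy_r_squared are hypothetical helpers introduced only for this example.

```python
import numpy as np

def r_squared(X, y, subset):
    """Squared multiple correlation of y with the columns of X in `subset`."""
    if len(subset) == 0:
        return 0.0
    A = X[:, list(subset)]
    yc = y - y.mean()            # center the response
    Ac = A - A.mean(axis=0)      # center the selected predictors
    coef, *_ = np.linalg.lstsq(Ac, yc, rcond=None)
    residual = yc - Ac @ coef
    return 1.0 - float(residual @ residual) / float(yc @ yc)

def noisy_r_squared(X, y, subset, sample_size, rng):
    """Estimate the objective on a random row sample (assumed noise source)."""
    rows = rng.choice(X.shape[0], size=sample_size, replace=False)
    return r_squared(X[rows], y[rows], subset)

# Synthetic data: the response depends on the first two variables only.
rng = np.random.default_rng(0)
n, d = 2000, 10
X = rng.normal(size=(n, d))
y = X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=n)

subset = [0, 1]
exact = r_squared(X, y, subset)
estimates = [noisy_r_squared(X, y, subset, 100, rng) for _ in range(20)]

print("exact objective on all rows:", round(exact, 4))
print("spread of sampled estimates:", round(float(np.std(estimates)), 4))
```

Repeated evaluations of the same subset give different values, so an algorithm that compares subsets using single sampled evaluations may prefer a worse subset; this is the kind of situation the noisy setting is meant to capture.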
