Partial Wasserstein Covering

We consider a general task called partial Wasserstein covering with the goal of emulating a large dataset (e.g., application dataset) using a small dataset (e.g., development dataset) in terms of the empirical distribution by selecting a small subset from a candidate dataset and adding it to the small dataset. We model this task as a discrete optimization problem with partial Wasserstein divergence as an objective function. Although this problem is NP-hard, we prove that it has the submodular property, allowing us to use a greedy algorithm with a 0.63 approximation. However, the greedy algorithm is still inefficient because it requires linear programming for each objective function evaluation. To overcome this difficulty, we propose quasi-greedy algorithms for acceleration, which consist of a series of techniques such as sensitivity analysis based on strong duality and the socalled C-transform in the optimal transport field. Experimentally, we demonstrate that we can efficiently make two datasets similar in terms of partial Wasserstein divergence, including driving scene datasets.

[1]  Gabriel Peyré,et al.  Iterative Bregman Projections for Regularized Transportation Problems , 2014, SIAM J. Sci. Comput..

[2]  Miguel Toro,et al.  Finding representative patterns with ordered projections , 2003, Pattern Recognit..

[3]  Sungzoon Cho,et al.  Variational Autoencoder based Anomaly Detection using Reconstruction Probability , 2015 .

[4]  E. Silerova,et al.  Knowledge and information systems , 2018 .

[5]  Mokhtar Z. Alaya,et al.  Partial Optimal Tranport with applications on Positive-Unlabeled Learning , 2020, NeurIPS.

[6]  Tony R. Martinez,et al.  Reduction Techniques for Instance-Based Learning Algorithms , 2000, Machine Learning.

[7]  Léon Bottou,et al.  Wasserstein Generative Adversarial Networks , 2017, ICML.

[8]  Gabriel Peyré,et al.  Stochastic Optimization for Large-scale Optimal Transport , 2016, NIPS.

[9]  José Francisco Martínez Trinidad,et al.  A review of instance selection methods , 2010, Artificial Intelligence Review.

[10]  Hui Lin,et al.  A Class of Submodular Functions for Document Summarization , 2011, ACL.

[11]  Changjian Shui,et al.  Deep Active Learning: Unified and Principled Method for Query and Training , 2020, AISTATS.

[12]  Morteza Zadimoghaddam,et al.  Data Summarization at Scale: A Two-Stage Submodular Approach , 2018, ICML.

[13]  Hans-Peter Kriegel,et al.  LOF: identifying density-based local outliers , 2000, SIGMOD '00.

[14]  Morteza Zadimoghaddam,et al.  Fast Distributed Submodular Cover: Public-Private Data Summarization , 2016, NIPS.

[15]  Peter E. Hart,et al.  The condensed nearest neighbor rule (Corresp.) , 1968, IEEE Trans. Inf. Theory.

[16]  Richard M. Karp,et al.  Reducibility Among Combinatorial Problems , 1972, 50 Years of Integer Programming.

[17]  Silvio Savarese,et al.  Active Learning for Convolutional Neural Networks: A Core-Set Approach , 2017, ICLR.

[18]  Bernhard Schölkopf,et al.  Support Vector Method for Novelty Detection , 1999, NIPS.

[19]  Yongsub Lim,et al.  RaPP: Novelty Detection with Reconstruction along Projection Pathway , 2020, ICLR.

[20]  Marco Cuturi,et al.  Computational Optimal Transport: With Applications to Data Science , 2019 .

[21]  Sébastien Marcel,et al.  Torchvision the machine-vision package of torch , 2010, ACM Multimedia.

[22]  Hongyuan Zha,et al.  On Scalable and Efficient Computation of Large Scale Optimal Transport , 2019, ICML.

[23]  Trevor Darrell,et al.  BDD100K: A Diverse Driving Video Database with Scalable Annotation Tooling , 2018, ArXiv.

[24]  Charles ReVelle,et al.  Central Facilities Location , 2010 .

[25]  Nicolo Fusi,et al.  Geometric Dataset Distances via Optimal Transport , 2020, NeurIPS.

[26]  David Coeurjolly,et al.  SPOT , 2019, ACM Trans. Graph..

[27]  Pietro Perona,et al.  Entropy-based active learning for object recognition , 2008, 2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops.

[28]  M. L. Fisher,et al.  An analysis of approximations for maximizing submodular set functions—I , 1978, Math. Program..

[29]  Natalia Gimelshein,et al.  PyTorch: An Imperative Style, High-Performance Deep Learning Library , 2019, NeurIPS.

[30]  A. Figalli The Optimal Partial Transport Problem , 2010 .

[31]  Filiberto Pla,et al.  A Stochastic Approach to Wilson's Editing Algorithm , 2005, IbPRIA.

[32]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[33]  Joydeep Ghosh,et al.  Combining clustering and active learning for the detection and learning of new image classes , 2019, Neurocomputing.

[34]  Marco Cuturi,et al.  Sinkhorn Distances: Lightspeed Computation of Optimal Transport , 2013, NIPS.

[35]  Huan Liu,et al.  On Issues of Instance Selection , 2002, Data Mining and Knowledge Discovery.

[36]  Yuval Rabani,et al.  Linear Programming , 2007, Handbook of Approximation Algorithms and Metaheuristics.

[37]  Chien-Hsing Chou,et al.  The Generalized Condensed Nearest Neighbor Rule as A Data Reduction Method , 2006, 18th International Conference on Pattern Recognition (ICPR'06).

[38]  Dennis L. Wilson,et al.  Asymptotic Properties of Nearest Neighbor Rules Using Edited Data , 1972, IEEE Trans. Syst. Man Cybern..

[39]  Andreas Krause,et al.  Practical Coreset Constructions for Machine Learning , 2017, 1703.06476.

[40]  James C. Bezdek,et al.  Nearest prototype classifier designs: An experimental study , 2001, Int. J. Intell. Syst..

[41]  Nicolò Cesa-Bianchi,et al.  Advances in Neural Information Processing Systems 31 , 2018, NIPS 2018.

[42]  Rishabh K. Iyer,et al.  Learning Mixtures of Submodular Functions for Image Collection Summarization , 2014, NIPS.

[43]  Mohiuddin Ahmed Data summarization: a survey , 2018, Knowledge and Information Systems.

[44]  Chris Mellish,et al.  Advances in Instance Selection for Instance-Based Learning Algorithms , 2002, Data Mining and Knowledge Discovery.

[45]  Gaël Varoquaux,et al.  Scikit-learn: Machine Learning in Python , 2011, J. Mach. Learn. Res..

[46]  Andreas Geiger,et al.  Vision meets robotics: The KITTI dataset , 2013, Int. J. Robotics Res..

[47]  Amotz Bar-Noy,et al.  Tight Approximation Bounds for the Seminar Assignment Problem , 2016, WAOA.

[48]  Horst Bunke,et al.  Transforming Strings to Vector Spaces Using Prototype Selection , 2006, SSPR/SPR.

[49]  Hugh B. Woodruff,et al.  An algorithm for a selective nearest neighbor decision rule (Corresp.) , 1975, IEEE Trans. Inf. Theory.