Active Search with Complex Actions and Rewards

Active search studies algorithms that aim to find all positive examples in an unknown environment by collecting and learning from labels that are costly to obtain. These algorithms start with a pool of unlabeled data, act by designing queries, and are rewarded by the number of positive examples found over a long horizon. Active search is closely related to active learning, multi-armed bandits, and Bayesian optimization. To date, most active search methods assume that query actions and rewards are defined on single data points in a low-dimensional Euclidean space. Many applications, however, define actions and rewards in more complex ways. For example, active search may be used to recommend items connected by a network graph whose edges indicate item (node) similarity. In environmental monitoring, the reward is defined over regions, because pollution is only confirmed when an entire region shows consistently large measurements. Conversely, to search efficiently for sparse signal hotspots over a large area, aerial robots may query at high altitude, observing the average value over an entire region. Finally, active search usually ignores the computational cost of designing actions, which becomes prohibitive in large problems. We develop methods that address each of these issues.

In a graph environment, the exploratory queries that reveal the most information about the user model differ from those in Euclidean space. We use an exploration criterion called Σ-optimality, which was originally motivated by a different objective, active surveying, yet performs better empirically because it tends to query cluster centers. We also establish submodularity-based guarantees that justify the greedy application of several heuristics, including Σ-optimality, and we provide a regret analysis for active search with results comparable to the existing literature. For active area search with region rewards, we design an algorithm called APPS, which optimizes the one-step look-ahead expected reward for finding positive regions with high probability. APPS is solved in general by Monte Carlo estimation, but for simple objectives, e.g., finding regions with large average pollution concentration, it admits a closed-form solution, called AAS, that connects to Bayesian quadrature.

For active needle search with region queries by aerial robots, we pick queries that maximize the information gain about possible signal hotspot locations. Our method, called RSI, reduces to bisection search when the measurements are noiseless and the signal hotspot is unique. With noisy measurements, we show that RSI requires a near-optimal expected number of measurements, comparable to results from compressive sensing (CS); CS, however, relies on weighted averages, which are harder to realize than the plain averages we use. Finally, to address the scalability challenge, we borrow ideas from Thompson sampling, which approximates near-optimal decisions by drawing a sample from the model's uncertainty and acting greedily with respect to that sample. Our method, conjugate sampling, delivers genuine computational savings when the uncertainty is modeled with sparse or circulant matrices.
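As a concrete, deliberately simplified illustration of the Σ-optimality exploration criterion discussed above, the sketch below selects graph queries greedily under a Gaussian-random-field model. It is not the implementation from this work: the prior covariance (L + δI)^{-1}, the observation-noise term, and all function and parameter names are assumptions made for the example, and the greedy score shown is the standard reduction of the sum of posterior covariances under a rank-one update.

```python
import numpy as np

def sigma_optimality_greedy(L, num_queries, noise_var=0.1, delta=1e-2):
    """Greedy query selection on a graph via a Sigma-optimality-style criterion.

    L           : (n, n) graph Laplacian (dense numpy array).
    num_queries : number of nodes to select.
    noise_var   : assumed observation noise variance (illustrative).
    delta       : regularizer making the GRF prior covariance well-defined.

    The assumed prior covariance is Sigma = (L + delta * I)^{-1}.  Observing node v
    gives the rank-one posterior update
        Sigma' = Sigma - Sigma[:, v] Sigma[v, :] / (Sigma[v, v] + noise_var),
    so greedily minimizing 1^T Sigma 1 means picking the v that maximizes
        (1^T Sigma[:, v])^2 / (Sigma[v, v] + noise_var).
    """
    n = L.shape[0]
    Sigma = np.linalg.inv(L + delta * np.eye(n))
    selected = []
    for _ in range(num_queries):
        col_sums = Sigma.sum(axis=0)                   # 1^T Sigma[:, v] for every v
        scores = col_sums ** 2 / (np.diag(Sigma) + noise_var)
        scores[selected] = -np.inf                     # do not re-query a node
        v = int(np.argmax(scores))
        selected.append(v)
        # Rank-one posterior update after observing node v.
        Sigma = Sigma - np.outer(Sigma[:, v], Sigma[v, :]) / (Sigma[v, v] + noise_var)
    return selected

if __name__ == "__main__":
    # Tiny usage example on a 5-node path graph.
    A = np.diag(np.ones(4), 1) + np.diag(np.ones(4), -1)  # adjacency of a path
    L = np.diag(A.sum(axis=1)) - A                          # graph Laplacian
    print(sigma_optimality_greedy(L, num_queries=2))
```

Because each observation only triggers a rank-one covariance update, the greedy loop stays quadratic in the number of nodes once the prior covariance has been formed, which is what makes cluster-center-seeking behavior cheap to exploit on moderate-sized graphs.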

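The scalability argument can also be made concrete. A Thompson-sampling-style step needs only a single draw from the model's uncertainty before acting greedily, and when the covariance is circulant that draw costs O(n log n) via the FFT rather than O(n^3) via a Cholesky factorization. The sketch below illustrates this structural advantage only; it is not the conjugate sampling algorithm itself, and the ring covariance, function names, and parameters are hypothetical.

```python
import numpy as np

def sample_circulant_gaussian(first_row, rng):
    """Draw one sample from N(0, C), where C is circulant with the given first row,
    using the FFT in O(n log n) instead of an O(n^3) Cholesky factorization.

    The eigenvalues of a circulant matrix are the DFT of its first row, so
    C = F^{-1} diag(lam) F, and x = F^{-1} sqrt(lam) F eps has covariance C.
    """
    lam = np.fft.fft(first_row).real          # eigenvalues of C (>= 0 if C is PSD)
    lam = np.maximum(lam, 0.0)                # clip tiny negatives from round-off
    eps = rng.standard_normal(first_row.shape[0])
    return np.fft.ifft(np.sqrt(lam) * np.fft.fft(eps)).real

def thompson_step(prior_mean, cov_first_row, already_queried, rng):
    """One Thompson-sampling-style decision: sample a plausible reward function
    from the (circulant-covariance) belief, then act greedily on that sample."""
    f_sample = prior_mean + sample_circulant_gaussian(cov_first_row, rng)
    f_sample[list(already_queried)] = -np.inf  # never repeat a query
    return int(np.argmax(f_sample))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 512
    # A stationary covariance on a ring: exponentially decaying with lag,
    # wrapped so that the resulting matrix is circulant (and hence FFT-friendly).
    lags = np.minimum(np.arange(n), n - np.arange(n))
    cov_first_row = np.exp(-lags / 20.0)
    query = thompson_step(np.zeros(n), cov_first_row, already_queried=set(), rng=rng)
    print("next query index:", query)
```

Sparse precision matrices give an analogous benefit through sparse linear solvers rather than the FFT; in both cases the cost of one posterior draw, and hence of one greedy decision, stays far below that of a dense factorization.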