Multi-Armed Bandit Problems

Multi-armed bandit (MAB) problems are a class of sequential resource allocation problems concerned with allocating one or more resources among several alternative (competing) projects. Such problems are paradigms of a fundamental conflict between making decisions (allocating resources) that yield high current rewards and making decisions that sacrifice current gains for the prospect of better future rewards. The MAB formulation models resource allocation problems arising in several technological and scientific disciplines, such as sensor management, manufacturing systems, economics, queueing and communication networks, clinical trials, control theory, and search theory (see [88] and references therein).
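The conflict between current and future rewards can be made concrete with a small simulation. The sketch below is only an illustration under stated assumptions: it uses Bernoulli arms and the simple epsilon-greedy heuristic, which is not a method described in this text and is not an optimal policy. The function name `epsilon_greedy` and its parameters are hypothetical, chosen for this example.

```python
import random


def epsilon_greedy(true_means, horizon, epsilon, seed=0):
    """Simulate epsilon-greedy on Bernoulli arms (illustrative heuristic only).

    true_means: success probability of each arm (unknown to the decision maker).
    Returns (total_reward, pull_counts).
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms        # number of pulls per arm
    estimates = [0.0] * n_arms   # running mean of observed rewards per arm
    total_reward = 0
    for _ in range(horizon):
        if rng.random() < epsilon:
            # Explore: give up some current reward to learn about the arms.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pull the arm that currently looks best.
            arm = max(range(n_arms), key=lambda a: estimates[a])
        reward = 1 if rng.random() < true_means[arm] else 0
        counts[arm] += 1
        # Incremental update of the sample mean for the pulled arm.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward
    return total_reward, counts


if __name__ == "__main__":
    reward, counts = epsilon_greedy([0.1, 0.9], horizon=5000, epsilon=0.1)
    print("total reward:", reward)
    print("pulls per arm:", counts)
```

Setting `epsilon` to 0 makes the rule purely greedy, which can lock onto an inferior arm after a few unlucky draws. A larger `epsilon` learns the arms more reliably but spends more pulls on arms already known to be worse.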

[1]  J. I The Design of Experiments , 1936, Nature.

[2]  R Bellman,et al.  On the Theory of Dynamic Programming. , 1952, Proceedings of the National Academy of Sciences of the United States of America.

[3]  R. Bellman A Problem in the Sequential Design of Experiments , 1954 .

[4]  Ronald A. Howard,et al.  Dynamic Programming and Markov Processes , 1960 .

[5]  Richard Bellman,et al.  Adaptive Control Processes: A Guided Tour , 1961, The Mathematical Gazette.

[6]  Michael Horstein,et al.  Sequential transmission using noiseless feedback , 1963, IEEE Trans. Inf. Theory.

[7]  P. B. Coaker,et al.  Applied Dynamic Programming , 1964 .

[8]  D. Blackwell Discounted Dynamic Programming , 1965 .

[9]  C. Striebel Sufficient statistics in the optimum control of stochastic systems , 1965 .

[10]  Rutherford Aris,et al.  Discrete Dynamic Programming , 1965, The Mathematical Gazette.

[11]  Walter T. Federer,et al.  Sequential Design of Experiments , 1967 .

[12]  Harry L. Van Trees,et al.  Detection, Estimation, and Modulation Theory, Part I , 1968 .

[13]  J. Andel Sequential Analysis , 2022, The SAGE Encyclopedia of Research Design.

[14]  Martin J. Beckmann Dynamic programming of economic decisions , 1969 .

[15]  M. Degroot Optimal Statistical Decisions , 1970 .

[16]  Edward J. Sondik,et al.  The optimal control of partially observable Markov processes , 1971 .

[17]  W. J. Studden,et al.  Theory Of Optimal Experiments , 1972 .

[18]  D. Sworder,et al.  Introduction to stochastic control , 1972 .

[19]  Edward J. Sondik,et al.  The Optimal Control of Partially Observable Markov Processes over a Finite Horizon , 1973, Oper. Res..

[20]  G. Simons Great Expectations: Theory of Optimal Stopping , 1973 .

[21]  M. Stone Cross‐Validatory Choice and Assessment of Statistical Predictions , 1976 .

[22]  Erhan Çinlar,et al.  Introduction to stochastic processes , 1974 .

[23]  Antoine-S Bailly,et al.  Science régionale - Walter Isard, Introduction to regional science. Englewood Cliffs (NJ), Prentice-Hall, 1975 , 1975 .

[24]  Alʹbert Nikolaevich Shiri︠a︡ev,et al.  Optimal stopping rules , 1977 .

[25]  Robert E. Larson,et al.  Principles of Dynamic Programming , 1978 .

[26]  Edward J. Sondik,et al.  The Optimal Control of Partially Observable Markov Processes over the Infinite Horizon: Discounted Costs , 1978, Oper. Res..

[27]  Martin L. Puterman,et al.  Dynamic Programming and Its Application , 1979 .

[28]  M. Skolnik,et al.  Introduction to Radar Systems , 2021, Advances in Adaptive Radar Detection and Range Estimation.

[29]  J. Gittins Bandit processes and dynamic allocation indices , 1979 .

[30]  P. Whittle Multi‐Armed Bandits and the Gittins Index , 1980 .

[31]  E. Angel,et al.  Principles of dynamic programming part 1 , 1980, Proceedings of the IEEE.

[32]  F. Kelly Multi-Armed Bandits with Discount Factor Near One: The Bernoulli Case , 1981 .

[33]  P. Whittle Arm-Acquiring Bandits , 1981 .

[34]  R. Hartley,et al.  Optimisation Over Time: Dynamic Programming and Stochastic Control , 1983 .

[35]  R. Gray,et al.  Vector quantization , 1984, IEEE ASSP Magazine.

[36]  Y. Bar-Shalom,et al.  Detection thresholds for tracking in clutter--A connection between estimation and signal processing , 1985 .

[37]  Jean Walrand,et al.  Extensions of the multiarmed bandit problem: The discounted case , 1985 .

[38]  J. Tsitsiklis A lemma on the multiarmed bandit problem , 1986 .

[39]  Pravin Varaiya,et al.  Stochastic Systems: Estimation, Identification, and Adaptive Control , 1986 .

[40]  Samuel S. Blackman,et al.  Multiple-Target Tracking with Radar Applications , 1986 .

[41]  J. Walrand,et al.  Asymptotically efficient allocation rules for the multiarmed bandit problem with multiple plays-Part II: Markovian rewards , 1987 .

[42]  H. Chernoff Sequential Analysis and Optimal Design , 1987 .

[43]  P. W. Jones,et al.  Bandit Problems, Sequential Allocation of Experiments , 1987 .

[44]  Michael N. Katehakis,et al.  The Multi-Armed Bandit Problem: Decomposition and Computation , 1987, Math. Oper. Res..

[45]  A. Mandelbaum Continuous Multi-Armed Bandits and Multiparameter Processes , 1987 .

[46]  D. Teneketzis,et al.  Asymptotically efficient adaptive allocation rules for the multiarmed bandit problem with switching cost , 1988 .

[47]  P. Whittle Restless Bandits: Activity Allocation in a Changing World , 1988 .

[48]  D. Teneketzis,et al.  Asymptotically Efficient Adaptive Allocation Schemes for Controlled I.I.D. Processes: Finite Paramet , 1988 .

[49]  Yaakov Bar-Shalom,et al.  Multitarget-multisensor tracking: Advanced applications , 1989 .

[50]  R. Agrawal,et al.  Certainty equivalence control with forcing: revisited , 1989, Proceedings of the 28th IEEE Conference on Decision and Control.

[51]  Christian M. Ernst,et al.  Multi-armed Bandit Allocation Indices , 1989 .

[52]  R. Agrawal,et al.  Asymptotically efficient adaptive allocation schemes for controlled Markov chains: finite parameter space , 1989 .

[53]  J. Bather,et al.  Multi‐Armed Bandit Allocation Indices , 1990 .

[54]  R. Agrawal,et al.  Multi-armed bandit problems with multiple plays and switching cost , 1990 .

[55]  Bert-Eric Tullsson Monopulse tracking of Rayleigh targets: a simple approach , 1991 .

[56]  Kenneth J. Hintz,et al.  A measure of the information gain attributable to cueing , 1991, IEEE Trans. Syst. Man Cybern..

[57]  Thomas M. Cover,et al.  Elements of Information Theory , 2005 .

[58]  Andrew R. Barron,et al.  Complexity Regularization with Application to Artificial Neural Networks , 1991 .

[59]  W. Lovejoy A survey of algorithmic methods for partially observed Markov decision processes , 1991 .

[60]  Eugene S. McVey,et al.  Multi-process constrained estimation , 1991, IEEE Trans. Syst. Man Cybern..

[61]  David J. C. MacKay,et al.  Information-Based Objective Functions for Active Data Selection , 1992, Neural Computation.

[62]  R. Weber On the Gittins Index for Multiarmed Bandits , 1992 .

[63]  D. Teneketzis,et al.  Optimality of index policies for stochastic scheduling with switching penalties , 1992, Journal of Applied Probability.

[64]  N. Gordon,et al.  Novel approach to nonlinear/non-Gaussian Bayesian state estimation , 1993 .

[65]  Richard L. Tweedie,et al.  Markov Chains and Stochastic Stability , 1993, Communications and Control Engineering Series.

[66]  Dimitris Bertsimas,et al.  Conservation laws, extended polymatroids and multi-armed bandit problems: a unified approach to indexable systems , 2011, IPCO.

[67]  A. Tsybakov,et al.  Minimax theory of image reconstruction , 1993 .

[68]  D. Teneketzis,et al.  Optimal stochastic scheduling of forest networks with switching penalties , 1994, Advances in Applied Probability.

[69]  S. Musick,et al.  Chasing the elusive sensor manager , 1994, Proceedings of National Aerospace and Electronics Conference (NAECON'94).

[70]  Robin J. Evans,et al.  Optimal waveform selection for tracking systems , 1994, IEEE Trans. Inf. Theory.

[71]  J. Tsitsiklis,et al.  Branching bandits and Klimov's problem: achievable region and side constraints , 1994, Proceedings of 1994 33rd IEEE Conference on Decision and Control.

[72]  David A. Cohn,et al.  Active Learning with Statistical Models , 1996, NIPS.

[73]  Michael Jackson,et al.  Optimal Design of Experiments , 1994 .

[74]  J. Banks,et al.  Switching Costs and the Gittins Index , 1994 .

[75]  Partha Niyogi,et al.  Active Learning for Function Approximation , 1994, NIPS.

[76]  Keith D. Kastella,et al.  Event-averaged maximum likelihood estimation and information-based sensor management , 1994, Defense, Security, and Sensing.

[77]  Anders Krogh,et al.  Neural Network Ensembles, Cross Validation, and Active Learning , 1994, NIPS.

[78]  P. Varaiya,et al.  Multi-Armed bandit problem revisited , 1994 .

[79]  John Rust Using Randomization to Break the Curse of Dimensionality , 1997 .

[80]  I. Karatzas,et al.  Dynamic Allocation Problems in Continuous Time , 1994 .

[81]  M. Littman The Witness Algorithm: Solving Partially Observable Markov Decision Processes , 1994 .

[82]  Robin J. Evans,et al.  Integrated probabilistic data association , 1994, IEEE Trans. Autom. Control..

[83]  Gerald Tesauro,et al.  Temporal Difference Learning and TD-Gammon , 1995, J. Int. Comput. Games Assoc..

[84]  I. J. Taneja New Developments in Generalized Information Measures , 1995 .

[85]  Michael I. Miller,et al.  Conditional-mean estimation via jump-diffusion processes in multiple target tracking/recognition , 1995, IEEE Trans. Signal Process..

[86]  Dimitri P. Bertsekas,et al.  Dynamic Programming and Optimal Control, Two Volume Set , 1995 .

[87]  Andrew G. Barto,et al.  Learning to Act Using Real-Time Dynamic Programming , 1995, Artif. Intell..

[88]  John Rust Numerical dynamic programming in economics , 1996 .

[89]  Demosthenis Teneketzis,et al.  Multi-armed bandits with switching penalties , 1996, IEEE Trans. Autom. Control..

[90]  John N. Tsitsiklis,et al.  Neuro-Dynamic Programming , 1996, Encyclopedia of Machine Learning.

[91]  Dimitris Bertsimas,et al.  Conservation Laws, Extended Polymatroids and Multiarmed Bandit Problems; A Polyhedral Approach to Indexable Systems , 1996, Math. Oper. Res..

[92]  M. Katehakis,et al.  Finite state multi-armed bandit problems: sensitive-discount, average-reward and average-overtaking optimality , 1996 .

[93]  David K. Smith,et al.  Dynamic Programming and Optimal Control. Volume 1 , 1996 .

[94]  Lawrence Carin,et al.  Matching pursuits with a wave-based dictionary , 1997, IEEE Trans. Signal Process..

[95]  R.J. Evans,et al.  Waveform selective probabilistic data association , 1997, IEEE Transactions on Aerospace and Electronic Systems.

[96]  Keith Kastella Discrimination gain to optimize detection and classification , 1997, IEEE Trans. Syst. Man Cybern. Part A.

[97]  Jun S. Liu,et al.  Sequential Monte Carlo methods for dynamic systems , 1997 .

[98]  I. J. Won,et al.  GEM‐3: A Monostatic Broadband Electromagnetic Induction Sensor , 1997 .

[99]  Michael L. Littman,et al.  Incremental Pruning: A Simple, Fast, Exact Method for Partially Observable Markov Decision Processes , 1997, UAI.

[100]  D. Castañón Approximate dynamic programming for sensor management , 1997, Proceedings of the 36th IEEE Conference on Decision and Control.

[101]  Leslie Pack Kaelbling,et al.  Planning and Acting in Partially Observable Stochastic Domains , 1998, Artif. Intell..

[102]  Vladimir Vapnik,et al.  Statistical learning theory , 1998 .

[103]  D.A. Castanon,et al.  Rollout Algorithms for Stochastic Scheduling Problems , 1998, Proceedings of the 37th IEEE Conference on Decision and Control (Cat. No.98CH36171).

[104]  Douglas Cochran,et al.  Dynamic estimation with selectable linear measurements , 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181).

[105]  Quentin F. Stout,et al.  Flexible Algorithms for Creating and Analyzing Adaptive Sampling Procedures , 1998 .

[106]  A. Doucet On sequential Monte Carlo methods for Bayesian filtering , 1998 .

[107]  A. Cassandra,et al.  Exact and approximate algorithms for partially observable markov decision processes , 1998 .

[108]  W. Blair,et al.  Unresolved Rayleigh target detection using monopulse measurements , 1998 .

[109]  A. Mandelbaum,et al.  Multi-armed bandits in discrete and continuous time , 1998 .

[110]  Lawrence Carin,et al.  Multiaspect identification of submerged elastic targets via wave-based matching pursuits and hidden , 1999 .

[111]  Lawrence Carin,et al.  Hidden Markov models for multiaspect target classification , 1999, IEEE Trans. Signal Process..

[112]  M. Pitt,et al.  Filtering via Simulation: Auxiliary Particle Filters , 1999 .

[113]  Vladimir Vapnik,et al.  An overview of statistical learning theory , 1999, IEEE Trans. Neural Networks.

[114]  Demosthenis Teneketzis,et al.  On the optimality of the Gittins index rule for multi-armed bandits with multiple plays , 1995, Math. Methods Oper. Res..

[115]  Lawrence D. Stone,et al.  Bayesian Multiple Target Tracking , 1999 .

[116]  R. Viswanathan,et al.  Performance of distributed CFAR test under various clutter amplitudes , 1999 .

[117]  Y. Bar-Shalom,et al.  From the waveform through the resolution cell to the tracker , 1999, 1999 IEEE Aerospace Conference. Proceedings (Cat. No.99TH8403).

[118]  Carl E. Baum,et al.  On the low-frequency natural response of conducting and permeable targets , 1999, IEEE Trans. Geosci. Remote. Sens..

[119]  A. Korostelev On minimax rates of convergence in image models under sequential design , 1999 .

[120]  Douglas Cochran,et al.  Source detection and localization using a multi-mode detector: a Bayesian approach , 1999, 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings. ICASSP99 (Cat. No.99CH36258).

[121]  Daphne Koller,et al.  Support Vector Machine Active Learning with Applications to Text Classification , 2000, ICML.

[122]  Robert Givan,et al.  A framework for simulation-based network control via hindsight optimization , 2000, Proceedings of the 39th IEEE Conference on Decision and Control (Cat. No.00CH37187).

[123]  José Niño Mora Restless Bandits, Partial Conservation Laws and Indexability , 2000 .

[124]  Demosthenis Teneketzis,et al.  On the Optimality of an Index Rule in Multichannel Allocation for Single-Hop Mobile Networks with Multiple Service Classes , 2000 .

[125]  Nello Cristianini,et al.  An Introduction to Support Vector Machines and Other Kernel-based Learning Methods , 2000 .

[126]  Shun-ichi Amari,et al.  Methods of information geometry , 2000 .

[127]  Yaakov Bar-Shalom,et al.  Multitarget/Multisensor Tracking: Applications and Advances -- Volume III , 2000 .

[128]  Dimitris Bertsimas,et al.  Restless Bandits, Linear Programming Relaxations, and a Primal-Dual Index Heuristic , 2000, Oper. Res..

[129]  D. Cochran,et al.  Multi-mode detection with Markov target motion , 2000, Proceedings of the Third International Conference on Information Fusion.

[130]  A. Korostelev,et al.  Rates of convergence for the sup-norm risk in image models under sequential designs , 2000 .

[131]  Robin J. Evans,et al.  Hidden Markov model multiarm bandits: a methodology for beam scheduling in multitarget tracking , 2001, IEEE Trans. Signal Process..

[132]  Yacine Dalichaouch,et al.  On the wideband EMI response of a rotationally symmetric permeable and conducting target , 2001, IEEE Trans. Geosci. Remote. Sens..

[133]  Fredrik Gustafsson,et al.  Monte Carlo data association for multiple target tracking , 2001 .

[134]  Daphne Koller,et al.  Support Vector Machine Active Learning with Applications to Text Classification , 2002, J. Mach. Learn. Res..

[135]  Aleksandar Dogandzic,et al.  Cramer-Rao bounds for estimating range, velocity, and direction with an active array , 2001, IEEE Trans. Signal Process..

[136]  Michael Isard,et al.  BraMBLe: a Bayesian multiple-blob tracker , 2001, Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001.

[137]  Nicola Secomandi,et al.  A Rollout Policy for the Vehicle Routing Problem with Stochastic Demands , 2001, Oper. Res..

[138]  Eric Gottlieb,et al.  The Umbra Simulation Framework , 2001 .

[139]  J. D. Gorman,et al.  Alpha-Divergence for Classification, Indexing and Retrieval (Revised 2) , 2002 .

[140]  R. B. Washburn,et al.  Stochastic dynamic programming based approaches to sensor resource management , 2002, Proceedings of the Fifth International Conference on Information Fusion. FUSION 2002. (IEEE Cat.No.02EX5997).

[141]  K. Glazebrook,et al.  Index policies for a class of discounted restless bandits , 2002, Advances in Applied Probability.

[142]  Patrick Pérez,et al.  Sequential Monte Carlo methods for multiple target tracking and data fusion , 2002, IEEE Trans. Signal Process..

[143]  William Fitzgerald,et al.  A Bayesian approach to tracking multiple targets using sensor arrays and particle filters , 2002, IEEE Trans. Signal Process..

[144]  P. Pérez,et al.  Tracking multiple objects with particle filtering , 2002 .

[145]  José Niño-Mora,et al.  Dynamic allocation indices for restless projects and queueing admission control: a polyhedral approach , 2002, Math. Program..

[146]  Feng Zhao,et al.  Information-driven dynamic sensor collaboration , 2002, IEEE Signal Process. Mag..

[147]  D.A. Castanon,et al.  Model predictive control for dynamic unreliable resource allocation , 2002, Proceedings of the 41st IEEE Conference on Decision and Control, 2002..

[148]  Raymond W. Yeung,et al.  A First Course in Information Theory , 2002 .

[149]  Neil J. Gordon,et al.  A tutorial on particle filters for online nonlinear/non-Gaussian Bayesian tracking , 2002, IEEE Trans. Signal Process..

[150]  Vikram Krishnamurthy,et al.  Algorithms for optimal scheduling and management of hidden Markov model sensors , 2002, IEEE Trans. Signal Process..

[151]  L. Shepp Probability Essentials , 2002 .

[152]  Gang Wu,et al.  Burst-level congestion control using hindsight optimization , 2002, IEEE Trans. Autom. Control..

[153]  A. Doucet,et al.  Particle filtering for multi-target tracking and sensor management , 2002, Proceedings of the Fifth International Conference on Information Fusion. FUSION 2002. (IEEE Cat.No.02EX5997).

[154]  M. Veth,et al.  Affordable moving surface target engagement , 2002, Proceedings, IEEE Aerospace Conference.

[155]  Alfred O. Hero,et al.  Applications of entropic spanning graphs , 2002, IEEE Signal Process. Mag..

[156]  Henk A. P. Blom,et al.  Joint IMMPDA particle filter , 2003, Sixth International Conference of Information Fusion, 2003. Proceedings of the.

[157]  Joelle Pineau,et al.  Point-based value iteration: An anytime algorithm for POMDPs , 2003, IJCAI.

[158]  Robin J. Evans,et al.  Correction to "Hidden Markov model multiarm bandits: a methodology for beam scheduling in multitarget tracking" , 2003, IEEE Trans. Signal Process..

[159]  Neil J. Gordon,et al.  Efficient particle filtering for multiple target tracking with application to tracking in structured images , 2003, Image Vis. Comput..

[160]  Timothy J. Robinson,et al.  Sequential Monte Carlo Methods in Practice , 2003 .

[161]  Kevin D. Glazebrook,et al.  Whittle's index policy for a multi-class queueing system with convex holding costs , 2003, Math. Methods Oper. Res..

[162]  D. Fox,et al.  People Tracking with Anonymous and ID-Sensors Using Rao-Blackwellised Particle Filters , 2003, IJCAI.

[163]  Eric V. Denardo,et al.  Dynamic Programming: Models and Applications , 2003 .

[164]  Leslie M. Collins,et al.  Sensing of unexploded ordnance with magnetometer and induction data: theory and signal processing , 2003, IEEE Trans. Geosci. Remote. Sens..

[165]  Alfred O. Hero,et al.  Multi-target Sensor Management Using Alpha-Divergence Measures , 2003, IPSN.

[166]  P. Hall,et al.  Sequential methods for design-adaptive estimation of discontinuities in regression curves and surfaces , 2003 .

[167]  H. Sebastian Seung,et al.  Selective Sampling Using the Query by Committee Algorithm , 1997, Machine Learning.

[168]  Nah-Oak Song,et al.  Discrete search with multiple sensors , 2004, Math. Methods Oper. Res..

[169]  R. Nowak,et al.  Backcasting: adaptive sampling for sensor networks , 2004, Third International Symposium on Information Processing in Sensor Networks, 2004. IPSN 2004.

[170]  Y. Bar-Shalom,et al.  Multisensor resource deployment using posterior Cramer-Rao bounds , 2004, IEEE Transactions on Aerospace and Electronic Systems.

[171]  Rebecca Willett,et al.  Coarse-to-fine manifold learning , 2004 .

[172]  A. Hero,et al.  Efficient methods of non-myopic sensor management for multitarget tracking , 2004, 2004 43rd IEEE Conference on Decision and Control (CDC) (IEEE Cat. No.04CH37601).

[173]  Robert Givan,et al.  Parallel Rollout for Online Solution of Partially Observable Markov Decision Processes , 2004, Discret. Event Dyn. Syst..

[174]  Mingyan Liu,et al.  On the optimality of an index policy for bandwidth allocation with delayed state observation and differentiated services , 2004, IEEE INFOCOM 2004.

[175]  Jeffrey K. Uhlmann,et al.  Unscented filtering and nonlinear estimation , 2004, Proceedings of the IEEE.

[176]  D. Ruppert The Elements of Statistical Learning: Data Mining, Inference, and Prediction , 2004 .

[177]  Robert D. Nowak,et al.  Coarse-to-fine manifold learning [image processing example] , 2004, 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[178]  Alfred O. Hero,et al.  Multiple Model Particle Filtering For Multi-Target Tracking , 2004 .

[179]  Ryan M. Rifkin,et al.  In Defense of One-Vs-All Classification , 2004, J. Mach. Learn. Res..

[180]  Robert R. Tenney,et al.  Dynamic tactical targeting , 2004, SPIE Defense + Commercial Sensing.

[181]  Pascal Vincent,et al.  Kernel Matching Pursuit , 2002, Machine Learning.

[182]  Yishay Mansour,et al.  A Sparse Sampling Algorithm for Near-Optimal Planning in Large Markov Decision Processes , 1999, Machine Learning.

[183]  Mingyan Liu,et al.  Properties of optimal resource sharing in a delay channel , 2004, 2004 43rd IEEE Conference on Decision and Control (CDC) (IEEE Cat. No.04CH37601).

[184]  Lawrence Carin,et al.  Detection of buried targets via active selection of labeled data: application to sensing subsurface UXO , 2004, IEEE Transactions on Geoscience and Remote Sensing.

[185]  M.K. Schneider,et al.  Closing the loop in sensor fusion systems: stochastic dynamic programming approaches , 2004, Proceedings of the 2004 American Control Conference.

[186]  Alfred O. Hero,et al.  Information-based sensor management for multitarget tracking , 2004, SPIE Optics + Photonics.

[187]  A. Papandreou-Suppappola,et al.  Efficient search strategies for non-myopic sensor scheduling in target tracking , 2004, Conference Record of the Thirty-Eighth Asilomar Conference on Signals, Systems and Computers, 2004..

[188]  R. Evans,et al.  Clutter map information for data association and track initialization , 2004, IEEE Transactions on Aerospace and Electronic Systems.

[189]  Urbashi Mitra,et al.  Estimating inhomogeneous fields using wireless sensor networks , 2004, IEEE Journal on Selected Areas in Communications.

[190]  S. Challa,et al.  Multi Target Tracking of Ground Targets in Clutter with LMIPDA-IMM , 2004 .

[191]  R. Nowak,et al.  Multiscale likelihood analysis and complexity penalized estimation , 2004, math/0406424.

[192]  Ying He,et al.  Sensor scheduling for target tracking in sensor networks , 2004, 2004 43rd IEEE Conference on Decision and Control (CDC) (IEEE Cat. No.04CH37601).

[193]  Hui Li,et al.  An M-ary KMP classifier for multi-aspect target classification , 2004, 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[194]  Dimitri P. Bertsekas,et al.  Discretized Approximations for POMDP with Average Cost , 2004, UAI.

[195]  David J. C. MacKay,et al.  Information Theory, Inference, and Learning Algorithms , 2004, IEEE Transactions on Information Theory.

[196]  P. Whittle Tax problems in the undiscounted case , 2005 .

[197]  D. Geman,et al.  Hierarchical testing designs for pattern recognition , 2005, math/0507421.

[198]  Alfred O. Hero,et al.  From Weighted Classification to Policy Search , 2005, NIPS.

[199]  K. Kastella,et al.  A Comparison of Task Driven and Information Driven Sensor Management for Target Tracking , 2005, Proceedings of the 44th IEEE Conference on Decision and Control.

[200]  Ronald E. Parr,et al.  Non-Myopic Multi-Aspect Sensing with Partially Observable Markov Decision Processes , 2005 .

[201]  Douglas Cochran,et al.  Waveform-agile sensing: opportunities and challenges , 2005, Proceedings. (ICASSP '05). IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005..

[202]  A. Hero,et al.  Multitarget tracking using the joint multitarget probability density , 2005, IEEE Transactions on Aerospace and Electronic Systems.

[203]  Geoffrey J. Gordon,et al.  Finding Approximate POMDP solutions Through Belief Compression , 2011, J. Artif. Intell. Res..

[204]  Darryl Morrell,et al.  Time-varying waveform selection and configuration for agile sensors in tracking applications , 2005, Proceedings. (ICASSP '05). IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005..

[205]  Robert D. Nowak,et al.  Faster Rates in Regression via Active Learning , 2005, NIPS.

[206]  Richard S. Sutton,et al.  Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.

[207]  Alfred O. Hero,et al.  Sensor management using an active sensing approach , 2005, Signal Process..

[208]  Edwin K. P. Chong,et al.  Sensor scheduling for target tracking: A Monte Carlo sampling approach , 2006, Digit. Signal Process..

[209]  Jian Wang,et al.  Maximum Likelihood Estimation of Compound-Gaussian Clutter and Target Parameters , 2006, IEEE Transactions on Signal Processing.

[210]  A. Hero,et al.  Optimal Sensor Scheduling via Classification Reduction of Policy Search (CROPS) , 2006 .

[211]  Sang Joon Kim,et al.  A Mathematical Theory of Communication , 2006 .

[212]  Mingyan Liu,et al.  Optimal bandwidth allocation in a delay channel , 2006, IEEE Journal on Selected Areas in Communications.

[213]  Alfred O. Hero,et al.  Adaptive multi-modality sensor scheduling for detection and tracking of smart targets , 2006, Digit. Signal Process..

[214]  A. Singh,et al.  Active learning for adaptive mobile sensing networks , 2006, 2006 5th International Conference on Information Processing in Sensor Networks.

[215]  R.J. Evans,et al.  Waveform Libraries for Radar Tracking Applications: Maneuvering Targets , 2006, 2006 40th Annual Conference on Information Sciences and Systems.

[216]  Alfred O. Hero,et al.  On Dimensionality Reduction for Classification and its Application , 2006, 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings.

[217]  A. Robert Calderbank,et al.  Adaptive Waveform Design for Improved Detection of Low-RCS Targets in Heavy Sea Clutter , 2007, IEEE Journal of Selected Topics in Signal Processing.

[218]  Lawrence Carin,et al.  Nonmyopic Multiaspect Sensing With Partially Observable Markov Decision Processes , 2007, IEEE Transactions on Signal Processing.

[219]  Mark R. Morelande,et al.  A Bayesian Approach to Multiple Target Detection and Tracking , 2007, IEEE Transactions on Signal Processing.

[220]  Dimitri P. Bertsekas,et al.  Stochastic optimal control : the discrete time case , 2007 .

[221]  George E. Monahan,et al.  A Survey of Partially Observable Markov Decision Processes: Theory, Models, and Algorithms , 2007 .

[222]  Alfred O. Hero,et al.  An Information-Based Approach to Sensor Management in Large Dynamic Networks , 2007, Proceedings of the IEEE.

[223]  R. Weber,et al.  On an index policy for restless bandits , 1990, Journal of Applied Probability.

[224]  Mingyan Liu,et al.  Server allocation with delayed state observation: Sufficient conditions for the optimality of an index policy , 2009, IEEE Transactions on Wireless Communications.

[225]  L. Breuer Introduction to Stochastic Processes , 2022, Statistical Methods for Climate Scientists.

[226]  T. L. Lai and Herbert Robbins Asymptotically Efficient Adaptive Allocation Rules , 2022 .