Sequential decision making in general state space models

Many problems in control and signal processing can be formulated as sequential decision problems for general state space models. However, except for a few simple models, analytical solutions are not available and one has to resort to approximation. In this thesis, we have investigated problems where Sequential Monte Carlo (SMC) methods can be combined with a gradient-based search to provide solutions to online optimisation problems. We summarise the main contributions of the thesis as follows. Chapter 4 focuses on solving the sensor scheduling problem when cast as a controlled Hidden Markov Model. We consider the case in which the state, observation and action spaces are continuous. This general case is important as it is the natural framework for many applications. In sensor scheduling, our aim is to minimise the variance of the estimation error of the hidden state with respect to the action sequence. We present a novel SMC method that uses a stochastic gradient algorithm to find optimal actions. This is in contrast to existing works in the literature, which only solve approximations to the original problem. In Chapter 5 we show how SMC can be used to solve a risk-sensitive control problem. We adopt the Feynman-Kac representation of a controlled Markov chain flow and exploit the properties of the logarithmic Lyapunov exponent, which lead to a policy gradient solution for the parameterised problem. The resulting SMC algorithm has a structure similar to that of the Recursive Maximum Likelihood (RML) algorithm for online parameter estimation. In Chapters 6, 7 and 8, dynamic graphical models are combined with state space models for the purpose of online decentralised inference. We concentrate on the distributed parameter estimation problem using two Maximum Likelihood techniques, namely Recursive Maximum Likelihood (RML) and Expectation Maximization (EM). The resulting algorithms can be interpreted as an extension of the Belief Propagation (BP) algorithm to compute likelihood gradients. In order to design an SMC algorithm, Chapter 8 uses nonparametric approximations for Belief Propagation. The algorithms were successfully applied to solve the sensor localisation problem for sensor networks of small and medium size.
