The Cross-Entropy Method: A Unified Approach to Combinatorial Optimization, Monte-Carlo Simulation, and Machine Learning

Furthermore, if i and j are neighboring locations, then the correlation of observations at those points conditional on all other observations is $-Q_{ij}/\sqrt{Q_{ii}Q_{jj}}$, and the conditional mean and precision of any single observation can also be expressed as simple functions of the nonzero elements of Q. Because GMRF’s can be specified through Normal conditionals defined by sparse matrices, certain computations run much more quickly than they would with methods based on the dense covariance matrix. Throughout the book, Normal distributions are specified by precisions rather than by variances.

The authors assume that the reader has a good working knowledge of conditional probability and of the multivariate Normal distribution at the level of Hogg, McKean, and Craig (2005). Beyond this, they outline or discuss in detail the technical aspects of fitting GMRF’s, with a strong emphasis on only those details that are needed to make practical use of the methods. More formal and mathematical results are referenced in the endnotes. Their style is terse, and if some detail seems to be missing, it can often be found upon rereading. If not, the references and endnotes point to where it can be found. Theory is usually presented first, followed by a detailed investigation of one or more complex examples.

After a brief and very useful introduction, Chapter 2 contains most of the general results about GMRF’s. It begins with a presentation of the notation necessary to describe conditional distributions relative to graphical dependence networks. The use of the precision matrix in model specification, inference, and simulation is then discussed, and a section is dedicated to a general review of solvers for sparse systems of linear equations.
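To make the role of sparse solvers in simulation concrete, the following minimal Python sketch draws a sample from a zero-mean GMRF whose precision matrix is tridiagonal. It uses one standard approach: factor Q = L L^T and solve L^T x = z for standard Normal z, so that x has covariance Q^{-1}. The function names and the 5-node chain are hypothetical illustrations, not code from the book, and a general sparse Q would need a general sparse Cholesky routine rather than this tridiagonal special case.

```python
import math
import random


def cholesky_tridiagonal(diag, off):
    """Factor a tridiagonal positive definite Q as L L^T (L lower bidiagonal).

    diag[i] = Q[i][i] and off[i] = Q[i+1][i]. The cost is linear in n.
    """
    l_diag = [math.sqrt(diag[0])]
    l_off = []
    for i in range(1, len(diag)):
        l_off.append(off[i - 1] / l_diag[i - 1])
        l_diag.append(math.sqrt(diag[i] - l_off[i - 1] ** 2))
    return l_diag, l_off


def solve_upper(l_diag, l_off, z):
    """Solve L^T x = z by back substitution."""
    n = len(z)
    x = [0.0] * n
    x[n - 1] = z[n - 1] / l_diag[n - 1]
    for i in range(n - 2, -1, -1):
        x[i] = (z[i] - l_off[i] * x[i + 1]) / l_diag[i]
    return x


def sample_gmrf(diag, off, rng):
    """Draw x ~ N(0, Q^{-1}) without forming the dense covariance matrix."""
    l_diag, l_off = cholesky_tridiagonal(diag, off)
    z = [rng.gauss(0.0, 1.0) for _ in diag]
    return solve_upper(l_diag, l_off, z)


# Hypothetical 5-node chain: Q has 2 on the diagonal and -1 beside it.
diag = [2.0] * 5
off = [-1.0] * 4
x = sample_gmrf(diag, off, random.Random(0))
```

Only the nonzero entries of Q and of its factor are ever stored, so both the memory and the work grow linearly with the number of sites in this example.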
Since the speed of precision-matrix-based methods depends on the speed of these solvers, this discussion is central to the practical use of these methods, and any text on inference that requires these methods should include a section of this kind. The use of toroidal boundary conditions on the data is discussed, showing how they can lead to further speed increases through the use of cyclic precision matrices. Finally, several methods for ensuring that Q is positive definite are compared.

The three remaining chapters apply the basic techniques to increasingly complex models. Whereas the third chapter is accessible and self-contained, the authors admit that the last two chapters cover areas where research is ongoing and suggest that they are best used by those with prior knowledge of those areas.

Chapter 3 presents GMRF’s in which Q is not of full rank. This can result from a linear constraint on conditional means, but it also arises when the data are first or second differences between observations at neighboring sites. For example, GMRF methods can be used to estimate parameters for random walks, based upon their paths. After fitting a random walk on the line, the authors go on to random walks on lattices, to random walks in continuous time observed at discrete random times, and to second-order random walks. Very nice graphics are used to describe sums with a simple geometrical structure that would be hard to express with formal notation.

The fourth chapter discusses hierarchical models, in which GMRF’s are used to model dependencies between parameters. As an example, one could observe a sequence $Y_i$ of Bernoulli($p_i$) random variables that are not independent. To model the dependence, one could assume that $X_i = \mathrm{logit}(p_i)$ arises from an autoregressive model and further assume that the variance of the random terms in that autoregressive process is random with some suitable distribution.
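As an illustration of the generative structure just described, the following minimal Python sketch simulates such a dependent Bernoulli sequence. It assumes a first-order autoregression for $X_i = \mathrm{logit}(p_i)$, a Gamma distribution on the innovation precision (the reciprocal of the variance), and a chain started at zero. The function name, the coefficient `phi`, and the Gamma parameters are hypothetical choices made for this sketch and are not taken from the book.

```python
import math
import random


def simulate_dependent_bernoulli(n, phi=0.8, shape=2.0, scale=1.0, seed=0):
    """Simulate Y_i ~ Bernoulli(p_i) with logit(p_i) following an AR(1) process."""
    rng = random.Random(seed)
    # Top level: random precision of the AR(1) innovations (assumed Gamma).
    precision = rng.gammavariate(shape, scale)
    sd = 1.0 / math.sqrt(precision)
    x, y, prev = [], [], 0.0  # assumption: the latent chain starts at zero
    for _ in range(n):
        # Middle level: latent autoregressive field on the logit scale.
        prev = phi * prev + rng.gauss(0.0, sd)
        p = 1.0 / (1.0 + math.exp(-prev))  # inverse logit
        # Bottom level: observed 0/1 outcome.
        x.append(prev)
        y.append(1 if rng.random() < p else 0)
    return precision, x, y


precision, x, y = simulate_dependent_bernoulli(50, seed=1)
```

The observations in `y` are dependent only through the latent sequence `x`, which is the feature that inference for such a model has to account for.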
Fitting a model of this kind requires the use of Markov chain Monte Carlo (MCMC) techniques, which are briefly discussed. The focus of these discussions is not on convergence (which is referenced elsewhere) but on the blocking of parameters to increase the efficiency of the MCMC algorithms. Other examples include the fitting of a model in which monthly numbers of driver deaths on British roads are modeled as