Reinforcement learning-based design of sampling policies under cost constraints in Markov random fields: Application to weed map reconstruction

Weeds are responsible for yield losses in arable fields, yet their role in agro-ecosystem food webs and in providing ecological services is also well established. Innovative weed management policies have to be designed to handle this trade-off between production and regulation services. As a consequence, there has been growing interest in the study of the spatial distribution of weeds in crops, as a prerequisite to management. Such studies are usually based on maps of weed species. Building probabilistic models of spatial processes, and reconstructing plausible maps of a process from such a model and observed data, are frequently encountered and important problems. Equally important is the question of designing optimal sampling policies that make it possible to build high-probability maps when the model is known. This optimization problem is more complex than the pure reconstruction problem and cannot generally be solved exactly. A generic approach to spatial sampling for optimizing map construction, based on Markov Random Fields (MRFs), is provided and applied to the problem of weed sampling for mapping. MRFs offer a powerful representation for reasoning about large sets of interacting random variables. In the field of spatial statistics, the design of sampling policies has been widely studied in the case of continuous variables, using tools from geostatistics. In the MRF case with finite state-space variables, heuristics have been proposed for the design problem, but no universally accepted solution exists, particularly when considering adaptive policies as opposed to static ones. The problem of designing an adaptive sampling policy in an MRF can be formalized as an optimization problem. By combining tools from Artificial Intelligence (AI) and computational statistics, an original algorithm is then proposed for its approximate solution. This generic procedure, referred to as Least-Squares Dynamic Programming (LSDP), combines a linear-regression approximation of the value of a sampling policy, a batch of simulated MRF realizations and a backwards induction algorithm. From an empirical comparison of LSDP with existing one-step-look-ahead sampling heuristics and with solutions provided by classical AI algorithms, the following conclusions can be drawn: (i) a naive heuristic consisting of sampling the sites where marginals are the most uncertain is already an efficient sampling approach; (ii) LSDP outperforms all the classical approaches tested; and (iii) LSDP outperforms the naive heuristic in cases where sampling costs are not uniform over the set of variables or where sampling actions are constrained.
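To make the LSDP idea concrete, the following Python sketch illustrates the structure of the algorithm on a deliberately small toy problem. It is not the authors' implementation: a one-dimensional Ising chain stands in for the weed MRF, map quality is scored by copying the value of the nearest sampled site, the regression features only encode which sites have been sampled, and exploration trajectories are drawn uniformly at random. All of these choices, as well as the parameter values, are illustrative assumptions rather than details taken from the paper.

```python
"""Minimal sketch (not the authors' code) of the LSDP idea: a linear-regression
approximation of the value of a sampling policy, fitted by backward induction
over a batch of simulated MRF realizations."""
import numpy as np

rng = np.random.default_rng(0)

N_SITES = 12                    # size of the toy 1-D Ising chain
BETA = 0.8                      # interaction strength
HORIZON = 4                     # number of sites we are allowed to sample
BATCH = 500                     # number of simulated MRF realizations
COST = np.full(N_SITES, 0.1)    # per-site sampling cost (could be non-uniform)


def sample_chain(rng):
    """Draw one exact realization of the 1-D Ising chain (a Markov chain)."""
    p_same = np.exp(BETA) / (np.exp(BETA) + np.exp(-BETA))
    x = np.empty(N_SITES, dtype=int)
    x[0] = rng.integers(2)
    for i in range(1, N_SITES):
        x[i] = x[i - 1] if rng.random() < p_same else 1 - x[i - 1]
    return x


def reconstruct_score(sampled, x):
    """Map-quality reward: reconstruct every site by copying the nearest
    sampled site and count correct sites (crude stand-in for MAP restoration)."""
    obs = np.flatnonzero(sampled)
    if obs.size == 0:
        return 0.0
    nearest = obs[np.argmin(np.abs(np.arange(N_SITES)[:, None] - obs), axis=1)]
    return float(np.sum(x[nearest] == x))


def features(sampled):
    """Linear-regression features: which sites are sampled, plus a bias term."""
    return np.concatenate(([1.0], sampled.astype(float)))


# --- 1. simulate a batch of realizations and random sampling trajectories ---
realizations = [sample_chain(rng) for _ in range(BATCH)]
trajectories = []       # trajectories[b][h] = boolean "sampled" mask at step h
for _ in range(BATCH):
    order = rng.permutation(N_SITES)[:HORIZON]
    masks = [np.zeros(N_SITES, dtype=bool)]
    for site in order:
        m = masks[-1].copy()
        m[site] = True
        masks.append(m)
    trajectories.append(masks)

# --- 2. backward induction with least-squares value fitting ---
weights = [None] * (HORIZON + 1)
# terminal step: value = reconstruction reward of the final sampled set
X = np.array([features(trajectories[b][HORIZON]) for b in range(BATCH)])
y = np.array([reconstruct_score(trajectories[b][HORIZON], realizations[b])
              for b in range(BATCH)])
weights[HORIZON], *_ = np.linalg.lstsq(X, y, rcond=None)

for h in range(HORIZON - 1, -1, -1):
    X, y = [], []
    for b in range(BATCH):
        state = trajectories[b][h]
        # one-step look-ahead through the already-fitted value of step h + 1
        best = -np.inf
        for a in np.flatnonzero(~state):
            nxt = state.copy()
            nxt[a] = True
            best = max(best, -COST[a] + features(nxt) @ weights[h + 1])
        X.append(features(state))
        y.append(best)
    weights[h], *_ = np.linalg.lstsq(np.array(X), np.array(y), rcond=None)


# --- 3. greedy execution of the learned policy on a new realization ---
def lsdp_policy(state, h):
    """Pick the unsampled site maximizing the fitted one-step-look-ahead value."""
    candidates = np.flatnonzero(~state)
    scores = []
    for a in candidates:
        nxt = state.copy()
        nxt[a] = True
        scores.append(-COST[a] + features(nxt) @ weights[h + 1])
    return candidates[int(np.argmax(scores))]


truth = sample_chain(rng)
state = np.zeros(N_SITES, dtype=bool)
for h in range(HORIZON):
    state[lsdp_policy(state, h)] = True
print("sampled sites:", np.flatnonzero(state),
      "reconstruction score:", reconstruct_score(state, truth))
```

On this toy chain, the greedy policy derived from the fitted value functions tends to spread samples along the chain, matching the intuition that the most informative sites are those far from already-sampled ones; replacing COST with a site-dependent vector, or restricting the candidate actions, corresponds to the non-uniform-cost and constrained-action settings discussed in the abstract.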
