Cross-entropic learning of a machine for the decision in a partially observable universe

This paper addresses optimal decision-making in a partially observable universe. Our approach directly approximates an optimal strategic tree, conditioned on the observations, by a parameterized probabilistic law. The policy is modeled by a particular family of Hidden Markov Models (HMMs) with inputs and outputs. We propose and apply a method for optimizing the parameters of these HMMs, based on the cross-entropic (CE) principle for rare-event simulation developed by Rubinstein.
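The idea of optimizing a parameterized probabilistic law over policies with the CE method can be sketched on a toy problem. The Python below is a minimal illustration under stated assumptions, not the paper's algorithm: the task, the two-state controller, and every name in it (`decode`, `evaluate`, `cross_entropy`, `HORIZON`, `BLANK`, `SIZES`) are hypothetical. The controller has the input/output structure described above: the observation is the input that drives the hidden memory transition `trans[m][o]`, and the action is the output emitted by `emit[m]`. CE samples deterministic controllers from a product of categorical laws, keeps the elite fraction `rho`, and refits each law to the elite frequencies with smoothing `alpha`. This is the generic CE update for discrete parameters; the paper's update for its HMM family may differ.

```python
import random

# Toy partially observable task (an assumption for illustration, not the paper's benchmark):
# a hidden bit s ~ Uniform{0, 1} is shown only at t = 0 (observation s); later observations
# are BLANK, and the reward is 1 when the action at the last step equals s.
N_MEM, N_OBS, N_ACT = 2, 3, 2   # memory states, observations {0, 1, BLANK}, actions {0, 1}
BLANK, HORIZON = 2, 3

# One categorical law per controller row: init, trans[m][o] (flattened as m * N_OBS + o), emit[m].
SIZES = [N_MEM] + [N_MEM] * (N_MEM * N_OBS) + [N_ACT] * N_MEM


def decode(ch):
    """Turn one integer choice per row into one-hot (deterministic) controller tables."""
    def hot(k, n):
        return [1.0 if i == k else 0.0 for i in range(n)]
    return {
        "init": hot(ch[0], N_MEM),
        "trans": [[hot(ch[1 + m * N_OBS + o], N_MEM) for o in range(N_OBS)]
                  for m in range(N_MEM)],
        "emit": [hot(ch[1 + N_MEM * N_OBS + m], N_ACT) for m in range(N_MEM)],
    }


def evaluate(ctrl):
    """Exact expected reward of an input/output controller (tables may also be stochastic).

    The observation (input) drives trans[m][o] -> next memory; emit[m] (output) -> action.
    """
    total = 0.0
    for s in (0, 1):
        belief = list(ctrl["init"])
        for t in range(HORIZON):
            obs = s if t == 0 else BLANK
            belief = [sum(belief[m] * ctrl["trans"][m][obs][m2] for m in range(N_MEM))
                      for m2 in range(N_MEM)]
        # the rewarded action is emitted from the memory reached at the last step
        total += 0.5 * sum(belief[m] * ctrl["emit"][m][s] for m in range(N_MEM))
    return total


def cross_entropy(n_samples=200, rho=0.1, alpha=0.7, n_iter=30, seed=0):
    """CE method with a product of categorical sampling laws over the controller rows."""
    rng = random.Random(seed)
    probs = [[1.0 / n] * n for n in SIZES]      # start from uniform laws
    best_score, best_choice = -1.0, None
    for _ in range(n_iter):
        samples = [[rng.choices(range(n), weights=p)[0] for n, p in zip(SIZES, probs)]
                   for _ in range(n_samples)]
        scored = sorted(((evaluate(decode(c)), c) for c in samples),
                        key=lambda pair: pair[0], reverse=True)
        # elite threshold gamma: the (1 - rho) quantile of the sampled scores
        gamma = scored[max(1, int(rho * n_samples)) - 1][0]
        elite = [c for score, c in scored if score >= gamma]
        if scored[0][0] > best_score:
            best_score, best_choice = scored[0]
        # CE update for categorical laws: elite frequencies, smoothed with the previous law
        for i, n in enumerate(SIZES):
            freq = [0.0] * n
            for c in elite:
                freq[c[i]] += 1.0 / len(elite)
            probs[i] = [alpha * f + (1 - alpha) * p for f, p in zip(freq, probs[i])]
    return best_score, best_choice, probs
```

Calling `score, choice, probs = cross_entropy()` returns the best controller found and the final sampling laws. On this task, a score of 1.0 requires storing the cue in memory, while any controller whose emission ignores the memory state scores exactly 0.5. About one uniformly sampled controller in sixteen is already optimal here, so the toy only shows the mechanics; the rare-event view of CE matters when good policies are a tiny fraction of the sampling law.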

Frédéric Dambreville
