General Framework for Reinforcement Learning

In this article we propose a general framework for sequential decision making. The framework is based on the observation that the derivation of the optimal behaviour under various decision criteria follows the same pattern: the cost of policies can be decomposed into the successive application of an operator that defines the related dynamic programming algorithm, and this operator completely describes the structure of the decision problem. We take this mapping (the so-called one-step lookahead (OLA) cost mapping) as our starting point. This enables the unified treatment of various decision criteria (e.g. the expected value criterion or the worst-case criterion). The main result of this article says that under minimal conditions optimal stationary policies are greedy w.r.t. the optimal cost function, and vice versa. Based on this result we feel that earlier results on reinforcement learning can be transferred to other decision criteria, provided that the decision criterion is decomposable by an appropriate mapping.
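To make the central notion concrete, here is a minimal sketch of what an OLA cost mapping can look like under the two criteria mentioned above; the notation (state space X, action space A, immediate cost c, transition probabilities p, discount factor \gamma, cost function Q) is illustrative and not taken from the paper itself. Under the expected value criterion the mapping is the familiar Bellman operator

    (TQ)(x,a) = c(x,a) + \gamma \sum_{y \in X} p(y \mid x,a) \, \min_{b \in A} Q(y,b),

while under the worst-case criterion the expectation over successor states is replaced by a maximum over the reachable ones:

    (TQ)(x,a) = c(x,a) + \gamma \max_{y :\, p(y \mid x,a) > 0} \, \min_{b \in A} Q(y,b).

In either case the optimal cost function is a fixed point Q^* = TQ^*, a stationary policy is greedy w.r.t. Q^* if in every state x it selects an action from \arg\min_{a \in A} Q^*(x,a), and the main result stated above asserts that, under minimal conditions on the mapping, such greedy policies are exactly the optimal ones.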