Necessary and sufficient Karush-Kuhn-Tucker conditions for multiobjective Markov chains optimality

The solution concepts proposed in this paper follow the Karush-Kuhn-Tucker (KKT) conditions for a Pareto optimal solution of multi-objective programming problems over finite-time, ergodic, and controllable Markov chains. To solve the problem, we introduce a Tikhonov regularizer, which ensures that the objective function is strictly convex. We then apply the c-variable method to introduce equality constraints that guarantee the solution lies in the simplex and satisfies the ergodicity constraints. Finally, we restrict the cost functions so that adjacent points on the Pareto front lie a small distance from one another; the computed image points then give a continuous approximation of the whole Pareto surface. The constraints imposed by the c-variable method make the problem computationally tractable, while the small-distance restriction ensures the continuity of the Pareto front. We transform the multi-objective nonlinear problem into an equivalent nonlinear programming problem by introducing the Lagrange function and its multipliers. As a result, the objective function is strictly convex, the inequality constraints are continuously differentiable, and the equality constraints are affine. Under these conditions, the necessary and sufficient KKT optimality conditions follow naturally. A numerical example illustrates the basic techniques for computing Pareto optimal solutions by means of the KKT conditions.
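To make the construction concrete, the sketch below (not taken from the paper) sets up a two-objective, Tikhonov-regularized scalarization in the c-variable formulation and sweeps the scalarization weight to trace an approximate Pareto front. The transition law p, the cost matrices W1 and W2, the regularization weight DELTA, and the use of SciPy's SLSQP solver are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from scipy.optimize import minimize

N, K = 3, 2                            # number of states and actions (toy sizes)
rng = np.random.default_rng(0)

# p[j, i, k] = P(next state = j | state = i, action = k); columns sum to 1 over j
p = rng.random((N, N, K))
p /= p.sum(axis=0, keepdims=True)

W1 = rng.random((N, K))                # first cost function  w1(i, k)
W2 = rng.random((N, K))                # second cost function w2(i, k)
DELTA = 1e-3                           # Tikhonov regularization parameter (assumed value)

def scalarized(c_flat, lam):
    """Weighted sum of the two average costs plus a Tikhonov term (strictly convex)."""
    c = c_flat.reshape(N, K)
    J1 = np.sum(c * W1)
    J2 = np.sum(c * W2)
    return lam * J1 + (1.0 - lam) * J2 + 0.5 * DELTA * np.dot(c_flat, c_flat)

# Equality constraints of the c-variable method:
#   (a) simplex:    sum_{i,k} c[i,k] = 1
#   (b) ergodicity: sum_k c[j,k] = sum_{i,k} p[j,i,k] * c[i,k] for every state j
#       (one ergodicity row is redundant and is dropped to keep the system full rank)
A_simplex = np.ones((1, N * K))
A_ergodic = np.zeros((N, N * K))
for j in range(N):
    for i in range(N):
        for k in range(K):
            A_ergodic[j, i * K + k] = (1.0 if i == j else 0.0) - p[j, i, k]
A_eq = np.vstack([A_simplex, A_ergodic[:-1]])
b_eq = np.concatenate(([1.0], np.zeros(N - 1)))

constraints = {"type": "eq", "fun": lambda c: A_eq @ c - b_eq}
bounds = [(0.0, 1.0)] * (N * K)        # c >= 0 (and trivially <= 1 on the simplex)

# Sweep the scalarization weight to trace a discrete approximation of the Pareto front.
c0 = np.full(N * K, 1.0 / (N * K))
front = []
for lam in np.linspace(0.0, 1.0, 11):
    res = minimize(scalarized, c0, args=(lam,), method="SLSQP",
                   bounds=bounds, constraints=constraints)
    c = res.x.reshape(N, K)
    front.append((float(np.sum(c * W1)), float(np.sum(c * W2))))

for point in front:
    print(point)
```

Each solve returns a KKT point of the scalarized problem: with a strictly convex objective, affine equality constraints, and continuously differentiable inequality (nonnegativity) constraints, the KKT conditions are both necessary and sufficient, which is exactly the structure the abstract describes.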
