Learning-Based Control Policy and Regret Analysis for Online Quadratic Optimization With Asymmetric Information Structure

In this paper, we propose a learning approach to analyzing dynamic systems with an asymmetric information structure. Instead of adopting a game-theoretic setting, we investigate an online quadratic optimization problem driven by system noises with unknown statistics. Because of the information asymmetry, neither the classic Kalman filter nor classic optimal control strategies can be applied to such systems. It is therefore necessary and beneficial to develop a robust approach that learns the noise statistics as time goes forward. Motivated by online convex optimization (OCO) theory, we introduce the notion of regret, defined as the cumulative difference between the cost incurred online without knowledge of the noise statistics and the optimal offline cost attained when the statistics are known. By utilizing dynamic programming and the linear minimum mean square unbiased estimate (LMMSUE), we propose a new type of online state feedback control policy and characterize the behavior of the regret in the finite-time regime. The regret is shown to be bounded by O(ln T), and hence sub-linear in the horizon T. Moreover, we address an online optimization problem with output feedback control policies.
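To make the regret notion concrete, here is a minimal sketch on a hypothetical toy problem; it is not the paper's system, estimator, or policy. Assume a scalar system x_{t+1} = x_t + u_t + w_t with i.i.d. Gaussian noise w_t of unknown mean mu, and stage cost x_{t+1}^2. An offline controller that knows mu applies u_t = -x_t - mu; an online controller replaces mu with the sample mean of the noises observed so far (the minimum-variance linear unbiased estimator of the mean for i.i.d. samples). Regret(T) is the sum over t of (online cost - offline cost). The names `simulate_regret`, `expected_regret`, and the initial guess of 0 for the mean are illustrative assumptions.

```python
import math
import random


def simulate_regret(T: int, mu: float, sigma: float, seed: int = 0) -> float:
    """Realized cumulative regret on one noise path (toy model, an assumption).

    Hypothetical scalar system: x_{t+1} = x_t + u_t + w_t, w_t ~ N(mu, sigma^2),
    stage cost x_{t+1}^2. Offline policy knows mu; online policy uses the
    sample mean of past noises (initial guess 0.0 before any data).
    """
    rng = random.Random(seed)
    x_off = x_on = 0.0
    noise_sum = 0.0
    regret = 0.0
    for t in range(T):
        mu_hat = noise_sum / t if t > 0 else 0.0  # online estimate of mu
        w = rng.gauss(mu, sigma)  # both controllers face the same noise
        x_off = x_off + (-x_off - mu) + w  # offline: u_t = -x_t - mu
        x_on = x_on + (-x_on - mu_hat) + w  # online: u_t = -x_t - mu_hat
        regret += x_on ** 2 - x_off ** 2
        # w_t is recoverable from x_t, u_t and x_{t+1}, so it can be learned.
        noise_sum += w
    return regret


def expected_regret(T: int, mu: float, sigma: float) -> float:
    """Closed-form expected regret of the toy model above.

    Step 0 uses the guess 0.0, costing mu^2 in expectation; for t >= 1 the
    excess cost is E[(mu_hat_t - mu)^2] = sigma^2 / t, so the total grows
    like sigma^2 * ln T.
    """
    return mu ** 2 + sigma ** 2 * sum(1.0 / t for t in range(1, T))


if __name__ == "__main__":
    for T in (10, 100, 1000, 10000):
        print(T, expected_regret(T, 0.0, 1.0), math.log(T))
```

In this toy model the per-step excess cost decays like sigma^2 / t, so the cumulative regret grows logarithmically, which illustrates why an O(ln T) bound is sub-linear. The paper's bound concerns its own, more general setting and is not reproduced by this sketch.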

Wing Shing Wong | Cheng Tan | Lin Yang