Personalized Optimization with User's Feedback

This paper develops an online algorithm to solve a time-varying optimization problem whose objective comprises a known time-varying cost and an unknown function. This problem structure arises in a number of engineering and cyber-physical systems, where the known function captures time-varying engineering costs and the unknown function models the user's satisfaction; in this context, the objective is to strike a balance between given performance metrics and user satisfaction. The key challenges are (1) the time variability of the problem, and (2) the fact that the user's utility function is learned concurrently with the execution of the online algorithm. This paper leverages Gaussian processes (GPs) to learn the unknown cost function from noisy functional evaluations and to build pertinent upper confidence bounds. Using the GP formalism, the paper then applies time-varying optimization tools to design an online algorithm that tracks the oracle-based optimal trajectory within an error ball, while learning the user's satisfaction function with no regret. The algorithmic steps are inexact, to account for limited computational budgets or real-time implementation considerations. Numerical examples based on a vehicle platooning problem illustrate the approach.
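To make the GP step concrete, the following is a minimal sketch of how an unknown function can be estimated from noisy evaluations and turned into an upper confidence bound. It is not the paper's algorithm: the squared-exponential kernel, the function names (`se_kernel`, `gp_posterior`, `upper_confidence_bound`), and the values of `length_scale`, `signal_var`, `noise_var`, and `beta` are all assumptions chosen for illustration.

```python
import numpy as np


def se_kernel(a, b, length_scale=1.0, signal_var=1.0):
    """Squared-exponential kernel between two sets of scalar inputs (assumed kernel)."""
    a = np.asarray(a, dtype=float).reshape(-1, 1)
    b = np.asarray(b, dtype=float).reshape(-1, 1)
    sq_dist = (a - b.T) ** 2
    return signal_var * np.exp(-0.5 * sq_dist / length_scale**2)


def gp_posterior(x_train, y_train, x_query, noise_var=0.01):
    """Posterior mean and variance of a zero-mean GP given noisy evaluations."""
    y_train = np.asarray(y_train, dtype=float)
    K = se_kernel(x_train, x_train) + noise_var * np.eye(len(y_train))
    K_s = se_kernel(x_train, x_query)
    # Cholesky factorization for a numerically stable solve of K^{-1} y.
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    prior_var = np.diag(se_kernel(x_query, x_query))
    var = np.clip(prior_var - np.sum(v**2, axis=0), 0.0, None)
    return mean, var


def upper_confidence_bound(mean, var, beta=4.0):
    """Upper confidence bound mean + sqrt(beta * var); beta is an assumed constant."""
    return mean + np.sqrt(beta * var)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical user-satisfaction function, observed with additive noise.
    x_obs = np.linspace(0.0, 4.0, 8)
    y_obs = np.sin(x_obs) + 0.1 * rng.standard_normal(x_obs.shape)
    x_grid = np.linspace(0.0, 4.0, 5)
    mu, var = gp_posterior(x_obs, y_obs, x_grid, noise_var=0.01)
    print(upper_confidence_bound(mu, var))
```

In an online setting, one would typically recompute the posterior as each new piece of user feedback arrives and use the mean or the bound in place of the unknown function within the optimization step; how the paper combines these quantities is not specified in the abstract.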
