Stochastic Optimization for Performative Prediction

In performative prediction, the choice of a model influences the distribution of future data, typically through actions taken based on the model's predictions. We initiate the study of stochastic optimization for performative prediction. What sets this setting apart from traditional stochastic optimization is the difference between merely updating model parameters and deploying the new model. The latter triggers a shift in the distribution that affects future data, while the former keeps the distribution as is. Assuming smoothness and strong convexity, we prove non-asymptotic rates of convergence both for greedily deploying models after each stochastic update (greedy deploy) and for taking several updates before redeploying (lazy deploy). In both cases, our bounds smoothly recover the optimal $O(1/k)$ rate as the strength of performativity decreases. Furthermore, they illustrate how, depending on the strength of performative effects, there exists a regime in which either approach outperforms the other. We experimentally explore this trade-off on both synthetic data and a strategic classification simulator.
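To make the contrast between the two deployment schedules concrete, the following minimal sketch (not the paper's code) implements greedy and lazy deploy for a toy one-dimensional squared loss. The distribution map, the performativity strength `EPS`, the step-size schedule, and `steps_per_deploy` are all illustrative assumptions: samples are drawn from a Gaussian whose mean shifts with the currently deployed parameter.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical distribution map: deploying theta shifts the data mean by EPS * theta.
# EPS controls the strength of performativity; EPS = 0 recovers ordinary SGD.
EPS = 0.5

def sample(theta_deployed):
    """Draw one outcome z from the distribution induced by the deployed model."""
    return rng.normal(loc=1.0 + EPS * theta_deployed, scale=0.1)

def grad(theta, z):
    """Stochastic gradient of the squared loss (theta - z)^2 / 2."""
    return theta - z

def greedy_deploy(k, theta0=0.0):
    """Deploy the model after every stochastic gradient step."""
    theta = theta0
    for t in range(1, k + 1):
        z = sample(theta)                    # data reacts to the model just deployed
        theta -= (1.0 / t) * grad(theta, z)  # decaying step size ~ 1/t
    return theta

def lazy_deploy(k, steps_per_deploy=10, theta0=0.0):
    """Take several stochastic updates between consecutive deployments."""
    theta_deployed = theta = theta0
    t = 0
    for _ in range(k):
        for _ in range(steps_per_deploy):
            t += 1
            z = sample(theta_deployed)       # distribution stays fixed until redeployment
            theta -= (1.0 / t) * grad(theta, z)
        theta_deployed = theta               # redeploy the updated model
    return theta_deployed

print("greedy deploy:", greedy_deploy(1000))
print("lazy deploy:  ", lazy_deploy(100, steps_per_deploy=10))
```

In this toy example both schedules approach the performatively stable point $\theta = 1/(1-\mathrm{EPS})$; which schedule converges faster depends on the value of `EPS`, mirroring the trade-off described above.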
