Cooperative Online Learning: Keeping your Neighbors Updated

We study an asynchronous online learning setting with a network of agents. At each time step, some of the agents are activated, requested to make a prediction, and pay the corresponding loss. The loss function is then revealed to these agents and also to their neighbors in the network. Our results characterize how much knowing the network structure affects the regret, as a function of the model of agent activations. When activations are stochastic, the optimal regret (up to constant factors) is shown to be of order $\sqrt{\alpha T}$, where $T$ is the horizon and $\alpha$ is the independence number of the network. We prove that this upper bound is achieved even when agents have no information about the network structure. When activations are adversarial, the situation changes dramatically: if agents ignore the network structure, an $\Omega(T)$ lower bound on the regret can be proven, showing that learning is impossible. However, when agents can choose to ignore some of their neighbors based on knowledge of the network structure, we prove an $O(\sqrt{\overline{\chi} T})$ sublinear regret bound, where $\overline{\chi} \ge \alpha$ is the clique-covering number of the network.
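The interaction protocol described above can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the paper's algorithm: each agent runs one-dimensional online gradient descent on a squared loss toward a fixed target (a hypothetical stand-in for the abstract loss functions), activations are stochastic (each agent is activated independently with probability `p`), and the revealed loss lets both the activated agents and their neighbors update.

```python
import random

def simulate(adjacency, T=1000, eta=0.05, p=0.5, seed=0):
    """Toy cooperative online learning on a network (illustrative only).

    adjacency: dict mapping each agent to the set of its neighbors.
    At each step, a random subset of agents is activated; activated
    agents predict and pay a squared loss against a fixed target, and
    the loss is revealed to them and their neighbors, who all take a
    gradient step.
    """
    rng = random.Random(seed)
    target = 1.0                      # unknown optimum being tracked
    x = {v: 0.0 for v in adjacency}   # each agent's current prediction
    total_loss = 0.0
    for _ in range(T):
        # stochastic activations: each agent active w.p. p, independently
        active = [v for v in adjacency if rng.random() < p]
        informed = set(active)
        for v in active:
            total_loss += (x[v] - target) ** 2
            informed.update(adjacency[v])   # loss also revealed to neighbors
        # every informed agent updates on the revealed loss
        for v in informed:
            x[v] -= eta * 2.0 * (x[v] - target)
    return total_loss, x
```

Running this on a small path graph shows how information sharing helps: agents that are rarely activated still converge toward the target because their neighbors' losses are revealed to them.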
