A learning algorithm for the Whittle index policy for scheduling web crawlers

We revisit the Whittle index policy for scheduling web crawlers that track ephemeral content, proposed by Avrachenkov and Borkar (IEEE Trans. Control of Network Systems 5(1), 2016), and develop a reinforcement learning scheme for it based on LSPE(0). The scheme exploits the known structural properties of the Whittle index policy.
