Aging Bandits: Regret Analysis and Order-Optimal Learning Algorithm for Wireless Networks with Stochastic Arrivals

We consider a single-hop wireless network with sources transmitting time-sensitive information to the destination over multiple unreliable channels. Packets from each source are generated according to a stochastic process with known statistics and the state of each wireless channel (ON/OFF) varies according to a stochastic process with unknown statistics. The reliability of the wireless channels is to be learned through observation. At every time slot, the learning algorithm selects a single pair (source, channel) and the selected source attempts to transmit its packet via the selected channel. The probability of a successful transmission to the destination depends on the reliability of the selected channel. The goal of the learning algorithm is to minimize the Age-of-Information (AoI) in the network over $T$ time slots. To analyze the performance of the learning algorithm, we introduce the notion of AoI regret, which is the difference between the expected cumulative AoI of the learning algorithm under consideration and the expected cumulative AoI of a genie algorithm that knows the reliability of the channels a priori. The AoI regret captures the penalty incurred by having to learn the statistics of the channels over the $T$ time slots. The results are two-fold: first, we consider learning algorithms that employ well-known solutions to the stochastic multi-armed bandit problem (such as $\epsilon$-Greedy, Upper Confidence Bound, and Thompson Sampling) and show that their AoI regret scales as $\Theta(\log T)$; second, we develop a novel learning algorithm and show that it has $O(1)$ regret. To the best of our knowledge, this is the first learning algorithm with bounded AoI regret.
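To make the notion of AoI regret concrete, the following is a minimal simulation sketch, not the algorithm developed in the paper. It assumes a single source, Bernoulli packet arrivals with probability `lam`, a buffer that keeps only the freshest packet, independent Bernoulli channels with ON probabilities `mu`, and that the state of the selected channel is observed in every slot. All names (`simulate_aoi`, `ucb1`, `make_genie`, `estimate_aoi_regret`) are hypothetical. The learner is the textbook UCB1 index rule, and the genie always selects the most reliable channel.

```python
import math
import random


def simulate_aoi(mu, lam, T, choose, rng):
    """Cumulative AoI of one source over T slots (simplified, assumed model).

    mu:     per-channel ON probabilities (hidden from a learning policy)
    lam:    probability that a fresh packet arrives in a slot
    choose: callable (t, pulls, successes) -> index of the channel to use
    """
    pulls = [0] * len(mu)
    successes = [0] * len(mu)
    age = 1            # AoI at the destination
    packet_age = None  # time the buffered packet has waited; None if empty
    total = 0
    for t in range(1, T + 1):
        if rng.random() < lam:
            packet_age = 0                   # fresh packet replaces the old one
        k = choose(t, pulls, successes)
        channel_on = rng.random() < mu[k]
        pulls[k] += 1
        successes[k] += channel_on           # assumption: state is always observed
        if channel_on and packet_age is not None:
            age = packet_age + 1             # delivery resets AoI to packet age
            packet_age = None
        else:
            age += 1
            if packet_age is not None:
                packet_age += 1
        total += age
    return total


def ucb1(t, pulls, successes):
    """Textbook UCB1: try each channel once, then maximize the index."""
    for k, n_k in enumerate(pulls):
        if n_k == 0:
            return k
    return max(
        range(len(pulls)),
        key=lambda k: successes[k] / pulls[k]
        + math.sqrt(2 * math.log(t) / pulls[k]),
    )


def make_genie(mu):
    """Policy that knows mu a priori and always uses the best channel."""
    best = max(range(len(mu)), key=mu.__getitem__)
    return lambda t, pulls, successes: best


def estimate_aoi_regret(mu, lam, T, runs=200, seed=0):
    """Monte Carlo estimate of E[AoI of learner] - E[AoI of genie]."""
    rng = random.Random(seed)
    genie = make_genie(mu)
    diff = 0
    for _ in range(runs):
        diff += simulate_aoi(mu, lam, T, ucb1, rng)
        diff -= simulate_aoi(mu, lam, T, genie, rng)
    return diff / runs
```

Calling `estimate_aoi_regret([0.3, 0.5, 0.8], lam=0.5, T=2000)` for increasing `T` gives a rough empirical view of how the regret of a standard bandit rule grows in this toy setting. The multi-source scheduling and the bounded-regret algorithm described in the abstract are outside the scope of this sketch.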
