Age of Semantics in Cooperative Communications: To Expedite Simulation Towards Real via Offline Reinforcement Learning

The age of information metric fails to capture the intrinsic semantics of a status update. For an intelligent reflecting surface (IRS)-aided cooperative relay communication system, we propose the age of semantics (AoS) to measure the semantic freshness of status updates. Specifically, we focus on status updating from a source node (SN) to the destination, which we formulate as a Markov decision process (MDP). The objective of the SN is to maximize the expected satisfaction with respect to both AoS and energy consumption under a maximum transmit power constraint. To find the optimal control policy, we first derive an online deep actor-critic (DAC) learning scheme under the on-policy temporal-difference learning framework. However, implementing the online DAC in practice poses a key challenge: it requires infinitely repeated interactions between the SN and the system, which can be dangerous, particularly during exploration. We then put forward a novel offline DAC scheme that estimates the optimal control policy from a previously collected dataset, without any further interaction with the system. Numerical experiments verify the theoretical results and show that our offline DAC scheme significantly outperforms the online DAC scheme and the most representative baselines in terms of mean utility, while demonstrating strong robustness to dataset quality.
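To make the contrast between the two schemes concrete, the following minimal Python sketch implements one offline DAC update step from a fixed batch of logged transitions. It is an illustration under stated assumptions, not the paper's exact algorithm: the discrete action set (e.g., quantized transmit-power levels), the network sizes, and the behavior-regularization weight beta are all hypothetical, and the beta-weighted log-likelihood term stands in for whatever anti-exploration mechanism the offline scheme actually uses.

# Minimal, illustrative sketch of an offline actor-critic update step,
# NOT the authors' exact algorithm. Network sizes, the penalty weight
# `beta`, and the batch format are hypothetical assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Actor(nn.Module):
    """Stochastic policy pi(a|s) over a discrete action set
    (e.g., quantized transmit-power levels -- an assumption)."""
    def __init__(self, state_dim, n_actions, hidden=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, n_actions))
    def forward(self, s):
        return torch.distributions.Categorical(logits=self.net(s))

class Critic(nn.Module):
    """State-action value estimates Q(s, a) for every discrete action."""
    def __init__(self, state_dim, n_actions, hidden=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, n_actions))
    def forward(self, s):
        return self.net(s)

def offline_dac_step(actor, critic, opt_a, opt_c, batch, gamma=0.99, beta=1.0):
    """One offline update from a fixed batch (s, a, r, s2);
    no interaction with the live system is needed."""
    s, a, r, s2 = batch
    # Critic: temporal-difference regression toward the expected next value.
    with torch.no_grad():
        pi2 = actor(s2)
        target = r + gamma * (pi2.probs * critic(s2)).sum(-1)
    q = critic(s).gather(1, a.unsqueeze(1)).squeeze(1)
    critic_loss = F.mse_loss(q, target)
    opt_c.zero_grad(); critic_loss.backward(); opt_c.step()
    # Actor: maximize the critic value, plus a behavior-cloning penalty
    # that keeps the policy close to the dataset's actions and thereby
    # discourages out-of-distribution actions during offline learning.
    pi = actor(s)
    q_all = critic(s).detach()
    actor_loss = -((pi.probs * q_all).sum(-1) + beta * pi.log_prob(a)).mean()
    opt_a.zero_grad(); actor_loss.backward(); opt_a.step()

The only structural difference from the online scheme is the data source: an online DAC would generate each transition (s, a, r, s2) by acting in the live system, whereas here every batch is sampled from the pre-collected dataset, and the beta term regularizes the learned policy toward the behavior that produced that dataset.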
