Experimental Performance Comparison and Analysis for Various MAB Problems under Cognitive Radio Framework
This presentation gives a brief overview and an experimental performance comparison of several formulations of the online sequential decision-making multi-armed bandit (MAB) problem for opportunistic spectrum access in cognitive radio. We consider the online learning problem for classical, rested, and restless MABs with a single user/arm, and then extend it to multiple users/arms. The classical MAB problem assumes independent and identically distributed (i.i.d.) rewards, while the rested and restless formulations assume Markovian rewards. The fundamental objective of the MAB formulation is to maximize the total reward by identifying and playing the optimal arm. The classical difficulty of the MAB is the fundamental trade-off between exploration and exploitation, which requires an efficient policy design to achieve optimum performance. We briefly introduce several policies (UCB1, UCB Tuned, KL-UCB, etc.) and analyze their performance in terms of regret, defined as the reward loss compared to optimal performance. A detailed theoretical analysis of the regret bound is available for almost all of these algorithms, but it is also important to analyze the experimental performance of the different policies on the various MAB formulations. The experimental performance of different MAB algorithms can easily be assessed case by case on specific problems, but a broader comparison of their actual experimental performance would be more convincing. The main objective of the presentation is to provide an extensive experimental analysis of existing MAB algorithms along several dimensions, such as expected regret, optimal arm selection, and computational complexity. Furthermore, experimental measurements under a dynamic spectrum access framework are carried out to validate the theoretical results.
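To make the notions of policy and regret concrete, the following is a minimal sketch of the standard UCB1 index policy on the classical (i.i.d.) formulation. It is an illustration, not the experimental setup of the presentation. It assumes Bernoulli rewards, where each arm is a channel that is free with a fixed probability. The function name `ucb1`, the channel probabilities, and the horizon are hypothetical choices. Regret is computed here as the cumulative pseudo-regret, that is, the sum over plays of the gap between the best arm's mean and the mean of the arm actually played.

```python
import math
import random


def ucb1(arm_means, horizon, seed=0):
    """Run UCB1 on Bernoulli arms; return (pull counts, cumulative pseudo-regret)."""
    rng = random.Random(seed)
    n_arms = len(arm_means)
    counts = [0] * n_arms   # number of times each arm was played
    sums = [0.0] * n_arms   # total reward collected from each arm
    best_mean = max(arm_means)
    regret = 0.0

    for t in range(1, horizon + 1):
        if t <= n_arms:
            # Initialization: play each arm once.
            arm = t - 1
        else:
            # UCB1 index: empirical mean + exploration bonus sqrt(2 ln t / n_i).
            arm = max(
                range(n_arms),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        # Bernoulli reward: 1 if the (assumed) channel is free, else 0.
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        # Pseudo-regret: expected loss relative to always playing the best arm.
        regret += best_mean - arm_means[arm]

    return counts, regret


if __name__ == "__main__":
    channel_free_prob = [0.2, 0.5, 0.8]  # hypothetical channel availabilities
    counts, regret = ucb1(channel_free_prob, horizon=5000)
    print("pulls per channel:", counts)
    print("cumulative pseudo-regret:", round(regret, 2))
```

The pull counts correspond to the optimal arm selection dimension and the returned regret to the expected regret dimension. Averaging the regret over many seeds would give an estimate of the expected regret for this assumed setting.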