Customization of J. Bather UCB strategy for a Gaussian multi-armed bandit

  • Сергей Владиславович Гарбарь Yaroslav-the-Wise Novgorod State University
  • Sergey Garbar Yaroslav-the-Wise Novgorod State University
  • Александр Валерианович Колногоров Yaroslav-the-Wise Novgorod State University
  • Alexander Kolnogorov Yaroslav-the-Wise Novgorod State University
Keywords: multi-armed bandit problem, Gaussian multi-armed bandit, minimax approach, UCB rule, invariant description, Monte-Carlo simulations, dynamic programming

Abstract

We consider the customization of the UCB strategy, which was first proposed by J. Bather for Bernoulli two-armed bandit, to the case of a Gaussian multi-armed bandit describing batch data processing. This optimal control problem has classical interpretation as a game with nature, in which the payment function of the player is the expected loss of total income caused by incomplete information. The goal is stated in minimax setting. For the considered game with nature, we present an invariant description of the control with a horizon equal to one, which allows to perform computations in two ways: using Monte-Carlo simulations and analytically by dynamic programming technique. For various configurations of the considered game with nature, we have found saddle points, which characterize the optimal control and the worst-case distribution of the parameters of the multi-armed bandit.

Published
2023-01-18
How to Cite
Гарбарь, С., Garbar, S., Колногоров, А., & Kolnogorov, A. (2023). Customization of J. Bather UCB strategy for a Gaussian multi-armed bandit. Mathematical Game Theory and Applications, 14(2), 3-30. https://doi.org/10.17076/mgta_2022_2_48