UCB strategies and optimization of batch processing in a one-armed bandit problem

  • Сергей Владиславович Гарбарь Yaroslav-the-Wise Novgorod State University
  • Sergey Garbar Yaroslav-the-Wise Novgorod State University
  • Александр Валерианович Колногоров Yaroslav-the-Wise Novgorod State University
  • Alexander Kolnogorov Yaroslav-the-Wise Novgorod State University
  • Алексей Николаевич Лазутченко Yaroslav-the-Wise Novgorod State University
  • Alexey Lazutchenko Yaroslav-the-Wise Novgorod State University
Keywords: Gaussian one-armed bandit, UCB rule, invariant description, Monte-Carlo simulations

Abstract

We consider a Gaussian one-armed bandit problem, which arises when optimizing batch data processing if there are two alternative processing methods with a priori known efficiency of the first method. During processing, it is necessary to determine a more effective method and ensure its preferential use. This optimal control problem is interpreted as a game with nature. We investigate cases of known and a priori unknown variance of income corresponding to the second method. The control goal is  considered in a minimax setting, and UCB strategies are used to ensure it. In all the studied cases, invariant descriptions of control on a horizon equal to one are obtained, which depend only on the number of batches into which the data is divided, but not on their full number. These descriptions allow us to determine approximately optimal  parameters of strategies using Monte Carlo simulation. Numerical results show the high efficiency of the proposed UCB strategies.

Published
2024-02-02
How to Cite
Гарбарь, С., Garbar, S., Колногоров, А., Kolnogorov, A., Лазутченко, А., & Lazutchenko, A. (2024). UCB strategies and optimization of batch processing in a one-armed bandit problem. Mathematical Game Theory and Applications, 15(4), 3-27. https://doi.org/10.17076/mgta_2023_4_66