https://hal.archives-ouvertes.fr/hal-03570280v1Al Marjani, AymenAymenAl MarjaniUMPA - UMPA-ENSL - Unité de Mathématiques Pures et Appliquées - ENS Lyon - École normale supérieure - Lyon - CNRS - Centre National de la Recherche ScientifiqueKocak, TomasTomasKocakUniversity of PotsdamGarivier, AurélienAurélienGarivierUMPA - UMPA-ENSL - Unité de Mathématiques Pures et Appliquées - ENS Lyon - École normale supérieure - Lyon - CNRS - Centre National de la Recherche ScientifiqueOn the complexity of All ε-Best Arms IdentificationHAL CCSD2022[MATH] Mathematics [math]Al Marjani, Aymen2022-02-13 11:59:162022-06-23 03:37:152022-02-14 11:21:10enPreprints, Working Papers, ...https://hal.archives-ouvertes.fr/hal-03570280v1application/pdf1We consider the problem introduced by [MJTN20] of identifying all the ε-optimal arms in a finite stochastic multi-armed bandit with Gaussian rewards. In the fixed confidence setting, we give a lower bound on the number of samples required by any algorithm that returns the set of ε-good arms with a failure probability less than some risk level δ. This bound writes as T * ε (µ) log(1/δ), where T * ε (µ) is a characteristic time that depends on the vector of mean rewards µ and the accuracy parameter ε. We also provide an efficient numerical method to solve the convex max-min program that defines the characteristic time. Our method is based on a complete characterization of the alternative bandit instances that the optimal sampling strategy needs to rule out, thus making our bound tighter than the one provided by [MJTN20]. Using this method, we propose a Track-and-Stop algorithm that identifies the set of ε-good arms w.h.p and enjoys asymptotic optimality (when δ goes to zero) in terms of the expected sample complexity. Finally, using numerical simulations, we demonstrate our algorithm's advantage over state-of-the-art methods, even for moderate values of the risk parameter.