Preprints, Working Papers, ... Year : 2024

Diffusion Models Meet Contextual Bandits with Large Action Spaces

Abstract

Efficient exploration in contextual bandits is crucial due to their large action space, where uninformed exploration can lead to computational and statistical inefficiencies. However, the rewards of actions are often correlated, which can be leveraged for more efficient exploration. In this work, we use pre-trained diffusion model priors to capture these correlations and develop diffusion Thompson sampling (dTS). We establish both theoretical and algorithmic foundations for dTS. Specifically, we derive efficient posterior approximations (required by dTS) under a diffusion model prior, which are of independent interest beyond bandits and reinforcement learning. We analyze dTS in linear instances and provide a Bayes regret bound highlighting the benefits of using diffusion models as priors. Our experiments validate our theory and demonstrate dTS's favorable performance.
Origin: Files produced by the author(s)

Dates and versions

hal-04606078, version 1 (09-06-2024)

Identifiers

  • HAL Id: hal-04606078, version 1

Cite

Imad Aouali. Diffusion Models Meet Contextual Bandits with Large Action Spaces. 2024. ⟨hal-04606078⟩