Diffusion Models Meet Contextual Bandits with Large Action Spaces
Abstract
Efficient exploration in contextual bandits is crucial due to their large action spaces, where uninformed exploration can lead to computational and statistical inefficiencies. However, the rewards of actions are often correlated, and this can be leveraged for more efficient exploration. In this work, we use pre-trained diffusion model priors to capture these correlations and develop diffusion Thompson sampling (dTS). We establish both theoretical and algorithmic foundations for dTS. Specifically, we derive efficient posterior approximations (required by dTS) under a diffusion model prior, which are of independent interest beyond bandits and reinforcement learning. We analyze dTS in linear instances and provide a Bayes regret bound highlighting the benefits of using diffusion models as priors. Our experiments validate our theory and demonstrate dTS's favorable performance.
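To fix ideas, below is a minimal sketch of Thompson sampling in a linear contextual bandit, the setting the abstract's regret analysis considers. It uses an independent conjugate Gaussian prior per action; the paper's dTS would instead place a pre-trained diffusion model prior over the action parameters to capture reward correlations, with the posterior approximations the authors derive. All dimensions and hyperparameters (`d`, `K`, `T`, `noise_var`) are illustrative assumptions, not values from the paper.

```python
import numpy as np

d, K, T = 5, 10, 1000            # context dim, num actions, rounds (assumed)
noise_var = 0.01                 # known reward noise variance (assumed)
rng = np.random.default_rng(0)
theta_true = rng.normal(size=(K, d))   # unknown per-action parameters

# Conjugate Gaussian posterior state per action: theta_a ~ N(A_a^{-1} z_a, A_a^{-1}).
# dTS replaces this factorized N(0, I) prior with a diffusion model prior.
A = np.stack([np.eye(d)] * K)    # posterior precision matrices (prior = N(0, I))
z = np.zeros((K, d))             # accumulated x * r / noise_var per action

for t in range(T):
    x = rng.normal(size=d)       # observed context for this round
    # Thompson sampling: draw one parameter sample per action from its posterior,
    # then act greedily with respect to the sampled parameters.
    sampled = np.array([
        rng.multivariate_normal(np.linalg.solve(A[a], z[a]), np.linalg.inv(A[a]))
        for a in range(K)
    ])
    a_t = int(np.argmax(sampled @ x))
    r = theta_true[a_t] @ x + rng.normal(scale=np.sqrt(noise_var))
    # Conjugate Bayesian linear regression update for the played action only.
    A[a_t] += np.outer(x, x) / noise_var
    z[a_t] += x * r / noise_var
```

Because the posterior here factorizes across actions, feedback on one action tells us nothing about the others; a diffusion model prior over all action parameters is precisely what lets dTS share information across correlated actions.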