Bayesian Off-Policy Evaluation and Learning for Large Action Spaces - Institut Polytechnique de Paris
Preprints, Working Papers, ... Year: 2024

Abstract

In interactive systems, actions are often correlated, presenting an opportunity for more sample-efficient off-policy evaluation (OPE) and learning (OPL) in large action spaces. We introduce a unified Bayesian framework to capture these correlations through structured and informative priors. In this framework, we propose sDM, a generic Bayesian approach designed for OPE and OPL, grounded in both algorithmic and theoretical foundations. Notably, sDM leverages action correlations without compromising computational efficiency. Moreover, inspired by online Bayesian bandits, we introduce Bayesian metrics that assess the average performance of algorithms across multiple problem instances, deviating from the conventional worst-case assessments. We analyze sDM in OPE and OPL, highlighting the benefits of leveraging action correlations. Empirical evidence showcases the strong performance of sDM.
Origin: Files produced by the author(s)

Dates and versions

hal-04606070, version 1 (09-06-2024)

Identifiers

  • HAL Id: hal-04606070, version 1

Cite

Imad Aouali, Victor-Emmanuel Brunel, David Rohde, Anna Korba. Bayesian Off-Policy Evaluation and Learning for Large Action Spaces. 2024. ⟨hal-04606070⟩