Bayesian Off-Policy Evaluation and Learning for Large Action Spaces

Imad Aouali; Victor-Emmanuel Brunel; David Rohde; Anna Korba

Preprint, Working Paper. Year: 2024


Imad Aouali

Role: Author

Centre de Recherche en Économie et Statistique

Criteo AI Lab

Victor-Emmanuel Brunel

Role: Author
PersonId: 932634

Centre de Recherche en Économie et Statistique

David Rohde

Role: Author

Criteo AI Lab

Anna Korba

Role: Author

Centre de Recherche en Économie et Statistique

Abstract

In interactive systems, actions are often correlated, presenting an opportunity for more sample-efficient off-policy evaluation (OPE) and learning (OPL) in large action spaces. We introduce a unified Bayesian framework to capture these correlations through structured and informative priors. In this framework, we propose sDM, a generic Bayesian approach designed for OPE and OPL, grounded in both algorithmic and theoretical foundations. Notably, sDM leverages action correlations without compromising computational efficiency. Moreover, inspired by online Bayesian bandits, we introduce Bayesian metrics that assess the average performance of algorithms across multiple problem instances, deviating from the conventional worst-case assessments. We analyze sDM in OPE and OPL, highlighting the benefits of leveraging action correlations. Empirical evidence showcases the strong performance of sDM.

Domains

Machine Learning [cs.LG]; Machine Learning [stat.ML]

Main file

Bayesian_and_Structured_Off_Policy_Evaluation_and_Learning_for_Large_Action_Spaces___Hal_and_Arxiv (1).pdf (4.97 MB)

Origin: Files produced by the author(s)
License: Attribution

Contributor: Imad Aouali

https://hal.science/hal-04606070

Submitted on: Sunday, June 9, 2024, 14:08:29

Last modified on: Thursday, August 1, 2024, 13:48:06

Dates and versions

hal-04606070, version 1 (2024-06-09)

License

Attribution

Identifiers

HAL Id: hal-04606070, version 1

Cite

Imad Aouali, Victor-Emmanuel Brunel, David Rohde, Anna Korba. Bayesian Off-Policy Evaluation and Learning for Large Action Spaces. 2024. ⟨hal-04606070⟩


Collections

X GENES CNRS ENSAE CREST ENSAI X-CREST IP_PARIS

123 views

28 downloads
