On double descent in reinforcement learning with LSTD and random features
Abstract
Temporal Difference (TD) algorithms are widely used in Deep Reinforcement Learning (RL). Their performance is heavily influenced by the size of the neural network. While the over-parameterized regime and its benefits are well understood in supervised learning, the situation in RL is much less clear. In this paper, we present a theoretical analysis of the influence of network size and ℓ2-regularization on performance. We identify the ratio between the number of parameters and the number of visited states as a crucial factor and define over-parameterization as the regime in which this ratio is larger than one. Furthermore, we observe a double descent phenomenon, i.e., a sudden drop in performance around a parameter/state ratio of one. Leveraging random features and the lazy training regime, we study the regularized Least-Squares Temporal Difference (LSTD) algorithm in an asymptotic regime, as both the number of parameters and the number of visited states go to infinity at a constant ratio. We derive deterministic limits of both the empirical and the true Mean-Squared Bellman Error (MSBE) that feature correction terms responsible for the double descent. These correction terms vanish as the ℓ2-regularization is increased or as the number of unvisited states goes to zero. Numerical experiments with synthetic and small real-world environments closely match the theoretical predictions.
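The following is a minimal sketch of the regularized LSTD estimator with random features and the empirical MSBE it is evaluated on, to make the quantities in the abstract concrete. The synthetic transitions, the ReLU random-feature map, and all constants (d, m, gamma, reg) are illustrative assumptions, not the paper's exact setting; the sweep over the feature count N simply probes the parameter/state ratio N/m around one, where the double descent spike is expected.

```python
import numpy as np

rng = np.random.default_rng(0)

d, m, gamma, reg = 10, 200, 0.95, 1e-6  # illustrative constants

# Toy synthetic transitions standing in for a real environment:
S = rng.standard_normal((m, d))                        # visited states
S_next = 0.9 * S + 0.1 * rng.standard_normal((m, d))   # next states
r = rng.standard_normal(m)                             # rewards

def features(X, W):
    # Random ReLU features phi(s) = max(0, W s); one common choice of map.
    return np.maximum(X @ W.T, 0.0)

def lstd_theta(Phi, Phi_next, r, gamma, reg):
    # Regularized LSTD solution theta = (A + reg I)^{-1} b,
    # with A = Phi^T (Phi - gamma Phi') and b = Phi^T r.
    N = Phi.shape[1]
    A = Phi.T @ (Phi - gamma * Phi_next)
    return np.linalg.solve(A + reg * np.eye(N), Phi.T @ r)

def msbe(Phi, Phi_next, r, theta, gamma):
    # Empirical mean-squared Bellman error over the visited transitions.
    td = Phi @ theta - (r + gamma * Phi_next @ theta)
    return float(np.mean(td ** 2))

# Sweep the parameter/state ratio N/m across one; with small reg, the
# empirical MSBE typically spikes near the interpolation threshold N = m.
for N in [m // 4, m // 2, m, 2 * m, 4 * m]:
    W = rng.standard_normal((N, d)) / np.sqrt(d)
    Phi, Phi_next = features(S, W), features(S_next, W)
    theta = lstd_theta(Phi, Phi_next, r, gamma, reg)
    print(f"N/m = {N / m:4.2f}  empirical MSBE = {msbe(Phi, Phi_next, r, theta, gamma):.4f}")
```

Increasing `reg` in this sketch damps the spike near N/m = 1, mirroring the abstract's observation that the correction terms vanish as the ℓ2-regularization grows.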
Domains
Computer Science [cs]