Conference paper, 2024

On double descent in reinforcement learning with LSTD and random features

Abstract

Temporal Difference (TD) algorithms are widely used in Deep Reinforcement Learning (RL). Their performance is heavily influenced by the size of the neural network. While in supervised learning the regime of over-parameterization and its benefits are well understood, the situation in RL is much less clear. In this paper, we present a theoretical analysis of the influence of network size and l2-regularization on performance. We identify the ratio between the number of parameters and the number of visited states as a crucial factor and define over-parameterization as the regime where this ratio is larger than one. Furthermore, we observe a double descent phenomenon, i.e., a sudden drop in performance around a parameter/state ratio of one. Leveraging random features and the lazy training regime, we study the regularized Least-Squares Temporal Difference (LSTD) algorithm in an asymptotic regime, as both the number of parameters and the number of states go to infinity while maintaining a constant ratio. We derive deterministic limits of both the empirical and the true Mean-Squared Bellman Error (MSBE) that feature correction terms responsible for the double descent. The correction terms vanish when the l2-regularization is increased or the number of unvisited states goes to zero. Numerical experiments with synthetic and small real-world environments closely match the theoretical predictions.
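
To make the setup concrete, below is a minimal, self-contained sketch of regularized LSTD with random features on synthetic transitions, reporting the empirical MSBE. It is an illustration under assumed choices (random ReLU features, a Gaussian synthetic Markov reward process, and illustrative names such as `n_features` and `random_features`), not the paper's exact experimental setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic Markov reward process: m visited states in R^d, with
# observed rewards and sampled next states (assumed for illustration).
d, m, gamma, reg = 5, 200, 0.95, 1e-3
S = rng.normal(size=(m, d))        # visited states
S_next = rng.normal(size=(m, d))   # sampled next states
r = rng.normal(size=m)             # observed rewards

def random_features(X, W):
    """Random ReLU features phi(s) = max(W s, 0) (one common choice)."""
    return np.maximum(X @ W.T, 0.0)

n_features = 400                   # N; the ratio N/m drives double descent
W = rng.normal(size=(n_features, d)) / np.sqrt(d)  # fixed random weights

Phi = random_features(S, W)        # features at visited states
Phi_next = random_features(S_next, W)

# Regularized LSTD: solve
#   (Phi^T (Phi - gamma * Phi_next) / m + reg * I) theta = Phi^T r / m
A = Phi.T @ (Phi - gamma * Phi_next) / m + reg * np.eye(n_features)
b = Phi.T @ r / m
theta = np.linalg.solve(A, b)

# Empirical Mean-Squared Bellman Error over the visited states.
bellman_residual = Phi @ theta - (r + gamma * Phi_next @ theta)
msbe = np.mean(bellman_residual ** 2)
print(f"N/m = {n_features / m:.2f}, empirical MSBE = {msbe:.4f}")
```

Sweeping `n_features` across the interpolation point N = m with a small `reg` is the kind of experiment in which the predicted double-descent spike in the MSBE can be observed; increasing `reg` damps the correction terms and flattens the spike.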
Main file: 4977_on_double_descent_in_reinforce.pdf (1.33 MB)
Origin: Files produced by the author(s)

Dates and versions

hal-04628067, version 1 (28-06-2024)

Identifiers

  • HAL Id: hal-04628067, version 1

Cite

David Brellmann, Eloïse Berthier, David Filliat, Goran Frehse. On double descent in reinforcement learning with LSTD and random features. ICLR 2024 - The Twelfth International Conference on Learning Representations, May 2024, Vienna, Austria. ⟨hal-04628067⟩