Learning from Biased Data: A Semi-Parametric Approach - Institut Polytechnique de Paris
Proceedings (conference paper), Year: 2021

Abstract

We consider risk minimization problems where the (source) distribution $P_S$ of the training observations $Z_1, \ldots, Z_n$ differs from the (target) distribution $P_T$ involved in the risk that one seeks to minimize. Under the natural assumption that $P_S$ dominates $P_T$, i.e. $P_T \ll P_S$, we develop a semiparametric framework for the situation where no sample from $P_T$ is observed, but some auxiliary information at the target population scale is available. More precisely, assuming that the Radon-Nikodym derivative $dP_T/dP_S(z)$ belongs to a parametric class $\{g(z, \alpha),\ \alpha \in \mathcal{A}\}$ and that some (generalized) moments of $P_T$ are available to the learner, we propose a two-step learning procedure to perform the risk minimization task. We first select $\hat{\alpha}$ so as to match the moment constraints as closely as possible, and then reweight each (biased) training observation $Z_i$ by $g(Z_i, \hat{\alpha})$ in the final Empirical Risk Minimization (ERM) algorithm. We establish an $O_P(1/\sqrt{n})$ generalization bound proving that, remarkably, the solution to the weighted ERM problem thus constructed achieves a learning rate of the same order as that attained in the absence of any sampling bias. Beyond these theoretical guarantees, we present numerical results that provide strong empirical evidence for the relevance of the proposed approach.
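The two-step procedure summarized above can be illustrated with a small Python sketch. It is not the authors' implementation: the exponential-tilting class g(z, α) ∝ exp(α·φ(z)) (normalized so the weights average 1), the moment function φ, the squared loss with a linear model, the synthetic Gaussian data, and the names `fit_tilting_weights` and `weighted_least_squares` are all hypothetical choices made for illustration. Step 1 picks α̂ by minimizing the convex dual log mean exp(α·(φ(Z_i) − ψ)), whose stationarity condition is exactly the empirical moment constraint; step 2 solves the weighted ERM problem.

```python
import numpy as np


def fit_tilting_weights(phi, psi, n_iter=50, tol=1e-10):
    """Step 1 (hypothetical): choose alpha so that reweighted source moments match psi.

    Assumes the exponential-tilting family g(z, alpha) proportional to exp(alpha . phi(z)),
    normalized so the weights average 1 over the sample. Minimizing the convex dual
    f(alpha) = log mean exp(alpha . (phi_i - psi)) makes the weighted mean of phi equal psi.
    """
    c = phi - psi                          # centered moment features, shape (n, k)
    n, k = c.shape
    alpha = np.zeros(k)

    def dual(a):
        s = c @ a
        m = s.max()
        return m + np.log(np.mean(np.exp(s - m)))  # numerically stable log-mean-exp

    for _ in range(n_iter):
        s = c @ alpha
        p = np.exp(s - s.max())
        p /= p.sum()                       # softmax = normalized weights / n
        grad = p @ c                       # weighted moment mismatch
        if np.linalg.norm(grad) < tol:
            break
        hess = (c * p[:, None]).T @ c - np.outer(grad, grad)
        step = np.linalg.solve(hess, grad)
        # Backtracking (Armijo) line search keeps the Newton step from overshooting.
        t, f0 = 1.0, dual(alpha)
        while dual(alpha - t * step) > f0 - 0.25 * t * (grad @ step) and t > 1e-8:
            t *= 0.5
        alpha = alpha - t * step

    s = c @ alpha
    w = np.exp(s - s.max())
    return alpha, w * n / w.sum()          # weights g(Z_i, alpha_hat), averaging 1


def weighted_least_squares(X, y, w):
    """Step 2: weighted ERM with squared loss, minimizing sum_i w_i (y_i - x_i . theta)^2."""
    sw = np.sqrt(w)
    theta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return theta


# Synthetic illustration (hypothetical): source P_S = N(0, 1), target P_T = N(1, 1).
rng = np.random.default_rng(0)
n = 50_000
x = rng.normal(0.0, 1.0, n)                # biased training sample drawn from P_S
y = x**2 + rng.normal(0.0, 0.1, n)         # nonlinear response, so a linear fit is misspecified

# Auxiliary target information: only the first moment E_T[X] = 1 is assumed known.
alpha_hat, w = fit_tilting_weights(x[:, None], np.array([1.0]))

X = np.column_stack([np.ones(n), x])       # intercept + slope design
theta_w = weighted_least_squares(X, y, w)            # reweighted ERM
theta_u = weighted_least_squares(X, y, np.ones(n))   # naive, unweighted ERM
print("alpha_hat:", alpha_hat, "weighted:", theta_w, "unweighted:", theta_u)
```

In this synthetic setup, dP_T/dP_S(x) = exp(x − 1/2) lies in the assumed class with α = 1, and the only target information used is the mean E_T[X] = 1. Because y = x² is not linear in x, the best linear predictor depends on the distribution: under P_T it has slope 2 and intercept 0, whereas under P_S it has slope 0 and intercept 1. The weighted fit should therefore land near the former and the unweighted fit near the latter.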
Main file
bertail21a.pdf (1021.91 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-03559370, version 1 (6 February 2022)

Identifiers

  • HAL Id: hal-03559370, version 1

Cite

Stéphan Clémençon, Patrice Bertail, Yannick Guyonvarch, Nathan Noiry. Learning from Biased Data: A Semi-Parametric Approach. 2021. ⟨hal-03559370⟩