https://hal.telecom-paris.fr/hal-03559370

Learning from Biased Data: A Semi-Parametric Approach

Authors: Stéphan Clémençon (IDS - Département Images, Données, Signal - Télécom ParisTech; S2A - Signal, Statistique et Apprentissage - LTCI - Laboratoire Traitement et Communication de l'Information - IMT - Institut Mines-Télécom [Paris] - Télécom Paris), Patrice Bertail, Yannick Guyonvarch, Nathan Noiry
Publisher: HAL CCSD, 2021
Subjects: [MATH] Mathematics; [STAT.ML] Statistics / Machine Learning; [MATH.MATH-ST] Mathematics / Statistics; [MATH.MATH-PR] Mathematics / Probability
Contributor: Clémençon, Stephan
Record dates: 2022-02-06 16:20:02, 2022-02-18 03:32:50, 2022-02-17 13:55:54
Language: English
Document type: Conference paper (PDF)

We consider risk minimization problems where the (source) distribution $P_S$ of the training observations $Z_1, \ldots, Z_n$ differs from the (target) distribution $P_T$ involved in the risk that one seeks to minimize. Under the natural assumption that $P_S$ dominates $P_T$, i.e. $P_T \ll P_S$, we develop a semiparametric framework for the situation where no sample from $P_T$ is observed, but some auxiliary information at the target population scale is available. More precisely, assuming that the Radon-Nikodym derivative $dP_T/dP_S(z)$ belongs to a parametric class $\{g(z, \alpha),\ \alpha \in \mathcal{A}\}$ and that some (generalized) moments of $P_T$ are available to the learner, we propose a two-step learning procedure to perform the risk minimization task. We first select $\hat{\alpha}$ so as to match the moment constraints as closely as possible, and then reweight each (biased) training observation $Z_i$ by $g(Z_i, \hat{\alpha})$ in the final Empirical Risk Minimization (ERM) algorithm. We establish an $O_{\mathbb{P}}(1/\sqrt{n})$ generalization bound proving that, remarkably, the solution to the weighted ERM problem thus constructed achieves a learning rate of the same order as that attained in the absence of any sampling bias.
Beyond these theoretical guarantees, we present numerical results that provide strong empirical evidence for the relevance of the proposed approach.
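The two-step procedure described above can be illustrated with a minimal NumPy sketch. Everything specific here is an assumption rather than the authors' implementation. The parametric class is taken to be a hypothetical exponential tilt $g(z, \alpha) \propto \exp(\alpha^\top h(z))$ with a user-chosen moment function $h$. $\hat{\alpha}$ is found by matching the known target moments exactly through a convex dual problem (an entropy-balancing style criterion, which may differ from the one used in the paper). The final ERM uses the squared loss with a linear model. The function names `fit_tilt` and `weighted_least_squares` and the Gaussian simulation are illustrative only.

```python
import numpy as np


def fit_tilt(H, mu_target, n_iter=50, tol=1e-10):
    """Step 1 (illustrative): pick alpha_hat so that the reweighted source
    sample reproduces the known target moments mu_target.

    Assumes a hypothetical exponential-tilt family g(z, alpha) proportional to
    exp(alpha . h(z)), with H[i] = h(Z_i). Solves the convex dual problem
        min_alpha  log mean_i exp(alpha . H[i]) - alpha . mu_target
    by damped Newton; at the optimum the weighted mean of H equals mu_target.
    """
    n, d = H.shape
    alpha = np.zeros(d)

    def objective(a):
        s = H @ a
        m = s.max()  # shift for a numerically stable log-mean-exp
        return m + np.log(np.mean(np.exp(s - m))) - a @ mu_target

    for _ in range(n_iter):
        s = H @ alpha
        w = np.exp(s - s.max())
        w /= w.sum()                      # self-normalized tilt weights
        mean_h = H.T @ w                  # reweighted moments of h
        grad = mean_h - mu_target         # moment mismatch = dual gradient
        if np.linalg.norm(grad) < tol:
            break
        hess = (H * w[:, None]).T @ H - np.outer(mean_h, mean_h)
        step = np.linalg.solve(hess, grad)
        t, f0 = 1.0, objective(alpha)
        # Backtracking line search so every Newton step decreases the objective.
        while objective(alpha - t * step) > f0 - 1e-4 * t * (grad @ step) and t > 1e-8:
            t *= 0.5
        alpha = alpha - t * step

    s = H @ alpha
    w = np.exp(s - s.max())
    g = n * w / w.sum()                   # g(Z_i, alpha_hat), rescaled to empirical mean 1
    return alpha, g


def weighted_least_squares(X, y, g):
    """Step 2: weighted ERM with the squared loss, i.e. the closed-form minimizer
    of (1/n) * sum_i g_i * (y_i - X[i] . theta)**2."""
    Xw = X * g[:, None]
    return np.linalg.solve(X.T @ Xw, Xw.T @ y)


# Hypothetical simulation: source X ~ N(0, 1), target X ~ N(0.5, 1),
# and only the target mean E_T[X] = 0.5 is known to the learner.
rng = np.random.default_rng(0)
n = 200_000
x = rng.normal(0.0, 1.0, n)
y = x**2 + rng.normal(0.0, 0.1, n)
mu_target = np.array([0.5])

alpha_hat, g = fit_tilt(x[:, None], mu_target)
design = np.column_stack([np.ones(n), x])  # linear model y ~ a + b * x
theta_naive = weighted_least_squares(design, y, np.ones(n))
theta_weighted = weighted_least_squares(design, y, g)
print("alpha_hat:", alpha_hat)
print("unweighted (a, b):", theta_naive)
print("reweighted (a, b):", theta_weighted)
```

In this hypothetical simulation the true ratio $dP_T/dP_S(x) = \exp(0.5x - 0.125)$ belongs to the assumed family, with $\alpha = 0.5$. Since the best linear predictor of $X^2$ under $N(\mu, 1)$ is $(1 - \mu^2) + 2\mu x$, the unweighted fit should approach $(a, b) = (1, 0)$, which is optimal for the source, while the reweighted fit should approach $(0.75, 1)$, which is optimal for the target.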