R. Agarwal, C. Liang, D. Schuurmans, and M. Norouzi, Learning to generalize from sparse and underspecified rewards, ICML, 2019.

A. Amiranashvili, A. Dosovitskiy, V. Koltun, and T. Brox, TD or not TD: Analyzing the role of temporal differencing in deep reinforcement learning, 2018.

A. G. Barto, S. J. Bradtke, and S. P. Singh, Learning to act using real-time dynamic programming, Artif. Intell, vol.72, pp.81-138, 1995.

H. J. Berliner, Some necessary conditions for a master chess program, Proceedings of the 3rd International Joint Conference on Artificial Intelligence, IJCAI'73, pp.77-85, 1973.

D. P. Bertsekas, Dynamic Programming and Optimal Control, vol.I, 2005.

D. P. Bertsekas, Dynamic Programming and Optimal Control, vol.II, ISBN 9781886529304, 2007.

D. P. Bertsekas and J. N. Tsitsiklis, Neuro-Dynamic Programming, vol.3 of Optimization and neural computation series, Athena Scientific, 1996.

D. P. Bertsekas and H. Yu, Stochastic shortest path problems under weak conditions, 2013.

D. Bertsimas and J. Dunn, Optimal classification trees, Mach. Learn, vol.106, issue.7, pp.1039-1082, 2017.

L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone, Classification and regression trees, 1984.

C. Browne, E. J. Powley, D. Whitehouse, S. M. Lucas, P. I. Cowling et al., A survey of Monte Carlo tree search methods, IEEE Trans. Comput. Intellig. and AI in Games, vol.4, issue.1, pp.1-43, 2012.

T. Cazenave, Generalized rapid action value estimation, Proceedings of the 24th International Conference on Artificial Intelligence, IJCAI'15, pp.754-760, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01436522

F. Chollet, , 2015.

T. M. Cover and J. A. Thomas, Elements of Information Theory, Series in Telecommunications and Signal Processing, 2006.

T. de Bruin, J. Kober, K. Tuyls, and R. Babuška, Integrating state representation learning into deep reinforcement learning, IEEE Robotics and Automation Letters, vol.3, issue.3, pp.1394-1401, 2018.

T. Dean and S. Lin, Decomposition techniques for planning in stochastic domains, Proceedings of the 14th International Joint Conference on Artificial Intelligence, vol.2, pp.1121-1127, 1995.

E. R. Doherty-Torstrick, K. E. Walton, and B. A. Fallon, Cyberchondria: Parsing health anxiety from online behavior, Psychosomatics, pp.390-400, 2016.

S. Gelly and D. Silver, Combining online and offline knowledge in UCT, Proceedings of the 24th International Conference on Machine Learning, ICML '07, pp.273-280, 2007.
URL : https://hal.archives-ouvertes.fr/inria-00164003

S. Gelly and D. Silver, Monte-Carlo tree search and rapid action value estimation in computer Go, Artificial Intelligence, vol.175, pp.1856-1875, 2011.

P. E. Hart, N. J. Nilsson, and B. Raphael, A formal basis for the heuristic determination of minimum cost paths, IEEE Transactions on Systems Science and Cybernetics, vol.4, issue.2, pp.100-107, 1968.

N. Heess, D. Silver, and Y. Teh, Actor-critic reinforcement learning with energy-based policies, Proceedings of the Tenth European Workshop on Reinforcement Learning, pp.45-58, 2013.

H. Hu, X. Wu, B. Luo, C. Tao, C. Xu et al., Playing 20 question game with policy-based reinforcement learning. CoRR, abs/1808.07645, p.89, 2018.

L. P. Kaelbling, M. L. Littman, and A. R. Cassandra, Planning and acting in partially observable stochastic domains, Artificial Intelligence, vol.101, issue.1, pp.99-134, 1998.

H.-C. Kao, K.-F. Tang, and E. Y. Chang, Context-aware symptom checking for disease diagnosis using hierarchical reinforcement learning, AAAI, 2018.

L. Kocsis and C. Szepesvári, Bandit based Monte-Carlo planning, Proceedings of the 17th European Conference on Machine Learning, ECML'06, pp.282-293, 2006.

S. Köhler, N. A. Vasilevsky, M. Engelstad, E. Foster, J. McMurry et al., The human phenotype ontology in 2017, Nucleic Acids Research, 2017.

V. Konda, Actor-critic algorithms, NIPS, 1999.

R. E. Korf, Depth-first iterative-deepening: An optimal admissible tree search, Artif. Intell, vol.27, issue.1, pp.97-109, 1985.

X. Kortum, L. Grigull, W. Lechner, and F. Klawonn, A dynamic adaptive questionnaire for improved disease diagnostics, Advances in Intelligent Data Analysis XVI, pp.162-172, 2017.

S. Köhler, M. H. Schulz, P. Krawitz, S. Bauer, S. Dölken et al., Clinical diagnostics in human genetics with semantic similarity searches in ontologies, The American Journal of Human Genetics, vol.85, issue.4, pp.457-464, 2009.

P. Laroche, GraphMDP: A New Decomposition Tool for Solving Markov Decision Processes, International Journal on Artificial Intelligence Tools, vol.10, issue.3, pp.325-343, 2001.
URL : https://hal.archives-ouvertes.fr/inria-00100821

G. Lozenguez, L. Adouane, and A. Beynier, Map Partitioning to Approximate an Exploration Strategy in Mobile Robotics. Multiagent and Grid Systems, An International Journal of Cloud Computing, vol.8, issue.3, pp.275-288, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00971653

K. Middleton, M. Butt, N. Y. Hammerla, S. Hamblin, K. Mehta et al., Sorting out symptoms: design and evaluation of the 'babylon check' automated triage system, 2016.

V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou et al., Playing Atari with deep reinforcement learning, 2013.

V. Mnih, A. P. Badia, M. Mirza, A. Graves, T. Lillicrap, T. Harley, D. Silver, and K. Kavukcuoglu, Asynchronous methods for deep reinforcement learning, ICML, 2016.

A. Y. Ng and S. J. Russell, Algorithms for inverse reinforcement learning, Proceedings of the Seventeenth International Conference on Machine Learning, ICML '00, pp.663-670, 2000.

R. Parr, Flexible decomposition algorithms for weakly coupled Markov decision problems, Proceedings of the Fourteenth Conference on Uncertainty in Artificial Intelligence, UAI'98, pp.422-430, 1998.

Y. Peng, K. Tang, H. Lin, and E. Chang, Refuel: Exploring sparse features in deep reinforcement learning for fast disease diagnosis, in S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi et al. (eds.), Advances in Neural Information Processing Systems 31, pp.7333-7342, 2018.

J. R. Quinlan, Induction of decision trees, Mach. Learn, vol.1, issue.1, pp.81-106, 1986.

S. Razzaki, A. Baker, Y. Perov, K. Middleton, J. Baxter et al., A comparative study of artificial intelligence and human doctors for the purpose of triage and diagnosis. CoRR, abs/1806.10698, 2018.

H. L. Semigran, J. A. Linder, C. Gidengil, and A. Mehrotra, Evaluation of symptom checkers for self diagnosis and triage: audit study, British Medical Journal, vol.351, 2015.

S. Srinivasan, E. Talvitie, and M. H. Bowling, Improving exploration in UCT using local manifolds, AAAI, 2015.

R. S. Sutton and A. G. Barto, Introduction to Reinforcement Learning, 2018.

R. S. Sutton, D. A. McAllester, S. P. Singh, and Y. Mansour, Policy gradient methods for reinforcement learning with function approximation, NIPS, 1999.

C. Szepesvári, Algorithms for Reinforcement Learning, Morgan and Claypool Publishers, ISBN 9781608454921, 2010.

K. Tang, H. Kao, C. Chou, and E. Y. Chang, Inquire and diagnose: Neural symptom checking ensemble using deep reinforcement learning, 2016.

M. C. Tsakiris and D. C. Tarraf, On subspace decompositions of finite horizon dynamic programming problems, 2012 IEEE 51st IEEE Conference on Decision and Control (CDC), pp.1890-1895, 2012.

M. C. Tsakiris and D. C. Tarraf, Algebraic Decompositions of DP Problems with Linear Dynamics, arXiv e-prints, 2014.

E. Wiewiora, Potential-based shaping and Q-value initialization are equivalent, CoRR, abs/1106.5267, 2011.

R. J. Williams, Simple statistical gradient-following algorithms for connectionist reinforcement learning, Machine Learning, vol.8, pp.229-256, 1992.

C. Xiao, J. Mei, and M. Müller, Memory-augmented Monte Carlo tree search, Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, 2018.

M. Adamcik, Collective reasoning under uncertainty and inconsistency. Doctoral thesis, 2014.

J. Barthelemy and P. L. Toint, Synthetic population generation without a sample, Transportation Science, vol.47, pp.266-279, 2013.

A. L. Berger, V. J. Della Pietra, and S. A. Della Pietra, A maximum entropy approach to natural language processing, Comput. Linguist, vol.22, issue.1, pp.39-71, 1996.

Y. M. M. Bishop, S. E. Fienberg, P. W. Holland, R. J. Light et al., Discrete multivariate analysis: Theory and practice, 1975.

E. Charniak, The Bayesian basis of common sense medical diagnosis, AAAI, 1983.

A. C. Constantinou, N. Fenton, and M. Neil, Integrating expert knowledge with data in Bayesian networks, Expert Syst. Appl, vol.56, pp.197-208, 2016.

I. Csiszár and F. Matús, Information projections revisited, IEEE Trans. Information Theory, vol.49, issue.6, pp.1474-1490, 2003.

P.-S. de Laplace, Mémoire sur la probabilité des causes par les évènements, Mémoires de mathématique et de physique présentés à l'Académie royale des sciences par divers sçavans et lus dans les assemblées, 1774.

W. E. Deming and F. F. Stephan, On a least squares adjustment of a sampled frequency table when the expected marginal totals are known, Ann. Math. Statist, vol.11, issue.4, 1940.

A. Gelman, J. B. Carlin, H. S. Stern, and D. B. Rubin, Bayesian Data Analysis, 2004.

M. Ghavamzadeh, S. Mannor, J. Pineau, and A. Tamar, Bayesian reinforcement learning: A survey. CoRR, abs/1609.04436, 2016.

D. Heckerman, D. Geiger, and D. M. Chickering, Learning Bayesian networks: The combination of knowledge and statistical data, Machine Learning, vol.20, issue.3, pp.197-243, 1995.

M. Herman, T. Gindele, J. Wagner, F. Schmitt, and W. Burgard, Inverse reinforcement learning with simultaneous estimation of rewards and dynamics, in A. Gretton and C. C. Robert (eds.), Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, vol.51, pp.9-11, 2016.

D. Hunter, Uncertain reasoning using maximum entropy inference, Proceedings of the First Conference on Uncertainty in Artificial Intelligence, UAI'85, pp.21-27, 1985.

C. Ireland and S. Kullback, Contingency tables with given marginals, Biometrika, vol.55, pp.179-88, 1968.

E. T. Jaynes, Information theory and statistical mechanics, Phys. Rev, vol.106, issue.4, pp.620-630, 1957.

R. Jirousek, A survey of methods used in probabilistic expert systems for knowledge integration, Knowl.-Based Syst, vol.3, issue.1, pp.7-12, 1990.

D. Koller and N. Friedman, Probabilistic Graphical Models: Principles and Techniques, Adaptive Computation and Machine Learning, ISBN 9780262013192, 2009.

J. Mardia, J. Jiao, E. Tánczos, R. D. Nowak, and T. Weissman, Concentration inequalities for the empirical distribution, 2018.

J. W. Miller and R. M. Goodman, A polynomial time algorithm for finding Bayesian probabilities from marginal constraints, CoRR, abs/1304.1104, 2013.

F. Nielsen, Jeffreys centroids: A closed-form expression for positive histograms and a guaranteed tight approximation for frequency histograms, IEEE Signal Processing Letters, vol.20, pp.657-660, 2013.

F. Nielsen and R. Nock, Sided and symmetrized Bregman centroids, IEEE Transactions on Information Theory, vol.55, p.123, 2009.

J. Pearl, Probabilistic reasoning in intelligent systems: networks of plausible inference, Morgan Kaufmann series in representation and reasoning, 1989.

J. E. Shore, Relative entropy, probabilistic inference and AI, CoRR, abs/1304.3423, 2013.

D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre et al., Mastering the game of Go with deep neural networks and tree search, Nature, vol.529, pp.484-489, 2016.

D. Silver, J. Schrittwieser, K. Simonyan, I. Antonoglou, A. Huang et al., Mastering the game of Go without human knowledge, Nature, vol.550, pp.354-359, 2017.

D. J. Spiegelhalter, A. P. Dawid, S. L. Lauritzen, and R. G. Cowell, Bayesian analysis in expert systems, Statist. Sci, vol.8, issue.3, pp.219-247, 1993.

A. C. Tossou and C. Dimitrakakis, Probabilistic inverse reinforcement learning in unknown environments, 2013.

H. Uzawa, Iterative methods for concave programming, Studies in Linear and Nonlinear Programming, pp.154-165, 1958.

R. N. J. Veldhuis, The centroid of the symmetrical Kullback-Leibler distance, IEEE Signal Processing Letters, vol.9, pp.96-99, 2002.

Y. Zhou, N. Fenton, and C. Zhu, An empirical study of Bayesian network parameter learning with monotonic influence constraints, Decision Support Systems, vol.87, pp.69-79, 2016.

P. C. G. Costa and K. B. Laskey, PR-OWL: A framework for probabilistic ontologies, Proceedings of the 2006 Conference on Formal Ontology in Information Systems: Proceedings of the Fourth International Conference (FOIS 2006), pp.237-249, 2006.

P. Sidoine and V. Donfack-guefack, Representation of the signs in the biomedical ontologies for the help to the diagnosis. Theses, Université Rennes 1, 2013.

P. Resnik, Using information content to evaluate semantic similarity in a taxonomy, Proceedings of the 14th International Joint Conference on Artificial Intelligence, vol.1, pp.448-453, 1995.

B. D. Solomon, VACTERL/VATER association, Orphanet J Rare Dis, 2011.

V. Bertaud-Gounot, R. Duvauferrier, and A. Burgun, Ontology and medical diagnosis, Inform Health Soc Care, vol.4, 2012.