P. Alquier and J. Ridgway, Concentration of tempered posteriors and of their variational approximations, Annals of Statistics, vol.48, issue.3, pp.1475-1497, 2020.

N. Aronszajn, Theory of reproducing kernels, Transactions of the American Mathematical Society, vol.68, issue.3, pp.337-404, 1950.

A. Banerjee, S. Merugu, I. Dhillon, and J. Ghosh, Clustering with Bregman Divergences, Proceedings of the 2004 SIAM International Conference on Data Mining, vol.6, 2004.

G. Biau, L. Devroye, and G. Lugosi, On the Performance of Clustering in Hilbert Spaces, IEEE Transactions on Information Theory, vol.54, issue.2, pp.781-790, 2008.
URL : https://hal.archives-ouvertes.fr/hal-00290855

C. M. Bishop, Pattern Recognition and Machine Learning (Information Science and Statistics), 2006.

D. M. Blei, A. Kucukelbir, and J. D. McAuliffe, Variational Inference: A Review for Statisticians, Journal of the American Statistical Association, vol.112, issue.518, pp.859-877, 2017.

S. Boucheron, G. Lugosi, and P. Massart, Concentration Inequalities, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00751496

C. Brécheteau and C. Levrard, A $k$-points-based distance for robust geometric inference, Bernoulli, vol.26, issue.4, pp.3017-3050, 2020.

J. Cao, Z. Wu, J. Wu, and W. Liu, Towards information-theoretic K-means clustering for image indexing, Signal Processing, vol.93, issue.7, pp.2026-2037, 2013.

O. Catoni, Statistical Learning Theory and Stochastic Optimization: École d'Été de Probabilités de Saint-Flour XXXI-2001, Lecture Notes in Mathematics, vol.1851, 2004.

O. Catoni, Challenging the empirical mean and empirical variance: A deviation study, Annales de l'Institut Henri Poincaré, Probabilités et Statistiques, vol.48, issue.4, pp.1148-1185, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00517206

O. Catoni, Lecture notes for the IFCAM Summer School on Applied Mathematics, 2014.

O. Catoni and I. Giulini, Preprint repository arXiv achieves milestone million uploads, Physics Today, 2014.

A. Christmann, I. Steinwart, and A. Van Messem, On consistency and robustness properties of support vector machines for heavy-tailed distributions, Statistics and Its Interface, vol.2, issue.3, pp.311-327, 2009.

D. L. Cohn, Measure Theory.

I. Csiszar, $I$-Divergence Geometry of Probability Distributions and Minimization Problems, The Annals of Probability, vol.3, issue.1, pp.146-158, 1975.

I. Csiszar, Sanov Property, Generalized $I$-Projection and a Conditional Limit Theorem, The Annals of Probability, vol.12, issue.3, pp.768-793, 1984.

I. S. Dhillon, S. Mallela, and R. Kumar, Enhanced word clustering for hierarchical text classification, Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining - KDD '02, vol.3, p.1265, 2002.

C. Doersch, Tutorial on Variational Autoencoders, 2016.

D. Eddelbuettel, Seamless R and C++ Integration with Rcpp, 2013.

D. Eddelbuettel and R. François, Rcpp: Seamless R and C++ Integration, Journal of Statistical Software, vol.40, issue.8, pp.1-18, 2011.

D. Eddelbuettel and R. François, Using Rcpp in Your Package, Seamless R and C++ Integration with Rcpp, pp.65-74, 2013.

A. Fischer, Quantization and clustering with Bregman divergences, Journal of Multivariate Analysis, vol.101, issue.9, pp.2207-2221, 2010.

I. Giulini, École Normale Supérieure Paris-Saclay, The Grants Register 2019, pp.297-297, 2018.

B. Jiang, J. Pei, Y. Tao, and X. Lin, Clustering Uncertain Data Based on Probability Distribution Similarity, IEEE Transactions on Knowledge and Data Engineering, vol.25, issue.4, pp.751-763, 2013.

C. Levrard, Nonasymptotic bounds for vector quantization in Hilbert spaces, The Annals of Statistics, vol.43, issue.2, pp.592-619, 2015.

T. Li, T. Mei, I. Kweon, and X. Hua, Contextual Bag-of-Words for Visual Categorization, IEEE Transactions on Circuits and Systems for Video Technology, vol.21, issue.4, pp.381-392, 2011.

S. Lloyd, Least squares quantization in PCM, IEEE Transactions on Information Theory, vol.28, issue.2, pp.129-137, 1982.

T. Mainguy, Markov Substitute Processes : a statistical model for linguistics, 2014.
URL : https://hal.archives-ouvertes.fr/tel-01127344

P. Massart and J. Picard, Concentration Inequalities and Model Selection, 2007.

F. Pereira, N. Tishby, and L. Lee, Distributional clustering of English words, Proceedings of the 31st Annual Meeting of the Association for Computational Linguistics, 1993.

G. Salha, R. Hennequin, V. A. Tran, and M. Vazirgiannis, A Degeneracy Framework for Scalable Graph Autoencoders, Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, 2019.

G. Salha, S. Limnios, R. Hennequin, V. Tran, and M. Vazirgiannis, Gravity-Inspired Graph Autoencoders for Directed Link Prediction, Proceedings of the 28th ACM International Conference on Information and Knowledge Management, 2019.

N. Tishby, F. Pereira, and W. Bialek, The Information Bottleneck Method, Proceedings of the 37th Allerton Conference on Communication, vol.49, 2001.

T. Le and B. Clarke, Using the Bayesian Shtarkov solution for predictions, Computational Statistics & Data Analysis, vol.104, pp.183-196, 2016.

C. Tsai, Bag-of-Words Representation in Image Annotation: A Review, ISRN Artificial Intelligence, vol.2012, pp.1-19, 2012.

J. Wu, The Uniform Effect of K-means Clustering, Advances in K-means Clustering, pp.17-35, 2012.