Conformal prediction provides a powerful framework for constructing distribution-free prediction regions with finite-sample coverage guarantees. While extensively studied in univariate settings, its extension to multi-output problems presents additional challenges, including complex output dependencies and high computational costs, and remains relatively underexplored. In this work, we present a unified comparative study of nine conformal methods with different multivariate base models for constructing multivariate prediction regions within the same framework. This study highlights their key properties while also exploring the connections between them. Additionally, we introduce two novel classes of conformity scores for multi-output regression that generalize their univariate counterparts. These scores ensure asymptotic conditional coverage while maintaining exact finite-sample marginal coverage. One class is compatible with any generative model, offering broad applicability, while the other is computationally efficient, leveraging the properties of invertible generative models. Finally, we conduct a comprehensive empirical evaluation across 13 tabular datasets, comparing all the multi-output conformal methods explored in this work. To ensure a fair and consistent comparison, all methods are implemented within a unified code base.
@inproceedings{Dheur2025-br,title={A unified comparative study with generalized conformity scores for multi-output conformal regression},author={Dheur, Victor and Fontana, Matteo and Estievenart, Yorick and Desobry, Naomi and Ben Taieb, Souhaib},booktitle={The 42nd International Conference on Machine Learning},year={2025},}
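For context, the sketch below shows a minimal split-conformal construction for multi-output regression using the Euclidean norm of the residual as a simple conformity score. This is a generic baseline rather than the generalized scores proposed in the paper, and `model` is assumed to be any fitted multi-output regressor exposing a `predict` method.

```python
import numpy as np

def split_conformal_ball(model, X_cal, Y_cal, X_test, alpha=0.1):
    """Split conformal prediction with a Euclidean-norm conformity score (sketch)."""
    # Conformity scores on the held-out calibration set.
    scores = np.linalg.norm(Y_cal - model.predict(X_cal), axis=1)
    # Finite-sample-valid quantile level with the usual (n + 1) correction.
    n = len(scores)
    level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    radius = np.quantile(scores, level, method="higher")
    # Each prediction region is a ball: {y : ||y - center|| <= radius}.
    return model.predict(X_test), radius
```

This simple score ignores output dependencies, which is precisely the limitation the generalized conformity scores in the paper are designed to address.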
We present a new method for generating confidence sets within the split conformal prediction framework. Our method performs a trainable transformation of any given conformity score to improve conditional coverage while ensuring exact marginal coverage. The transformation is based on an estimate of the conditional quantile of conformity scores. The resulting method is particularly beneficial for constructing adaptive confidence sets in multi-output problems where standard conformal quantile regression approaches have limited applicability. We develop a theoretical bound that captures the influence of the accuracy of the quantile estimate on the approximate conditional validity, unlike classical bounds for conformal prediction methods that only offer marginal coverage. We experimentally show that our method is highly adaptive to the local data structure and outperforms existing methods in terms of conditional coverage, improving the reliability of statistical inference in various applications.
@inproceedings{Plassier2025-dh,title={Rectifying conformity scores for better conditional coverage},author={Plassier*, Vincent and Fishkov*, Alexander and Dheur*, Victor and Guizani, Mohsen and Ben Taieb, Souhaib and Panov, Maxim and Moulines, Eric},booktitle={The 42nd International Conference on Machine Learning},year={2025},note={(<em>* denotes equal contribution.</em>)}}
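As a rough illustration of the general idea (not the paper's trainable transformation), the sketch below rescales conformity scores by a plug-in estimate of their conditional quantile before applying the standard split-conformal calibration step. Here `score_fn` and `cond_quantile` are hypothetical user-supplied functions.

```python
import numpy as np

def rectified_threshold(score_fn, cond_quantile, X_cal, Y_cal, alpha=0.1):
    """Calibrate conditionally rescaled conformity scores; returns the threshold tau."""
    # Rescale each calibration score by its estimated conditional quantile.
    raw = np.array([score_fn(x, y) for x, y in zip(X_cal, Y_cal)])
    rescaled = raw / np.maximum(cond_quantile(X_cal), 1e-8)
    # Standard split-conformal quantile on the rescaled scores.
    n = len(rescaled)
    level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    tau = np.quantile(rescaled, level, method="higher")
    # Prediction set at a test point x: {y : score_fn(x, y) <= tau * estimated conditional quantile at x}.
    return tau
```

Because the rescaled scores are still exchangeable across calibration and test points, exact marginal coverage is preserved, while the conditional rescaling is what drives the improvement in conditional coverage.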
With the growing utilization of machine learning models in real-world applications, improving neural network calibration has become a primary concern. Various methods have been proposed, including post-hoc methods that adjust predictions after training and regularization methods that act during training. In regression, post-hoc methods have shown superior calibration performance compared to regularization methods. However, the base model is trained without any direct connection to the subsequent post-hoc step. To address this limitation, we introduce a novel end-to-end method called Quantile Recalibration Training, integrating post-hoc calibration directly into the training process. We also propose an algorithm unifying our method with other post-hoc and regularization methods. We demonstrate the performance of Quantile Recalibration Training in a large-scale experiment involving 57 tabular regression datasets, showcasing improved predictive accuracy. Additionally, we conduct an ablation study, revealing the significance of different elements in our method. Aiming for reproducibility and fair comparisons, we have implemented our experiments in a common code base.
@inproceedings{Dheur2024-AISTATS,title={Probabilistic Calibration by Design for Neural Network Regression},booktitle={The 27th International Conference on Artificial Intelligence and Statistics},author={Dheur, Victor and Ben Taieb, Souhaib},year={2024},}
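For illustration, the sketch below shows the post-hoc quantile recalibration step that the paper integrates into each training iteration: raw predictive CDF values are mapped through the empirical CDF of probability integral transform (PIT) values computed on calibration data. The function name and inputs are illustrative, not the paper's implementation.

```python
import numpy as np

def recalibrate_cdf(pit_cal, u):
    """Map raw predictive CDF values u through the empirical CDF of calibration PITs (sketch)."""
    # pit_cal: PIT values F_i(y_i) on a calibration set, shape (n,).
    # u: raw predictive CDF values at query points, shape (m,).
    # For a well-calibrated model the PITs are uniform and the map is close to the identity.
    return (pit_cal[None, :] <= u[:, None]).mean(axis=1)
```

Applying this map once after training corresponds to standard post-hoc recalibration; the paper's contribution is to make this remapping part of the training objective itself.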