Accurate estimation of predictive uncertainty is essential for
optimal decision making. However, recent works have shown that
current neural networks tend to be miscalibrated, sparking
interest in different approaches to calibration. In this paper,
we conduct a large-scale empirical study of the probabilistic
calibration of neural networks on 57 tabular regression
datasets. We consider recalibration, conformal and
regularization approaches, and investigate the trade-offs they
induce on calibration and sharpness of the predictions. Based on
kernel density estimation, we design new differentiable
recalibration and regularization methods, yielding new insights
into the performance of these approaches. Furthermore, we find
conditions under which recalibration and conformal prediction
are equivalent. Our study is fully reproducible and implemented
in a common code base for fair comparison.
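To make the kernel-density-estimation idea concrete, the sketch below shows one way to turn probabilistic calibration into a differentiable training penalty: the probability integral transform (PIT) values of a Gaussian predictive model are smoothed with a kernel density estimate and compared to the uniform density. This is a minimal illustration rather than the exact recalibration or regularization method studied here; the Gaussian assumption, the bandwidth, and the function names are illustrative choices.

```python
# Minimal sketch (not the exact method of the paper): a differentiable
# calibration penalty based on a kernel density estimate of the PIT values.
import math
import torch

def gaussian_pit(y, mu, sigma):
    """PIT values F(y) under a Gaussian predictive distribution."""
    return 0.5 * (1 + torch.erf((y - mu) / (sigma * math.sqrt(2.0))))

def pit_kde_penalty(pit, bandwidth=0.05, grid_size=100):
    """Penalize deviation of the KDE of PIT values from the uniform density.

    The penalty is differentiable w.r.t. the model outputs, so it can be
    added to the training loss as a calibration regularizer.
    Boundary effects on [0, 1] are ignored for simplicity.
    """
    grid = torch.linspace(0.0, 1.0, grid_size, device=pit.device)
    diffs = (grid[:, None] - pit[None, :]) / bandwidth
    kde = torch.exp(-0.5 * diffs ** 2).mean(dim=1) / (bandwidth * math.sqrt(2 * math.pi))
    # Probabilistic calibration <=> PIT density equal to 1 everywhere.
    return ((kde - 1.0) ** 2).mean()

# Hypothetical use inside a training loop for a model returning (mu, sigma):
# mu, sigma = model(x)
# nll = torch.nn.functional.gaussian_nll_loss(mu, y, sigma ** 2)
# loss = nll + lambda_cal * pit_kde_penalty(gaussian_pit(y, mu, sigma))
```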
With the growing utilization of machine learning models in real-world applications, improving neural network calibration has become a primary concern. Various methods have been proposed, including post-hoc methods that adjust predictions after training and regularization methods that act during training. In regression, post-hoc methods have shown superior calibration performance compared to regularization methods. However, the base model is trained without any direct connection to the subsequent post-hoc step. To address this limitation, we introduce a novel end-to-end method called Quantile Recalibration Training, integrating post-hoc calibration directly into the training process. We also propose an algorithm unifying our method with other post-hoc and regularization methods. We demonstrate the performance of Quantile Recalibration Training in a large-scale experiment involving 57 tabular regression datasets, showcasing improved predictive accuracy. Additionally, we conduct an ablation study, revealing the significance of different elements in our method. Aiming for reproducibility and fair comparisons, we have implemented our experiments in a common code base.
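The core idea behind Quantile Recalibration Training, composing the base model's CDF with a recalibration map inside the training objective, can be sketched as follows. The sketch assumes a Gaussian base model and a kernel-smoothed empirical CDF of PIT values as the recalibration map; it is a simplified illustration, not the algorithm as implemented in the paper, and the function names are hypothetical.

```python
# Minimal sketch of end-to-end quantile recalibration: the recalibrated CDF
# is r(F(y)), where F is the base model's CDF and r a differentiable,
# kernel-smoothed recalibration map; training minimizes the NLL of the
# recalibrated density f(y) * r'(F(y)).
import math
import torch

def gaussian_pit(y, mu, sigma):
    """PIT values of y under the base Gaussian predictive distribution."""
    return 0.5 * (1 + torch.erf((y - mu) / (sigma * math.sqrt(2.0))))

def recalibration_map(z, pit_cal, bandwidth=0.05):
    """Kernel-smoothed empirical CDF of the calibration PIT values.

    Returns r(z) and its derivative r'(z); both are differentiable, so the
    recalibration step can sit inside the training loss.
    """
    s = torch.sigmoid((z[:, None] - pit_cal[None, :]) / bandwidth)
    return s.mean(dim=1), (s * (1 - s)).mean(dim=1) / bandwidth

def recalibrated_nll(y, mu, sigma, pit_cal, eps=1e-6):
    """NLL of the recalibrated model, with density f(y) * r'(F(y))."""
    z = gaussian_pit(y, mu, sigma)
    base_logpdf = (-0.5 * ((y - mu) / sigma) ** 2
                   - torch.log(sigma) - 0.5 * math.log(2 * math.pi))
    _, r_prime = recalibration_map(z, pit_cal)
    return -(base_logpdf + torch.log(r_prime + eps)).mean()

# In a training step (hypothetical): split the batch, use one part to form
# pit_cal = gaussian_pit(y_cal, mu_cal, sigma_cal), and minimize
# recalibrated_nll(y_train, mu_train, sigma_train, pit_cal).
```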
Sequences of labeled events observed at irregular intervals
in continuous time are ubiquitous across various fields.
Temporal Point Processes (TPPs) provide a mathematical
framework for modeling these sequences, enabling inferences
such as predicting the arrival time of future events and
their associated label, called the mark. However, due to model
misspecification or lack of training data, these
probabilistic models may provide a poor approximation of the
true, unknown underlying process, and the prediction regions
extracted from them may be unreliable estimates of the
underlying uncertainty. This paper develops more reliable
methods for uncertainty quantification in neural TPP models
via the framework of conformal prediction. A primary
objective is to generate a distribution-free joint
prediction region for the arrival time and mark, with a
finite-sample marginal coverage guarantee. A key challenge
is to handle both a strictly positive, continuous response
and a categorical response, without distributional
assumptions. We first consider a simple but overly
conservative approach that combines individual prediction
regions for the event arrival time and mark. Then, we
introduce a more effective method based on bivariate highest
density regions derived from the joint predictive density of
event arrival time and mark. By leveraging the dependencies
between these two variables, this method excludes unlikely
combinations of the two, resulting in sharper prediction
regions while still attaining the pre-specified coverage
level. We also explore the generation of individual
univariate prediction regions for arrival times and marks
through conformal regression and classification techniques.
Moreover, we investigate the stronger notion of conditional
coverage. Finally, through extensive experimentation on both
simulated and real-world datasets, we assess the validity
and efficiency of these methods.
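As an illustration of the bivariate highest-density-region approach, the following sketch builds a split-conformal joint prediction region by thresholding the joint predictive log-density of the arrival time and mark at a calibrated level. Here joint_log_density is a hypothetical stand-in for a neural TPP's joint predictive log-density, and the grid-based construction of the region is a simplification for illustration, not the procedure used in the experiments.

```python
# Minimal sketch: split-conformal bivariate highest-density region for
# (arrival time, mark), assuming exchangeable calibration data and a
# hypothetical joint_log_density(t, k) provided by the TPP model.
import numpy as np

def conformal_hdr_threshold(log_dens_cal, alpha=0.1):
    """Split-conformal threshold on the joint log-density.

    log_dens_cal: log-densities of the observed (time, mark) pairs on a
    held-out calibration set. Using negative log-density as the
    nonconformity score, the region {(t, k): log p(t, k) >= threshold}
    attains >= 1 - alpha marginal coverage in finite samples.
    """
    scores = -np.asarray(log_dens_cal)
    n = len(scores)
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    return -np.quantile(scores, level, method="higher")

def joint_prediction_region(joint_log_density, t_grid, marks, threshold):
    """Approximate the bivariate HDR on a grid of candidate arrival times."""
    region = []
    for k in marks:
        keep = np.array([joint_log_density(t, k) >= threshold for t in t_grid])
        region.append((k, t_grid[keep]))  # arrival times kept for this mark
    return region
```

The same threshold rule recovers the simpler combined approach when the joint density is replaced by the product of a marginal time density and a mark classifier with a Bonferroni-style split of alpha, which is what makes that baseline conservative.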