Poster Session in Workshop: Scientific Methods for Understanding Neural Networks
Amplified Early Stopping Bias: Overestimated Performance with Deep Learning
Nona Rajabi · Antonio Ribeiro · Miguel Vasco · Danica Kragic
Cross-validation is commonly used to estimate machine learning model performance on new samples. However, using it for both hyperparameter selection and error estimation can lead to overestimated model performance, especially when extensive hyperparameter searches overly tailor the model to the validation data. We demonstrate that deep learning further amplifies this bias: even minor model adjustments cause significant overestimation. Our extensive experiments on simulated and real data focus on the bias introduced by early stopping during cross-validation. We find that the overestimation intensifies with network depth and is especially severe in small datasets, which are common in physiological signal processing applications. Selecting the early stopping point during cross-validation can yield ROC-AUC estimates exceeding 90% on purely random data, and this effect persists across sample sizes, architectures, and network sizes.
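The selection effect described above can be illustrated with a small NumPy simulation. This is a hedged sketch, not the paper's experimental setup: the fold size, number of epochs, and random-score model are hypothetical choices. Each "epoch" produces pure-noise scores on random labels, so the true ROC-AUC is 0.5; picking the epoch that maximizes validation AUC still reports a strongly inflated estimate.

```python
import numpy as np

rng = np.random.default_rng(0)

def auc(scores, labels):
    # ROC-AUC as the probability that a random positive outscores
    # a random negative (ties count half)
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    diffs = pos[:, None] - neg[None, :]
    return (diffs > 0).mean() + 0.5 * (diffs == 0).mean()

n_val = 30       # small validation fold (hypothetical size)
n_epochs = 200   # candidate early-stopping points (hypothetical)
labels = rng.integers(0, 2, n_val)

# Random data: every checkpoint's scores are noise, so true AUC = 0.5
epoch_aucs = [auc(rng.normal(size=n_val), labels) for _ in range(n_epochs)]

# "Best epoch" chosen on the same fold used for the estimate
selected = max(epoch_aucs)
print(f"mean AUC over epochs: {np.mean(epoch_aucs):.2f}")
print(f"AUC at selected early-stopping epoch: {selected:.2f}")
```

The gap between the mean AUC (near chance) and the AUC at the selected epoch is the optimistic bias: the maximum of many noisy estimates is not an unbiased estimate of any one of them, and the smaller the fold, the noisier each estimate and the larger the inflation.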