Poster in Workshop: Robustness in Sequence Modeling
CLIFT: Analysing Natural Distribution Shift on Question Answering Models in Clinical Domain
Ankit Pal
This paper introduces CLIFT (Clinical Shift), a new testbed for the clinical-domain question answering task. The testbed includes 25k high-quality question-answering samples to provide a diverse and reliable benchmark. We performed a comprehensive experimental study and evaluated several deep-learning models under the proposed testbed. Although the models achieve impressive results on the original test set and show no adaptive overfitting, their performance degrades on the new test sets, which introduce distribution shift. Our findings emphasise both the need and the potential for increasing the robustness of clinical-domain models under distributional shift. The testbed offers one way to track progress in that direction. It also highlights the necessity of adopting evaluation metrics that consider robustness to natural distribution shift. The test sets and code to reproduce the experiments and evaluate new models against CLIFT are available at anonymous.github.io
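As a rough illustration of what tracking robustness under distribution shift can look like in practice, the sketch below scores a question answering model on an original test set and on a shifted test set, then reports the gap. The abstract does not name its metrics, so the SQuAD-style exact match and token-level F1 used here are an assumption, and the function names (`normalize`, `score`, `shift_gap`) are hypothetical helpers rather than part of the CLIFT code release.

```python
import re
import string
from collections import Counter


def normalize(text):
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, reference):
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(reference))


def token_f1(prediction, reference):
    """Harmonic mean of token-level precision and recall."""
    pred_tokens = normalize(prediction).split()
    ref_tokens = normalize(reference).split()
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred_tokens == ref_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def score(predictions, references):
    """Average exact match and F1 over paired predictions and references."""
    n = len(references)
    pairs = list(zip(predictions, references))
    return {
        "em": sum(exact_match(p, r) for p, r in pairs) / n,
        "f1": sum(token_f1(p, r) for p, r in pairs) / n,
    }


def shift_gap(original, shifted):
    """Per-metric drop from the original test set to the shifted one."""
    return {name: original[name] - shifted[name] for name in original}
```

Given predictions and gold answers for both test sets, `shift_gap(score(preds_orig, gold_orig), score(preds_new, gold_new))` returns the per-metric drop; a larger positive value suggests the model is less robust to the shift. The variable names in that call are placeholders for data the reader would supply.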