Poster in Workshop: Safe Generative AI
An Examination of AI-Generated Text Detectors Across Multiple Domains and Models
Brian Tufts · Xuandong Zhao · Lei Li
The proliferation of large language models has raised concerns about their misuse, especially when AI-generated text is deceptively presented as human-authored. Machine-generated content detectors claim to identify such text reliably under various conditions and from any language model. This paper tests these claims by evaluating several popular detectors (RADAR, Wild, Fast-DetectGPT, GPTID) on a range of domains and datasets they have not previously seen. We also test the detectors on new models that were not included in their original evaluations or did not exist at the time. We argue for the importance of the true positive rate at a specific false positive rate (TPR@FPR) metric and demonstrate that these detectors perform poorly in specific settings, with TPR@.01 (the true positive rate at a 1% false positive rate) as low as 0%. Our findings suggest that both trained and zero-shot detectors struggle to maintain high sensitivity while keeping the false positive rate reasonably low.
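As an illustration of the TPR@FPR metric, the following is a minimal sketch of one way it could be computed from raw detector scores. It is not the authors' evaluation code. The function name `tpr_at_fpr`, its parameters, and the convention that a higher score means "more likely AI-generated" are assumptions made for this example.

```python
import numpy as np

def tpr_at_fpr(human_scores, ai_scores, target_fpr=0.01):
    """Return (tpr, achieved_fpr, threshold) for a score-based detector.

    Assumption: a text is flagged as AI-generated when its score is
    strictly greater than the threshold.
    """
    human = np.sort(np.asarray(human_scores, dtype=float))
    ai = np.asarray(ai_scores, dtype=float)

    # Maximum number of human texts that may be flagged without
    # exceeding the target FPR (small epsilon guards float rounding).
    k = int(np.floor(target_fpr * human.size + 1e-9))

    # Use the (k+1)-th largest human score as the threshold, so that
    # at most k human scores are strictly above it.
    if k < human.size:
        threshold = human[human.size - 1 - k]
    else:
        threshold = -np.inf

    tpr = float(np.mean(ai > threshold))
    achieved_fpr = float(np.mean(human > threshold))
    return tpr, achieved_fpr, threshold

# Toy scores, invented purely for illustration.
human_demo = [0.1, 0.2, 0.3, 0.4]
ai_demo = [0.15, 0.25, 0.35, 0.9]
demo_strict = tpr_at_fpr(human_demo, ai_demo, target_fpr=0.01)
demo_loose = tpr_at_fpr(human_demo, ai_demo, target_fpr=0.5)
```

The threshold is fixed using human-written text only, so the false positive budget is respected first and the true positive rate is whatever the detector achieves under that constraint. A detector with a high overall AUROC can still score near zero on this metric if its scores for human and AI text overlap in the high-score tail.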