Poster

BiScope: AI-generated Text Detection by Checking Memorization of Preceding Tokens

Hanxi Guo · Siyuan Cheng · Xiaolong Jin · Zhuo Zhang · Kaiyuan Zhang · Guanhong Tao · Guangyu Shen · Xiangyu Zhang

East Exhibit Hall A-C #4407
Thu 12 Dec 11 a.m. PST — 2 p.m. PST

Abstract:

Detecting text generated by Large Language Models (LLMs) is a pressing need: misuse of these powerful models across a wide range of applications can have highly undesirable consequences such as misinformation and academic dishonesty. Given a piece of subject text, many existing detection methods work by measuring how difficult it is for an LLM to predict each next token in the text from its prefix. In this paper, we make a critical observation that how well the current token's output logits memorize the closely preceding input tokens also provides strong evidence. We therefore propose a novel bi-directional calculation method that measures the cross-entropy losses between the output logits and the ground-truth next token (forward) and between the output logits and the immediately preceding input token (backward). A classifier is trained to make the final prediction based on the statistics of these losses. We evaluate our system, named BiScope, on texts generated by five of the latest commercial LLMs across five heterogeneous datasets, including both natural language and code. BiScope demonstrates superior detection accuracy and robustness compared to six existing baseline methods, exceeding the detection accuracy of state-of-the-art non-commercial methods by over 0.30 F1 score and achieving over 0.95 detection F1 score on average. It also outperforms GPTZero, the best commercial tool, which is built on a commercial LLM trained with an enormous volume of data. Code is available at https://github.com/MarkGHX/BiScope.
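The bi-directional loss computation described above can be illustrated with a minimal sketch. This is not the authors' released implementation (see the repository for that); the surrogate model (gpt2), the particular summary statistics, and the downstream classifier choice are assumptions made here for exposition only.

```python
# Minimal sketch of the bi-directional cross-entropy idea, assuming a
# Hugging Face causal LM as the surrogate. Illustrative, not the paper's code.
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")            # assumed surrogate model
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def bidirectional_loss_features(text: str) -> torch.Tensor:
    ids = tok(text, return_tensors="pt").input_ids     # shape: (1, T)
    with torch.no_grad():
        logits = model(ids).logits                     # shape: (1, T, V)

    # Forward loss: logits at position t are scored against the
    # ground-truth next token t+1 (standard next-token prediction loss).
    fwd = F.cross_entropy(logits[0, :-1], ids[0, 1:], reduction="none")

    # Backward loss: the same logits at position t are scored against the
    # immediately preceding input token t, probing how strongly the output
    # distribution "memorizes" the token just consumed.
    bwd = F.cross_entropy(logits[0, :-1], ids[0, :-1], reduction="none")

    # Summary statistics of both loss sequences would then feed a binary
    # classifier (e.g., gradient-boosted trees) -- an assumed choice here.
    return torch.stack([
        fwd.mean(), fwd.std(), fwd.max(),
        bwd.mean(), bwd.std(), bwd.max(),
    ])
```

A detector following this recipe would collect such feature vectors for known human-written and LLM-generated samples, fit the classifier on them, and apply it to new subject text.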
