Oral Poster
Brain Treebank: Large-scale intracranial recordings from naturalistic language stimuli
Christopher Wang · Adam Yaari · Aaditya Singh · Vighnesh Subramaniam · Dana Rosenfarb · Jan DeWitt · Pranav Misra · Joseph Madsen · Scellig Stone · Gabriel Kreiman · Boris Katz · Ignacio Cases · Andrei Barbu
Fri 13 Dec 3:30 p.m. PST — 4:30 p.m. PST
We present the Brain Treebank, a large-scale dataset of electrophysiological neural responses recorded from intracranial probes while 10 subjects watched one or more Hollywood movies. Subjects watched an average of 2.6 Hollywood movies, for an average viewing time of 4.3 hours and a total of 43 hours across subjects. The audio track for each movie was transcribed with manual corrections. Word onsets were manually annotated on spectrograms of the audio track for each movie. Each transcript was automatically parsed and manually corrected into the Universal Dependencies (UD) formalism, assigning a part of speech to every word and a dependency parse to every sentence. In total, subjects heard 36,000 sentences (205,000 words), while each subject had, on average, 167 electrodes implanted. This is the largest dataset of intracranial recordings featuring grounded naturalistic language, one of the largest English UD treebanks in general, and one of only a few UD treebanks aligned to multimodal features. We hope that this dataset serves as a bridge between linguistic concepts, perception, and their neural representations. To that end, we present an analysis of which electrodes are sensitive to language features while also mapping out a rough time course of language processing across these electrodes. The Brain Treebank is available at https://BrainTreebank.dev/
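As an illustration of how the word-level annotations might be aligned with the neural recordings, the Python sketch below epochs electrode data around annotated word onsets and compares responses by part of speech. The file names, column names, and sampling rate are assumptions for the sake of the example, not the dataset's actual format; consult the Brain Treebank documentation for the real loading procedure.

# Minimal sketch (not the official Brain Treebank loader): align hypothetical
# word-onset annotations with intracranial recordings by epoching around onsets.
import numpy as np
import pandas as pd
import h5py

SFREQ = 2048  # assumed sampling rate in Hz; check the dataset documentation

# Hypothetical files: a per-movie word table with UD part-of-speech tags and
# onset times, and an HDF5 file of electrode traces (electrodes x samples).
words = pd.read_csv("movie1_words.csv")            # columns: word, upos, onset_sec, ...
with h5py.File("subject1_movie1_neural.h5", "r") as f:
    neural = f["data"][:]                          # shape: (n_electrodes, n_samples)

# Epoch 500 ms of neural activity following each annotated word onset.
win = int(0.5 * SFREQ)
kept, epochs = [], []
for i, onset in enumerate(words["onset_sec"]):
    start = int(onset * SFREQ)
    if start + win <= neural.shape[1]:
        kept.append(i)
        epochs.append(neural[:, start:start + win])
epochs = np.stack(epochs)                          # (n_words, n_electrodes, n_samples)
upos = words["upos"].to_numpy()[kept]

# Example: mean post-onset response per electrode for nouns vs. verbs.
noun_mean = epochs[upos == "NOUN"].mean(axis=(0, 2))
verb_mean = epochs[upos == "VERB"].mean(axis=(0, 2))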