

Poster in Workshop: Workshop on Behavioral Machine Learning

Deep and shallow thinking in a single forward pass

Jennifer Hu · Michael Franke


Abstract:

Given any input, a language model (LM) performs the same kind of computation to produce an output: a single forward pass through the underlying neural network. Inspired by findings in cognitive psychology, we investigate potential signatures of "deeper" and "shallower" computation within a forward pass, without allowing the model to generate intermediate reasoning steps. We prompt LMs with contrasting statements designed to trigger deeper or shallower reasoning on a set of cognitive reflection tasks. We find suggestive evidence that LMs' preferences for correct (deeper) or intuitive (shallower) answers can be manipulated through prompts related not only to general personality traits, but also to situational metabolic, physical, and social factors. We then use the logit lens to investigate how an LM might achieve this behavior. Our results suggest that intuitive answers are preferred in early layers, even when the final behavior is consistent with the correct answer or deeper reasoning. These findings motivate further mechanistic analyses of high-level cognition and reasoning in LMs.
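For readers unfamiliar with the logit lens, the sketch below illustrates the general technique the abstract refers to: decoding each intermediate layer's hidden state through the model's final layer norm and unembedding matrix, to see which candidate answer is preferred at each depth. This is not the authors' code; GPT-2 as the model, the bat-and-ball cognitive reflection item as the prompt, and the single-token answer comparison are all illustrative assumptions.

```python
# Minimal logit-lens sketch (illustrative, not the authors' implementation).
# Assumptions: GPT-2 as the LM, a bat-and-ball CRT item as the prompt,
# and single-token candidates for the intuitive vs. correct answers.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("gpt2")
tok = AutoTokenizer.from_pretrained("gpt2")
model.eval()

prompt = ("A bat and a ball cost 110 cents in total. "
          "The bat costs 100 cents more than the ball. "
          "In cents, the ball costs")
inputs = tok(prompt, return_tensors="pt")

# Token ids for the intuitive (shallow) and correct (deep) answers.
intuitive_id = tok.encode(" 10")[0]
correct_id = tok.encode(" 5")[0]

with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# hidden_states[0] is the embedding layer; [1..L] are transformer blocks.
ln_f, unembed = model.transformer.ln_f, model.lm_head
for layer, h in enumerate(out.hidden_states):
    # Logit lens: treat this layer's last-position hidden state as if it
    # were the final output, and decode it into vocabulary probabilities.
    probs = torch.softmax(unembed(ln_f(h[0, -1])), dim=-1)
    print(f"layer {layer:2d}  p(intuitive)={probs[intuitive_id]:.4f}  "
          f"p(correct)={probs[correct_id]:.4f}")
```

Under the pattern the abstract reports, one would expect the intuitive answer's probability to dominate in early layers, with the correct answer overtaking it only in later layers when the model's final behavior is consistent with deeper reasoning.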
