Poster in Workshop: Mathematics of Modern Machine Learning (M3L)
Transformers as Multi-Task Feature Selectors: Generalization Analysis of In-Context Learning
Hongkang Li · Meng Wang · Songtao Lu · Hui Wan · Xiaodong Cui · Pin-Yu Chen
Transformer-based large language models have displayed impressive capabilities in in-context learning, wherein they use multiple input-output pairs in the prompt to make predictions on unlabeled test data. To lay the theoretical groundwork for in-context learning, we study the optimization and generalization of a single-head, one-layer Transformer trained for multi-task classification. Our analysis shows that prompts containing more training-relevant features and less noise require a lower sample complexity and yield better learning performance. The trained model exhibits a mechanism that first attends to demonstrations containing training-relevant features and then decodes the corresponding label embedding. Furthermore, we characterize the conditions on the relationship between training and testing prompts that are necessary for successful out-of-domain generalization of in-context learning.
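To make the attend-then-decode mechanism concrete, below is a minimal sketch (not the authors' code) of a single-head, one-layer Transformer forward pass for in-context classification: the query attends over demonstration tokens and reads out an aggregated label embedding. The dimensions, the random projection matrices standing in for trained weights, and the one-hot label embeddings are all illustrative assumptions.

```python
# Minimal sketch of attend-then-decode in-context classification.
# Projections are random placeholders for the trained weights analyzed in the paper.
import numpy as np

rng = np.random.default_rng(0)
d, k, n = 8, 2, 16                       # feature dim, # classes, # demonstrations (assumed)
label_emb = np.eye(k)                    # one label embedding per class (assumption)

W_Q = rng.normal(size=(d, d)) / np.sqrt(d)           # query projection
W_K = rng.normal(size=(d, d)) / np.sqrt(d)           # key projection on the feature part
P_label = np.hstack([np.zeros((k, d)), np.eye(k)])   # value path: read off the label embedding


def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()


def in_context_predict(demos_x, demos_y, query_x):
    """Attend to demonstrations, then decode the aggregated label embedding."""
    # Each demonstration token stacks [input features ; label embedding].
    tokens = np.hstack([demos_x, label_emb[demos_y]])          # (n, d + k)
    # Attention: the query scores each demonstration via the query/key projections.
    scores = (W_Q @ query_x) @ (W_K @ demos_x.T) / np.sqrt(d)  # (n,)
    attn = softmax(scores)
    # Decode: aggregate the label-embedding parts of the attended demonstrations
    # and pick the class whose embedding matches best.
    context = attn @ (tokens @ P_label.T)                      # (k,)
    return int(np.argmax(label_emb @ context))


# Toy usage: two classes with distinct mean features plus small noise.
mu = rng.normal(size=(k, d))
demos_y = rng.integers(0, k, size=n)
demos_x = mu[demos_y] + 0.1 * rng.normal(size=(n, d))
query_x = mu[1] + 0.1 * rng.normal(size=d)
print(in_context_predict(demos_x, demos_y, query_x))  # predicted class for the query
```

With random (untrained) projections the attention weights are essentially arbitrary; the paper's result is that training concentrates the attention on demonstrations sharing the query's training-relevant features, which is what makes the decoded label embedding informative.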