NeurIPS Oral Presentations

Understanding Hidden Context in Preference Learning: Consequences for RLHF
Tensor Trust: Interpretable Prompt Injection Attacks from an Online Game
Understanding the Effects of RLHF on LLM Generalisation and Diversity
Learning Interactive Real-World Simulators
Interactive Planning Using Large Language Models for Partially Observable Robotics Tasks
Self-RAG: Self-reflective Retrieval Augmented Generation
Delve into PPO: Implementation Matters for Stable RLHF
FLASK: Fine-grained Language Model Evaluation based on Alignment Skill Sets
