Abstract:
- Understanding Hidden Context in Preference Learning: Consequences for RLHF
- Tensor Trust: Interpretable Prompt Injection Attacks from an Online Game
- Understanding the Effects of RLHF on LLM Generalisation and Diversity
- Learning Interactive Real-World Simulators
- Interactive Planning Using Large Language Models for Partially Observable Robotics Tasks
- Self-RAG: Self-reflective Retrieval Augmented Generation
- Delve into PPO: Implementation Matters for Stable RLHF
- FLASK: Fine-grained Language Model Evaluation based on Alignment Skill Sets