Poster in Workshop: 6th Robot Learning Workshop: Pretraining, Fine-Tuning, and Generalization with Large Scale Models
Sample-Efficient Online Imitation Learning using Pretrained Behavioural Cloning Policies
Joe Watson · Jan Peters
Keywords: [ behavioural cloning ] [ imitation learning ]
Recent advances in robot learning have been enabled by learning rich generative and recurrent policies from expert demonstrations, such as human teleoperation. These policies can solve many complex tasks by accurately modelling human behaviour, which may be multimodal and non-Markovian. However, this imitation learning approach, behavioural cloning (BC), is purely offline, which increases the demand for large expert demonstration datasets and prevents the policy from learning from its own experience. In this work, we review the recent imitation learning algorithm coherent soft imitation learning (CSIL) and outline how it could be applied to more complex policy architectures. CSIL shows that inverse reinforcement learning can be performed using only a behavioural cloning policy, so the learned reward can be used to further improve the BC policy through additional online interactions. However, CSIL has so far only been demonstrated with simple feedforward network policies, so we discuss how such an imitation learning algorithm could be extended to richer policy architectures, such as those incorporating transformers and diffusion models.
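As a minimal sketch of CSIL's central idea (our notation and interfaces, not the authors' reference implementation): in maximum-entropy RL, the soft-optimal policy satisfies pi*(a|s) proportional to prior(a|s) exp(r(s,a) / alpha), so inverting this relation around a cloned policy yields a reward under which the BC policy is already soft-optimal. The stand-in Gaussian policies below are hypothetical placeholders for learned networks.

```python
import torch
from torch.distributions import Normal

# Stand-in Gaussian policies over a 2-D action space (our assumption, for
# illustration only). A real BC policy would be a learned network conditioned
# on the state; state dependence is omitted here for brevity.
bc_policy = Normal(loc=torch.tensor([0.5, -0.2]), scale=0.1 * torch.ones(2))
prior = Normal(loc=torch.zeros(2), scale=torch.ones(2))  # uninformative prior

def coherent_reward(action: torch.Tensor, alpha: float = 1.0) -> torch.Tensor:
    """Reward implied by the BC policy relative to the prior:

        r(a) = alpha * (log pi_BC(a) - log prior(a)).

    Under soft (maximum-entropy) RL with this reward, the BC policy is
    already soft-optimal, so online interaction can refine it rather than
    learning a reward from scratch as in classical IRL.
    """
    return alpha * (bc_policy.log_prob(action).sum(-1)
                    - prior.log_prob(action).sum(-1))

# Demonstration-like actions score higher than off-distribution ones.
print(coherent_reward(torch.tensor([0.5, -0.2])))  # near the BC mode: high reward
print(coherent_reward(torch.tensor([2.0, 2.0])))   # off-distribution: very low reward
```

Note that this construction only needs the policy's log-likelihood: policy classes with tractable densities admit it directly, whereas diffusion-based policies typically provide only approximate log-likelihoods, which is one of the challenges in extending such an approach to the richer architectures discussed above.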