Counterfactual Vision-and-Language Navigation: Unravelling the Unseen
Amin Parvaneh, Ehsan Abbasnejad, Damien Teney, Javen Qinfeng Shi, Anton van den Hengel
Spotlight presentation: Orals & Spotlights Track 22: Vision Applications
on 2020-12-09T20:10:00-08:00 - 2020-12-09T20:20:00-08:00
on 2020-12-09T20:10:00-08:00 - 2020-12-09T20:20:00-08:00
Poster Session 5 (more posters)
on 2020-12-09T21:00:00-08:00 - 2020-12-09T23:00:00-08:00
GatherTown: Application ( Town D0 - Spot C1 )
on 2020-12-09T21:00:00-08:00 - 2020-12-09T23:00:00-08:00
GatherTown: Application ( Town D0 - Spot C1 )
Join GatherTown
Only iff poster is crowded, join Zoom . Authors have to start the Zoom call from their Profile page / Presentation History.
Only iff poster is crowded, join Zoom . Authors have to start the Zoom call from their Profile page / Presentation History.
Toggle Abstract Paper (in Proceedings / .pdf)
Abstract: The task of vision-and-language navigation (VLN) requires an agent to follow text instructions to find its way through simulated household environments. A prominent challenge is to train an agent capable of generalising to new environments at test time, rather than one that simply memorises trajectories and visual details observed during training. We propose a new learning strategy that learns both from observations and generated counterfactual environments. We describe an effective algorithm to generate counterfactual observations on the fly for VLN, as linear combinations of existing environments. Simultaneously, we encourage the agent's actions to remain stable between original and counterfactual environments through our novel training objective-effectively removing the spurious features that otherwise bias the agent. Our experiments show that this technique provides significant improvements in generalisation on benchmarks for Room-to-Room navigation and Embodied Question Answering.