Robust-Adaptive Control of Linear Systems: beyond Quadratic Costs
Edouard Leurent, Odalric-Ambrym Maillard, Denis Efimov
Oral presentation: Orals & Spotlights Track 31: Reinforcement Learning
on 2020-12-10T06:00:00-08:00 - 2020-12-10T06:15:00-08:00
on 2020-12-10T06:00:00-08:00 - 2020-12-10T06:15:00-08:00
Poster Session 6 (more posters)
on 2020-12-10T09:00:00-08:00 - 2020-12-10T11:00:00-08:00
GatherTown: Reinforcement Learning and Planning ( Town B2 - Spot A1 )
on 2020-12-10T09:00:00-08:00 - 2020-12-10T11:00:00-08:00
GatherTown: Reinforcement Learning and Planning ( Town B2 - Spot A1 )
Join GatherTown
Only iff poster is crowded, join Zoom . Authors have to start the Zoom call from their Profile page / Presentation History.
Only iff poster is crowded, join Zoom . Authors have to start the Zoom call from their Profile page / Presentation History.
Toggle Abstract Paper (in Proceedings / .pdf)
Abstract: We consider the problem of robust and adaptive model predictive control (MPC) of a linear system, with unknown parameters that are learned along the way (adaptive), in a critical setting where failures must be prevented (robust). This problem has been studied from different perspectives by different communities. However, the existing theory deals only with the case of quadratic costs (the LQ problem), which limits applications to stabilisation and tracking tasks only. In order to handle more general (non-convex) costs that naturally arise in many practical problems, we carefully select and bring together several tools from different communities, namely non-asymptotic linear regression, recent results in interval prediction, and tree-based planning. Combining and adapting the theoretical guarantees at each layer is non trivial, and we provide the first end-to-end suboptimality analysis for this setting. Interestingly, our analysis naturally adapts to handle many models and combines with a data-driven robust model selection strategy, which enables to relax the modelling assumptions. Last, we strive to preserve tractability at any stage of the method, that we illustrate on two challenging simulated environments.