Spotlight in Workshop: Machine Learning for Systems
Scalable RL for Systems via Offline Imitation from Multiple Baselines: A Case Study in Compiler Optimization (Teodor V. Marinov, Google)
Teodor Vanislavov Marinov · Alekh Agarwal · Mircea Trofin
Sun 15 Dec 8:15 a.m. PST — 4:30 p.m. PST
From scheduling to resource allocation to the optimization of complex workflows, systems are replete with decision-making problems that are typically addressed with hand-designed heuristics. Recent studies pose these problems as Reinforcement Learning (RL) tasks owing to a natural fit, with several successes in simulated benchmark environments. In practice, however, bringing the RL approach to any complex system poses significant challenges in integrating the system into the act-observe-learn paradigm of RL, which has limited the adoption of these techniques. In this work, we present an alternative approach that uses offline data collected from multiple existing baseline policies to simultaneously improve upon all of them. By repeating this improvement process over multiple iterations and adding each learned policy to the set of baselines, we show how performance can be bootstrapped quickly. We demonstrate the practicality of our approach by optimizing inlining decisions in the LLVM compiler, obtaining significant improvements even over prior RL-based policies.
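
A minimal sketch of the iterative improvement loop described in the abstract, assuming hypothetical helpers (collect_offline_data, train_by_imitation, evaluate) that stand in for the authors' actual pipeline; this is illustrative only, not their implementation.

    # Hypothetical sketch: offline imitation from multiple baselines,
    # repeated so that each learned policy joins the baseline set.
    def iterative_offline_improvement(baselines, collect_offline_data,
                                      train_by_imitation, evaluate,
                                      num_rounds=3):
        policies = list(baselines)
        for _ in range(num_rounds):
            # Log (state, action, outcome) traces from every current policy,
            # entirely offline -- no live act-observe-learn loop is needed.
            datasets = [collect_offline_data(p) for p in policies]
            # Fit a new policy that imitates, per decision, whichever
            # baseline performed best in the logged data.
            new_policy = train_by_imitation(datasets)
            # Add the learned policy to the baseline set for the next round.
            policies.append(new_policy)
        # Return the best policy found across all rounds.
        return max(policies, key=evaluate)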