Poster in Workshop: Machine Learning for Systems
Reward Copilot for RL-driven Systems Optimization
Karan Tandon · Manav Mishra · Gagan Somashekar · Mayukh Das · Nagarajan Natarajan
Systems optimization problems arising in large-scale enterprise infrastructure, such as workload auto-scaling, kernel parameter tuning, and cluster management, are increasingly RL-driven. While effective, setting up the RL framework for such real-world problems is difficult: designing correct and useful reward functions or state spaces is highly challenging and requires substantial domain expertise. We propose a novel reward co-pilot that helps design suitable and interpretable reward functions, guided by client-provided specifications, for any RL framework. Through experiments on standard benchmarks as well as systems-specific optimization problems, we show that our solution returns reward functions with a certain (informal) feasibility certificate in addition to Pareto-optimality.
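To make the setting concrete, here is a minimal sketch of the kind of interpretable, specification-guided reward function such a co-pilot might produce for an auto-scaling problem. All names, fields, and weights below are hypothetical illustrations, not taken from the paper; the co-pilot's actual search over Pareto-optimal reward designs is more involved than this fixed scalarization.

```python
def autoscaling_reward(p99_latency_ms: float, num_replicas: int, spec: dict) -> float:
    """Hypothetical interpretable reward: trade off latency-SLO compliance
    against resource cost, with weights taken from a client specification."""
    # SLO term: 1.0 when p99 latency meets the target, decreasing as it is exceeded.
    slo_term = min(1.0, spec["latency_slo_ms"] / max(p99_latency_ms, 1e-9))
    # Cost term: fraction of the replica budget left unused.
    cost_term = 1.0 - num_replicas / spec["max_replicas"]
    # Client-specified trade-off weights (a simple weighted scalarization).
    return spec["w_slo"] * slo_term + spec["w_cost"] * cost_term

# Example client specification (illustrative values only).
spec = {"latency_slo_ms": 200.0, "max_replicas": 20, "w_slo": 0.7, "w_cost": 0.3}
r_good = autoscaling_reward(p99_latency_ms=150.0, num_replicas=5, spec=spec)   # meets SLO cheaply
r_bad = autoscaling_reward(p99_latency_ms=400.0, num_replicas=18, spec=spec)   # violates SLO at high cost
```

Because every term maps to a named field in the client specification, a reward of this shape stays auditable: a domain expert can read off why one scaling decision scored higher than another.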