Poster in the Workshop on Distribution Shifts: New Frontiers with Foundation Models
Transfer Learning, Reinforcement Learning for Adaptive Control Optimization under Distribution Shift
Pankaj Rajak · Wojciech Kowalinski · Fei Wang
Keywords: [ Transfer Learning ] [ Reinforcement Learning ] [ Fraud Prevention ]
Many control systems rely on a pipeline of machine learning models and hand-coded rules to make decisions. As the operating environment changes, these rules require constant tuning to maintain optimal system performance. Reinforcement learning (RL) can automate the online optimization of rules from incoming data, but it requires extensive training data and exploration, which limits its applicability to new rules or rules with sparse data. Here, we propose a transfer learning approach called Learning from Behavior Prior (LBP) that enables fast, sample-efficient RL optimization by transferring knowledge from an expert controller. We demonstrate this approach by optimizing the rule thresholds in a simulated control pipeline across differing operating conditions. Our method converges 5x faster than vanilla RL and is more robust to distribution shift between the expert and target environments. LBP reduces negative impacts during live training, enabling automated optimization even for new controllers.