Poster in Workshop: Pluralistic Alignment Workshop
Aligning LLMs using Reinforcement Learning from Market Feedback (RLMF) for Regime Adaptation
Raeid Saqur
We propose a regime-adaptive execution methodology in the financial market domain to tackle the regime-switching problem. Dynamic regime switching, i.e., shifts in the underlying correlation and covariance structure of the true (hidden) market variables, diminishes the robustness of expert/specialist models on downstream tasks such as forecasting or market-movement prediction from unseen, online data. Our method uses natural, intrinsic market rewards for adaptive RL alignment (RLMF) of expert LLMs, combined with a teacher-student, repeating dual-phase (train, execute) pipeline that consistently outperforms SOTA trillion-parameter models like GPT-4o. Our approach does not rely on the strength of the underlying expert models: any contemporary off-the-shelf foundation LLM is compatible with our plug-and-play algorithm. We use a Llama-2 7B-parameter base model to show the efficacy of our method, which outperforms both generalist and specialist classes of expert models and attains strong empirical results, including a 15% increase in predictive accuracy on concurrent stock-movement prediction benchmarks.
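As a rough illustration of the intrinsic market-reward idea (a minimal sketch under assumed conventions, not the paper's exact RLMF formulation), the snippet below scores a model's discrete up/down call against the realized next-period return; the function and variable names are hypothetical.

```python
# Sketch (assumption, not the authors' exact reward): convert realized market
# movement into a scalar reward that an RL alignment step could consume.

def market_feedback_reward(predicted_direction: str, realized_return: float) -> float:
    """Return +1.0 if the predicted direction matches the sign of the realized
    return, -1.0 if it does not, and 0.0 for a flat move."""
    if realized_return == 0.0:
        return 0.0
    actual_direction = "up" if realized_return > 0 else "down"
    return 1.0 if predicted_direction == actual_direction else -1.0

# Example: the model predicted "up" and the asset closed +1.2% the next period.
print(market_feedback_reward("up", 0.012))    # 1.0
print(market_feedback_reward("down", 0.012))  # -1.0
```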