Offline Reinforcement Learning: A Data-Driven Approach for Sequential Decision Making
Abstract:
Offline Reinforcement Learning (RL) offers a promising approach for real-world applications: it learns decision-making policies from existing data, without relying on costly or unsafe exploratory interaction. However, offline RL faces challenges such as distributional shift and the need for safe generalization beyond the dataset. In this talk, we present two solutions to these challenges. The first, Expectile V-Learning (EVL), enables effective policy learning from non-optimal data by smoothly interpolating between behavior cloning and optimal value learning. To handle distributional shift, EVL performs implicit bootstrapping within the dataset and avoids overestimating the values of actions outside it. The second contribution, Reverse Offline Model-Based Imagination (ROMI), is a conservative model-based approach that uses a learned reverse dynamics model to generate safe imaginary trajectories. Unlike forward imagination, which risks leading the agent into out-of-distribution states and unsafe decisions, ROMI generalizes reliably by guiding the agent toward known states. Both methods achieve state-of-the-art performance on the D4RL benchmark, highlighting their potential for real-world applications such as robotics. The talk will cover the key ideas behind EVL and ROMI, their advantages over existing approaches, and their broader applicability.
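
To make the expectile idea more concrete, the following is a sketch of a standard expectile-regression value objective of the kind EVL builds on; the exact objective presented in the talk may differ, and all symbols below (dataset D, discount factor gamma, expectile parameter tau) are introduced here only for illustration:

    \mathcal{L}_\tau(V) \;=\; \mathbb{E}_{(s,a,r,s') \sim \mathcal{D}}\big[\,|\tau - \mathbb{1}(\delta < 0)|\,\delta^2\,\big],
    \qquad \delta = r + \gamma V(s') - V(s).

With tau = 0.5 this reduces to ordinary squared-error regression toward the behavior policy's value (the behavior-cloning end of the trade-off), while tau approaching 1 weights positive TD errors more heavily and pushes V toward the best return supported by the dataset, without ever querying actions outside of it.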
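
The reverse-imagination idea can likewise be illustrated with a minimal sketch. This is hypothetical code, not the authors' implementation: reverse_model, reverse_policy, and dataset are placeholder names, and the point is only that rollouts are generated backwards from real dataset states, so every imagined trajectory ends in a known, in-distribution state:

    import numpy as np

    def reverse_rollout(reverse_model, reverse_policy, dataset, horizon=5):
        # Anchor the imagined trajectory at a real state from the offline dataset,
        # so that the rollout ends in a known, in-distribution state.
        next_state = dataset[np.random.randint(len(dataset))]

        transitions = []
        for _ in range(horizon):
            # Propose an action that could plausibly have led into `next_state` ...
            action = reverse_policy(next_state)
            # ... and predict the predecessor state and reward under reverse dynamics.
            prev_state, reward = reverse_model(next_state, action)
            # Store the transition in the usual forward orientation.
            transitions.append((prev_state, action, reward, next_state))
            next_state = prev_state

        # Reverse so the transitions read forward in time, ending at the real anchor.
        return transitions[::-1]

The imagined transitions can then be mixed with the real dataset and consumed by a model-free offline RL learner, which is how model-based imagination is typically combined with conservative policy optimization.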