Rethinking the Theoretical Foundation of Reinforcement Learning

Nan Jiang, Associate Professor of Computer Science at the University of Illinois Urbana-Champaign

Given two candidate functions, can we identify which one is the true value function of a large Markov decision process (MDP), given a "benign" dataset? Trivial as it might seem, a version of this question was open for 20+ years in reinforcement learning (RL), and the core difficulties are intimately related to the training instability of modern deep RL. In this talk, I will argue that by rethinking fundamental questions like this, RL theory can provide unique perspectives and solutions to practically relevant problems that are critical to the deployment of RL in real-world scenarios. The first part of the talk concerns holdout validation in offline RL, where the aforementioned question naturally arises. I will show how our algorithm, Batch Value-Function Tournament (BVFT), breaks the theoretical barrier and enjoys promising empirical performance. The second part of the talk is about offline training: when we learn policies from a pre-collected dataset, how can we reason about policies that would visit states not seen in the data, and avoid over-estimating their values? I will present the Bellman-consistent pessimism framework, whose extension gives a surprising unification of offline RL and imitation learning.
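As a warm-up to the question posed above, the sketch below shows the naive approach: since the true optimal value function is the unique fixed point of the Bellman optimality operator, one can distinguish it from an impostor by measuring the Bellman residual. The MDP, the candidate functions, and all names here are constructed for illustration only; they are not from the talk, and the talk's point is precisely that this naive check breaks down under function approximation and finite, sampled data, which is the barrier BVFT addresses.

```python
import numpy as np

# Hypothetical toy MDP: 2 states, 2 actions, deterministic transitions.
# P[s, a] gives the next state; R[s, a] gives the reward.
gamma = 0.9
P = np.array([[0, 1], [1, 0]])
R = np.array([[1.0, 0.0], [0.0, 1.0]])

# Compute the true Q* by value iteration (converges since gamma < 1).
Q = np.zeros((2, 2))
for _ in range(1000):
    V = Q.max(axis=1)          # greedy state values
    Q = R + gamma * V[P]       # Bellman optimality backup

Q_true = Q
Q_fake = Q + np.array([[0.5, -0.3], [0.2, 0.1]])  # a perturbed impostor

def bellman_residual(Qc):
    """Max absolute Bellman optimality residual over all (s, a) pairs."""
    V = Qc.max(axis=1)
    return np.abs(Qc - (R + gamma * V[P])).max()

print(bellman_residual(Q_true))  # essentially zero: a Bellman fixed point
print(bellman_residual(Q_fake))  # clearly positive: fails the check
```

With full knowledge of P and R this check is trivial; the difficulty the talk addresses arises when the residual must be estimated from a fixed dataset of sampled transitions over a large state space.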

Nan Jiang is an Associate Professor of Computer Science at the University of Illinois Urbana-Champaign. Prior to joining UIUC, he was a postdoctoral researcher at Microsoft Research NYC. He received his PhD in Computer Science and Engineering at the University of Michigan. His research focuses on the theory of reinforcement learning, with specific interests in the sample complexity of exploration under function approximation, offline RL and evaluation, and learning in partially observable systems. He is a coauthor of a monograph on RL theory and holds editorial positions in the research community, including serving as an action editor for JMLR, an editor for Foundations and Trends in Machine Learning, and a Senior Area Chair for ICML and ICLR. His contributions have been recognized by a Best Paper Award at AAMAS 2015, an Outstanding Paper Runner-up at ICML 2022, an Adobe Data Science Award in 2021, an NSF CAREER Award in 2022, a Google Research Scholar Award in 2024, and a Sloan Research Fellowship in 2024.

Host: Ran Chen