Model-based RL

Hao Su

(Contents from IERG5350 taught by Prof. Bolei Zhou.)

Spring, 2024

Agenda

Recall: Learning Objective of RL

Concept of Model-based RL

RL with Environment Model

Model-free RL and Model-based RL

Concepts in Model-based RL

Advantage of Model-based RL

Models of the Environment

What is a Model

Perfect Model Without Learning

Learning the Model

Examples of Model Parameterization

The model of the environment can be parameterized by different functions, for example:

Table Lookup Model

  • Obvious problem: this only works for small, discrete state/action spaces; otherwise $N(s, a) = 0$ for many $(s, a)$ pairs.
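A table-lookup model of this kind can be sketched as follows (a minimal illustration; the class and method names are my own, not from the slides). It counts visits $N(s,a)$, tallies successor states and rewards, and samples from the empirical distribution:

```python
import random
from collections import defaultdict

class TableLookupModel:
    """Estimate P(s'|s,a) and E[r|s,a] by counting observed transitions."""
    def __init__(self):
        self.counts = defaultdict(lambda: defaultdict(int))  # (s,a) -> {s': count}
        self.reward_sum = defaultdict(float)                 # (s,a) -> total reward
        self.n = defaultdict(int)                            # (s,a) -> N(s,a)

    def update(self, s, a, r, s_next):
        self.counts[(s, a)][s_next] += 1
        self.reward_sum[(s, a)] += r
        self.n[(s, a)] += 1

    def sample(self, s, a):
        """Sample (r, s') from the empirical model; requires N(s,a) > 0."""
        n = self.n[(s, a)]
        succ = self.counts[(s, a)]
        states = list(succ)
        probs = [succ[sp] / n for sp in states]
        s_next = random.choices(states, weights=probs)[0]
        r = self.reward_sum[(s, a)] / n  # mean observed reward
        return r, s_next
```

Note that `sample` is only defined where $N(s,a) > 0$, which is exactly the problem flagged above for large state/action spaces.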
  • But now, given that we have some model, what can we do with it?
  • Model-based Value Optimization

    Simplest Way to Utilize Learned Model

    Some Caveats: Inaccurate Models

    Real and Simulated Experience

    How do we go about constructing our algorithm? Let's think about what our data looks like now. We now have two sources of experience: real experience from the environment, and simulated experience generated by the learned model.

    Integrating Real and Simulated Experience

    Tabular Dyna-Q

    Dyna-Q: combines direct RL, model learning, and model-based sampling
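The three ingredients can be sketched in tabular form. This is a minimal illustration under assumed names and a toy `env_step(s, a) -> (r, s', done)` interface, not the canonical pseudocode:

```python
import random
from collections import defaultdict

def dyna_q(env_step, actions, episodes=50, planning_steps=5,
           alpha=0.1, gamma=0.95, eps=0.1, start_state=0):
    """Tabular Dyna-Q: each real environment step is followed by
    `planning_steps` simulated updates from a learned model
    (a deterministic environment is assumed for simplicity)."""
    Q = defaultdict(float)   # (s, a) -> action value
    model = {}               # (s, a) -> (r, s', done)
    for _ in range(episodes):
        s, done = start_state, False
        while not done:
            # epsilon-greedy action selection
            if random.random() < eps:
                a = random.choice(actions)
            else:
                a = max(actions, key=lambda a_: Q[(s, a_)])
            r, s2, done = env_step(s, a)
            # (1) direct RL: Q-learning update on the real transition
            target = r + (0.0 if done else gamma * max(Q[(s2, a_)] for a_ in actions))
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            # (2) model learning: record the observed transition
            model[(s, a)] = (r, s2, done)
            # (3) planning: Q-learning updates on model-sampled transitions
            for _ in range(planning_steps):
                (ps, pa), (pr, ps2, pdone) = random.choice(list(model.items()))
                ptarget = pr + (0.0 if pdone else gamma * max(Q[(ps2, a_)] for a_ in actions))
                Q[(ps, pa)] += alpha * (ptarget - Q[(ps, pa)])
            s = s2
    return Q
```

The planning loop is what distinguishes Dyna-Q from plain Q-learning: simulated experience lets each real transition be reused many times.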

    Result of Dyna

    Model-based Policy Optimization

    Policy Optimization with Model-based RL

    Model-based Policy Optimization in RL

  • Model-based policy optimization in RL is strongly influenced by optimal control theory.
  • In optimal control, the algorithm uses the model to compute the optimal control signal that minimizes the cost.
  • If the dynamics and the reward (cost) are known, we can use optimal control to solve the problem.
  • Model Learning for Trajectory Optimization

    Model Learning for Trajectory Optimization
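To make the optimal-control idea concrete, here is a toy trajectory optimization with a known scalar linear model $x_{t+1} = a x_t + b u_t$ and quadratic cost $\sum_t (q x_t^2 + r u_t^2)$, minimized by gradient descent on the action sequence, with the gradient computed by the adjoint (costate) recursion. This illustrates the general idea only; it is not a specific algorithm from the slides:

```python
def rollout(u, a=1.0, b=1.0, x0=1.0):
    """Roll the known linear dynamics forward under controls u."""
    xs = [x0]
    for ut in u:
        xs.append(a * xs[-1] + b * ut)
    return xs

def cost(xs, u, q=1.0, r=0.1):
    """Quadratic state cost plus quadratic control cost."""
    return sum(q * x * x for x in xs) + sum(r * ut * ut for ut in u)

def grad(u, a=1.0, b=1.0, x0=1.0, q=1.0, r=0.1):
    """dJ/du via the backward adjoint recursion:
    lam_T = 2 q x_T;  g_t = 2 r u_t + b lam_{t+1};  lam_t = 2 q x_t + a lam_{t+1}."""
    xs = rollout(u, a, b, x0)
    T = len(u)
    lam = 2 * q * xs[T]
    g = [0.0] * T
    for t in reversed(range(T)):
        g[t] = 2 * r * u[t] + b * lam
        lam = 2 * q * xs[t] + a * lam
    return g

def trajectory_opt(T=10, iters=3000, lr=0.005):
    """Gradient descent on the open-loop control sequence."""
    u = [0.0] * T
    for _ in range(iters):
        g = grad(u)
        u = [ut - lr * gt for ut, gt in zip(u, g)]
    return u
```

With a nonlinear learned model the same structure survives: only the rollout and the adjoint recursion change (they use the model's Jacobians), which is why model learning and trajectory optimization fit together so naturally.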

    Model Learning with MPC

    Search with Model: Random Shooting
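Random shooting can be sketched in a few lines (a minimal illustration; the `dynamics` and `reward` callables stand in for the learned model): sample random action sequences, evaluate each by rolling it through the model, and execute the first action of the best sequence.

```python
import random

def random_shooting(dynamics, reward, state, action_space,
                    horizon=10, n_candidates=100):
    """Sample random action sequences, roll each out through the model,
    and return the first action of the highest-return sequence."""
    best_ret, best_seq = float('-inf'), None
    for _ in range(n_candidates):
        seq = [random.choice(action_space) for _ in range(horizon)]
        s, ret = state, 0.0
        for a in seq:
            ret += reward(s, a)
            s = dynamics(s, a)   # model-predicted next state
        if ret > best_ret:
            best_ret, best_seq = ret, seq
    return best_seq[0]
```

The candidates are drawn independently of each other, which is simple but wasteful; the cross-entropy method below improves on this by reusing information from good samples.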

    Search with Model: Cross Entropy Method
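A minimal cross-entropy method sketch for continuous actions (names and hyperparameters are my own illustration): maintain a Gaussian over action sequences, evaluate samples through the model, and refit the Gaussian to the elite fraction.

```python
import random
import statistics

def cem_plan(dynamics, reward, state, horizon=5, iters=5,
             pop=200, elite_frac=0.1, init_std=1.0):
    """Cross-entropy method over continuous action sequences:
    sample from a Gaussian, keep the elites, refit, repeat;
    return the first action of the final mean sequence."""
    mu = [0.0] * horizon
    std = [init_std] * horizon
    n_elite = max(2, int(pop * elite_frac))
    for _ in range(iters):
        samples, returns = [], []
        for _ in range(pop):
            seq = [random.gauss(m, s) for m, s in zip(mu, std)]
            x, ret = state, 0.0
            for a in seq:
                ret += reward(x, a)
                x = dynamics(x, a)   # model-predicted next state
            samples.append(seq)
            returns.append(ret)
        elite_idx = sorted(range(pop), key=lambda i: returns[i], reverse=True)[:n_elite]
        # refit the Gaussian to the elite samples, per time step
        for t in range(horizon):
            vals = [samples[i][t] for i in elite_idx]
            mu[t] = statistics.fmean(vals)
            std[t] = statistics.pstdev(vals) + 1e-3  # keep some exploration
    return mu[0]
```

Unlike random shooting, each iteration concentrates the sampling distribution around previously good sequences, which usually finds much better plans for the same sample budget.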

    Learning Model and Policy Together

    How to Learn Models: Dreamer v3

  • Can play Minecraft and collect diamonds (albeit with a few hacks on the game).
  • Can reasonably predict future observations too! (Although there are some problems here that we will come back to later.)
  • How to Learn Models: Using Latent States
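Dreamer v3 itself uses a recurrent state-space model trained with reconstruction and prediction losses; the following untrained, linear sketch (all names mine) only shows the data flow of a latent world model: encode an observation into a latent state, step the dynamics purely in latent space, and decode back to observations.

```python
import random

class LatentModel:
    """Structure of a latent-state world model, heavily simplified:
    encode obs -> latent, predict next latent from (latent, action),
    decode latent -> reconstructed observation."""
    def __init__(self, obs_dim, z_dim, rng=None):
        rng = rng or random.Random(0)
        mk = lambda rows, cols: [[rng.gauss(0, 0.1) for _ in range(cols)]
                                 for _ in range(rows)]
        self.W_enc = mk(z_dim, obs_dim)    # encoder weights
        self.W_dyn = mk(z_dim, z_dim + 1)  # latent dynamics (action appended)
        self.W_dec = mk(obs_dim, z_dim)    # decoder weights

    @staticmethod
    def _matvec(W, v):
        return [sum(w * x for w, x in zip(row, v)) for row in W]

    def encode(self, obs):
        return self._matvec(self.W_enc, obs)

    def step(self, z, action):
        return self._matvec(self.W_dyn, z + [action])

    def decode(self, z):
        return self._matvec(self.W_dec, z)

    def imagine(self, obs, actions):
        """Roll out a trajectory entirely in latent space (no environment),
        decoding each predicted latent back into observation space."""
        z = self.encode(obs)
        traj = []
        for a in actions:
            z = self.step(z, a)
            traj.append(self.decode(z))
        return traj
```

The key point is `imagine`: once the encoder and latent dynamics are learned, policy learning and planning can run entirely in the (much cheaper) latent space, which is also the idea behind the TDMPC slide below.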

    How to Learn Models: Hidden+Latent States, Dreamer v3 Style

    Learning and Planning in Just the Latent Space in Model-Based RL

    TDMPC

    Key things to remember