
TD-MPC2:
Scalable, Robust World Models for Continuous Control

ICLR 2024 Spotlight

Nicklas Hansen,  Hao Su*,  Xiaolong Wang*
UC San Diego
*Equal advising

Overview. TD-MPC2 compares favorably to existing model-free and model-based methods across 104 continuous control tasks spanning multiple domains, with a single set of hyperparameters. We further demonstrate the scalability of TD-MPC2 by training a single 317M-parameter agent to perform 80 tasks across multiple domains, embodiments, and action spaces.

Abstract

TD-MPC is a model-based reinforcement learning (MBRL) algorithm that performs local trajectory optimization in the latent space of a learned implicit (decoder-free) world model. In this work, we present TD-MPC2: a series of improvements upon the TD-MPC algorithm. We demonstrate that TD-MPC2 improves significantly over baselines across 104 online RL tasks spanning 4 diverse task domains, achieving consistently strong results without any hyperparameter tuning. We further show that agent capabilities increase with model and data size, and successfully train a single agent to perform 80 tasks across multiple task domains, embodiments, and action spaces. We conclude with an account of lessons, opportunities, and risks associated with large TD-MPC2 agents.
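To make the core idea concrete, below is a minimal, self-contained sketch of planning in the latent space of a learned world model. It is an illustration only, not the authors' implementation: the encoder, latent dynamics, and reward functions are random stand-ins for the learned networks, and the planner is plain random shooting rather than the MPPI-style planner with policy prior and value bootstrapping used by TD-MPC2.

```python
# Illustrative sketch of latent-space MPC in the spirit of TD-MPC/TD-MPC2.
# All components below are toy stand-ins (random linear maps), chosen only so
# that the example runs end to end; the real agent learns these jointly.
import numpy as np

rng = np.random.default_rng(0)

LATENT_DIM, ACTION_DIM, OBS_DIM = 8, 2, 4
W_enc = rng.normal(size=(LATENT_DIM, OBS_DIM))
W_dyn = rng.normal(size=(LATENT_DIM, LATENT_DIM + ACTION_DIM)) * 0.1
w_rew = rng.normal(size=LATENT_DIM + ACTION_DIM)

def encode(obs):
    return np.tanh(W_enc @ obs)                      # z = h(s)

def dynamics(z, a):
    return np.tanh(W_dyn @ np.concatenate([z, a]))   # z' = d(z, a)

def reward(z, a):
    return float(w_rew @ np.concatenate([z, a]))     # r = R(z, a)

def plan(obs, horizon=5, num_samples=256):
    """Random-shooting MPC: sample action sequences, roll them out entirely in
    latent space, score them by summed predicted reward, and return the first
    action of the best sequence (re-planning happens every environment step)."""
    z0 = encode(obs)
    best_return, best_first_action = -np.inf, None
    for _ in range(num_samples):
        actions = rng.uniform(-1.0, 1.0, size=(horizon, ACTION_DIM))
        z, total = z0, 0.0
        for a in actions:
            total += reward(z, a)
            z = dynamics(z, a)
        if total > best_return:
            best_return, best_first_action = total, actions[0]
    return best_first_action

obs = rng.normal(size=OBS_DIM)
print("planned action:", plan(obs))
```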

TD-MPC2 Learns Diverse Tasks

We evaluate TD-MPC2 on 104 control tasks across 4 task domains: DMControl, Meta-World, ManiSkill2, and MyoSuite.

Benchmarking

TD-MPC2 compares favorably to existing model-free (SAC) and model-based (DreamerV3 and TD-MPC) online RL methods on a diverse set of 104 continuous control tasks. By design, TD-MPC2 achieves consistently strong results without any hyperparameter tuning, which has been instrumental in scaling to massively multi-task world models.

Massively Multitask World Models

We evaluate the performance of 5 multitask models ranging from 1M to 317M parameters on a collection of 80 diverse tasks that span multiple task domains and vary greatly in objective, embodiment, and action space. The task set consists of all 50 Meta-World tasks, as well as 30 DMControl tasks. We also report scaling results on the DMControl subset. We observe that agent capabilities consistently increase with model size on both task sets.

Supporting Open-Source Science

We open-source a total of 324 TD-MPC2 model checkpoints, including 12 multi-task models (ranging from 1M to 317M parameters) trained on 80, 70, and 30 tasks, respectively.

We also release the two transition datasets (545M and 345M transitions, respectively) used to train our multi-task models. The datasets are sourced from the replay buffers of 240 single-task agents and therefore contain a wide range of behaviors.

Domains                  Tasks   Embodiments   Episodes   Transitions   Size    Link
DMControl + Meta-World   80      12            2.69M      545M          34 GB   Download
DMControl                30      11            690k       345M          20 GB   Download
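As a rough illustration of how such a transition dataset might be consumed for offline multi-task training, the sketch below samples random minibatches from a flat table of transitions. The field names, tensor layout, and sizes are assumptions made for illustration, not the official schema of the released files; consult the dataset release for the real format.

```python
# Hypothetical minibatch sampling over a flat transition dataset.
# Keys, shapes, and sizes below are placeholders for illustration only --
# they do NOT describe the official file layout of the released datasets.
import torch

num_transitions, obs_dim, act_dim = 10_000, 39, 6  # toy sizes, not the real ones
data = {
    "obs": torch.randn(num_transitions, obs_dim),
    "action": torch.randn(num_transitions, act_dim),
    "reward": torch.randn(num_transitions),
    "task": torch.randint(0, 80, (num_transitions,)),  # 80 tasks in the largest set
}
# In practice the transitions would be loaded from disk (e.g. via torch.load).

batch_size = 256
for step in range(10):
    idx = torch.randint(0, data["obs"].shape[0], (batch_size,))
    batch = {k: v[idx] for k, v in data.items()}
    # ... a world-model / TD-learning update would consume `batch` here ...
```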

We are excited to see what the community will do with these models and datasets, and hope that our release will encourage other research labs to open-source their checkpoints as well.

Paper

TD-MPC2: Scalable, Robust World Models for Continuous Control
Nicklas Hansen, Hao Su, Xiaolong Wang

arXiv preprint

View on arXiv

Citation

If you find our work useful, please consider citing the paper as follows:

@inproceedings{hansen2024tdmpc2,
  title     = {TD-MPC2: Scalable, Robust World Models for Continuous Control},
  author    = {Nicklas Hansen and Hao Su and Xiaolong Wang},
  booktitle = {International Conference on Learning Representations (ICLR)},
  year      = {2024}
}