

  • Description:

RL Unplugged is a suite of benchmarks for offline reinforcement learning. It is designed around the following consideration: to facilitate ease of use, we provide the datasets through a unified API, so that a practitioner can work with all data in the suite once a general pipeline has been established.

The DeepMind Lab dataset has several levels from the challenging, partially observable DeepMind Lab suite. It was collected by training distributed R2D2 agents (Kapturowski et al., 2018) from scratch on individual tasks. We recorded the experience across all actors during entire training runs, a few times for every task. The details of the dataset generation process are described in Gulcehre et al., 2021.

We release datasets for five different DeepMind Lab levels: seekavoid_arena_01, explore_rewards_few, explore_rewards_many, rooms_watermaze, rooms_select_nonmatching_object. We also release snapshot datasets for the seekavoid_arena_01 level, generated from a trained R2D2 snapshot using different values of epsilon for the epsilon-greedy algorithm when evaluating the agent in the environment.
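
The snapshot datasets differ only in the epsilon used for epsilon-greedy evaluation. As a reminder of what that parameter controls, here is a minimal sketch of epsilon-greedy action selection. The function name and the action values are illustrative assumptions and are not taken from the R2D2 implementation.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a uniformly random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    # Greedy choice: index of the largest value (first index wins on ties).
    return max(range(len(q_values)), key=lambda a: q_values[a])


# Hypothetical action values for a 4-action agent (illustration only).
q = [0.1, 0.7, 0.3, 0.2]
greedy_action = epsilon_greedy(q, epsilon=0.0)   # never explores
random_action = epsilon_greedy(q, epsilon=1.0)   # always explores
```

A larger epsilon yields more exploratory, lower-return trajectories, which is presumably why several values were recorded.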

The DeepMind Lab dataset is fairly large-scale. We recommend trying it if you are interested in large-scale offline RL models with memory.
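
A minimal sketch of loading this dataset through the standard `tfds.load` entry point, assuming `tensorflow_datasets` is installed and that enough disk space is available for a dataset of more than 1 TiB. The helper name `load_train_split` is hypothetical; the config name is the one listed on this page.

```python
# Config name copied from this page; other runs and levels have their own names.
DATASET_NAME = "rlu_dmlab_rooms_select_nonmatching_object/training_0"


def load_train_split(name=DATASET_NAME, data_dir=None):
    """Return the 'train' split as a tf.data.Dataset of episodes."""
    # Imported lazily so that defining this helper does not require TFDS.
    import tensorflow_datasets as tfds

    return tfds.load(name, split="train", data_dir=data_dir)
```

Calling `load_train_split()` triggers download and preparation on first use, so it is worth pointing `data_dir` at a volume with sufficient capacity.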

  • Feature structure:

FeaturesDict({
    'episode_id': tf.int64,
    'episode_return': tf.float32,
    'steps': Dataset({
        'action': tf.int64,
        'discount': tf.float32,
        'is_first': tf.bool,
        'is_terminal': tf.bool,
        'observation': FeaturesDict({
            'last_action': tf.int64,
            'last_reward': tf.float32,
            'pixels': Image(shape=(72, 96, 3), dtype=tf.uint8),
        }),
        'reward': tf.float32,
    }),
})
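
Each episode carries a nested sequence of steps with `reward` and `discount` fields. The sketch below shows one way to accumulate returns over such a sequence, using a small synthetic episode in plain Python rather than real data. It assumes the common convention that a step's `discount` applies to rewards received after that step; whether the undiscounted sum matches the stored `episode_return` is something to verify against the actual data.

```python
def episode_returns(steps):
    """Return (undiscounted, discounted) sums of rewards over a list of steps."""
    undiscounted = 0.0
    discounted = 0.0
    weight = 1.0  # running product of the discounts seen so far
    for step in steps:
        undiscounted += step["reward"]
        discounted += weight * step["reward"]
        weight *= step["discount"]
    return undiscounted, discounted


# Synthetic three-step episode using field names from the feature structure.
steps = [
    {"action": 2, "reward": 0.0, "discount": 0.5, "is_first": True, "is_terminal": False},
    {"action": 0, "reward": 1.0, "discount": 0.5, "is_first": False, "is_terminal": False},
    {"action": 1, "reward": 2.0, "discount": 0.0, "is_first": False, "is_terminal": True},
]
total, discounted = episode_returns(steps)
```

With real data, the same loop would run over the elements of the episode's `steps` dataset after converting each field to a Python number.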

  • Citation:

    title={Regularized Behavior Value Estimation},
    author={{\c{C}}aglar G{\"{u}}l{\c{c}}ehre and
               Sergio G{\'{o}}mez Colmenarejo and
               Ziyu Wang and
               Jakub Sygnowski and
               Thomas Paine and
               Konrad Zolna and
               Yutian Chen and
               Matthew W. Hoffman and
               Razvan Pascanu and
               Nando de Freitas},
    journal   = {CoRR},
    url       = {https://arxiv.org/abs/2103.09575},

rlu_dmlab_rooms_select_nonmatching_object/training_0 (default config)

  • Dataset size: 1.26 TiB

  • Splits:

Split Examples
'train' 667,349
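
For capacity planning, the listed size and example count give a rough average size per episode. The arithmetic below is a back-of-the-envelope sketch using only the figures on this page; the variable names are illustrative.

```python
TIB = 2 ** 40  # bytes in one tebibyte
MIB = 2 ** 20  # bytes in one mebibyte

dataset_bytes = 1.26 * TIB  # "Dataset size" listed for training_0
num_episodes = 667_349      # example count of the 'train' split

# Average listed size per episode, in MiB (roughly 2 MiB).
avg_episode_mib = dataset_bytes / num_episodes / MIB

# One uncompressed 72 x 96 x 3 uint8 frame, per the 'pixels' feature.
frame_bytes = 72 * 96 * 3
```

Because image features are typically stored encoded, the average episode size cannot be divided by `frame_bytes` to recover an episode length.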


  • Dataset size: 1.23 TiB

  • Splits:

Split Examples
'train' 666,923


  • Dataset size: 1.24 TiB

  • Splits:

Split Examples
'train' 666,927