- Description:
RL Unplugged is a suite of benchmarks for offline reinforcement learning. RL Unplugged is designed around the following considerations: to facilitate ease of use, we provide the datasets with a unified API, which makes it easy for the practitioner to work with all data in the suite once a general pipeline has been established.
The datasets follow the RLDS format to represent steps and episodes.
The DeepMind Lab dataset has several levels from the challenging, partially observable DeepMind Lab suite. The dataset was collected by training distributed R2D2 (Kapturowski et al., 2018) agents from scratch on individual tasks. We recorded the experience across all actors during entire training runs a few times for every task. The details of the dataset generation process are described in Gulcehre et al., 2021.
We release datasets for five different DeepMind Lab levels: seekavoid_arena_01, explore_rewards_few, explore_rewards_many, rooms_watermaze, and rooms_select_nonmatching_object. We also release snapshot datasets for the seekavoid_arena_01 level, generated from a trained R2D2 snapshot evaluated in the environment with different values of epsilon for the epsilon-greedy policy.
The DeepMind Lab dataset is fairly large-scale. We recommend trying it if you are interested in large-scale offline RL models with memory.
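Below is a minimal sketch of loading this dataset through the TFDS API and walking over its RLDS-formatted episodes and steps. It assumes `tensorflow_datasets` is installed and that the chosen config (roughly 1 TiB) has already been downloaded and prepared; the variable names are illustrative, not part of the API.

```python
import tensorflow_datasets as tfds

# Load the default config; 'train' is the only split provided.
ds = tfds.load(
    'rlu_dmlab_rooms_select_nonmatching_object/training_0',
    split='train',
)

# RLDS format: each element is one episode holding a nested `steps` dataset.
for episode in ds.take(1):
    print('episode_id:', episode['episode_id'].numpy())
    print('episode_return:', episode['episode_return'].numpy())
    for step in episode['steps'].take(3):
        pixels = step['observation']['pixels']  # (72, 96, 3) uint8 frame
        print(pixels.shape, step['action'].numpy(), step['reward'].numpy())
```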
Homepage: https://github.com/deepmind/deepmind-research/tree/master/rl_unplugged
Source code:
tfds.rl_unplugged.rlu_dmlab_rooms_select_nonmatching_object.RluDmlabRoomsSelectNonmatchingObject
Versions:
1.0.0: Initial release.
1.1.0: Added is_last.
1.2.0 (default): BGR -> RGB fix for pixel observations.
Download size:
Unknown size
Auto-cached (documentation): No
Feature structure:
FeaturesDict({
    'episode_id': int64,
    'episode_return': float32,
    'steps': Dataset({
        'action': int64,
        'discount': float32,
        'is_first': bool,
        'is_last': bool,
        'is_terminal': bool,
        'observation': FeaturesDict({
            'last_action': int64,
            'last_reward': float32,
            'pixels': Image(shape=(72, 96, 3), dtype=uint8),
        }),
        'reward': float32,
    }),
})
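For offline training, the nested `steps` dataset is typically flattened into a stream of individual transitions. The following is a hypothetical `tf.data` pipeline built on the feature structure above; the shuffle buffer and batch size are arbitrary choices, not values prescribed by the dataset.

```python
import tensorflow as tf
import tensorflow_datasets as tfds

episodes = tfds.load(
    'rlu_dmlab_rooms_select_nonmatching_object/training_0', split='train')

# Flatten episodes into steps, then shuffle and batch for training.
steps = episodes.flat_map(lambda episode: episode['steps'])
batches = steps.shuffle(10_000).batch(256)

for batch in batches.take(1):
    # batch['observation']['pixels'] has shape (256, 72, 96, 3), dtype uint8.
    frames = tf.cast(batch['observation']['pixels'], tf.float32) / 255.0
```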
- Feature documentation:
| Feature | Class | Shape | Dtype | Description |
|---|---|---|---|---|
|  | FeaturesDict |  |  |  |
| episode_id | Tensor |  | int64 |  |
| episode_return | Tensor |  | float32 |  |
| steps | Dataset |  |  |  |
| steps/action | Tensor |  | int64 |  |
| steps/discount | Tensor |  | float32 |  |
| steps/is_first | Tensor |  | bool |  |
| steps/is_last | Tensor |  | bool |  |
| steps/is_terminal | Tensor |  | bool |  |
| steps/observation | FeaturesDict |  |  |  |
| steps/observation/last_action | Tensor |  | int64 |  |
| steps/observation/last_reward | Tensor |  | float32 |  |
| steps/observation/pixels | Image | (72, 96, 3) | uint8 |  |
| steps/reward | Tensor |  | float32 |  |
Supervised keys (see as_supervised doc): None
Figure (tfds.show_examples): Not supported.
Citation:
@article{gulcehre2021rbve,
title={Regularized Behavior Value Estimation},
author={{\c{C}}aglar G{\"{u}}l{\c{c}}ehre and
Sergio G{\'{o}}mez Colmenarejo and
Ziyu Wang and
Jakub Sygnowski and
Thomas Paine and
Konrad Zolna and
Yutian Chen and
Matthew W. Hoffman and
Razvan Pascanu and
Nando de Freitas},
year={2021},
journal = {CoRR},
url = {https://arxiv.org/abs/2103.09575},
eprint={2103.09575},
archivePrefix={arXiv},
}
rlu_dmlab_rooms_select_nonmatching_object/training_0 (default config)
Dataset size:
1.11 TiB
Splits:
Split | Examples |
---|---|
'train' | 667,349 |
- Examples (tfds.as_dataframe):
rlu_dmlab_rooms_select_nonmatching_object/training_1
Dataset size:
1.08 TiB
Splits:
Split | Examples |
---|---|
'train' | 666,923 |
- Examples (tfds.as_dataframe):
rlu_dmlab_rooms_select_nonmatching_object/training_2
Dataset size:
1.09 TiB
Splits:
Split | Examples |
---|---|
'train' | 666,927 |
- Examples (tfds.as_dataframe):
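The three `training_N` configs hold experience from separate training runs of the same task (the description above notes that runs were recorded a few times for every task), so they can be mixed into a single episode stream. A possible sketch, assuming a recent TensorFlow where `tf.data.Dataset.sample_from_datasets` is available (older releases expose it as `tf.data.experimental.sample_from_datasets`):

```python
import tensorflow as tf
import tensorflow_datasets as tfds

configs = [f'rlu_dmlab_rooms_select_nonmatching_object/training_{i}'
           for i in range(3)]
runs = [tfds.load(name, split='train') for name in configs]

# Sample episodes uniformly at random across the three training runs.
mixed = tf.data.Dataset.sample_from_datasets(runs)
```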