- Description :
D4RL is an open-source benchmark for offline reinforcement learning. It provides standardized environments and datasets for training and benchmarking algorithms.
The datasets follow the RLDS format to represent steps and episodes.
Additional Documentation : Explore on Papers With Code
Config description : See more details about the task and its versions in https://github.com/rail-berkeley/d4rl/wiki/Tasks#gym
Homepage : https://sites.google.com/view/d4rl/home
Source code :
tfds.d4rl.d4rl_mujoco_ant.D4rlMujocoAnt
Versions :
- 1.0.0 : Initial release.
- 1.1.0 : Added is_last.
- 1.2.0 (default): Updated to take into account the next observation.
Supervised keys (See as_supervised doc ): None
Figure ( tfds.show_examples ): Not supported.
Citation :
@misc{fu2020d4rl,
title={D4RL: Datasets for Deep Data-Driven Reinforcement Learning},
author={Justin Fu and Aviral Kumar and Ofir Nachum and George Tucker and Sergey Levine},
year={2020},
eprint={2004.07219},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
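Each config listed below is selected by appending its name to the dataset name. The following is a minimal sketch of that naming scheme and of loading one config with `tfds.load`. The helper names `config_name`, `load_episodes`, and `print_first_steps` are hypothetical and not part of TFDS, and the sketch assumes `tensorflow-datasets` is installed when the loading helpers are called.

```python
def config_name(env_version: str, quality: str) -> str:
    """Build a TFDS name such as 'd4rl_mujoco_ant/v2-medium' (hypothetical helper)."""
    return f"d4rl_mujoco_ant/{env_version}-{quality}"


def load_episodes(env_version: str = "v0", quality: str = "expert"):
    """Return the 'train' split of one config as a dataset of episodes.

    tensorflow_datasets is imported lazily so that config_name() can be
    used without the library installed.
    """
    import tensorflow_datasets as tfds  # third-party dependency

    return tfds.load(config_name(env_version, quality), split="train")


def print_first_steps(env_version: str = "v0", quality: str = "expert", n_steps: int = 3) -> None:
    """Print observation and action shapes for the first steps of one episode.

    Calling this downloads and prepares the data on first use.
    """
    for episode in load_episodes(env_version, quality).take(1):
        # In the RLDS layout, 'steps' is itself a nested dataset.
        for step in episode["steps"].take(n_steps):
            print(step["observation"].shape, step["action"].shape)
```

For example, `print_first_steps("v2", "medium")` would iterate over the first episode of `d4rl_mujoco_ant/v2-medium`.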
d4rl_mujoco_ant/v0-expert (default config)
Download size : 131.34 MiB
Dataset size : 464.94 MiB
Auto-cached ( documentation ): No
Splits :
Split | Examples |
---|---|
'train' | 1,288 |
- Feature structure :
FeaturesDict({
'steps': Dataset({
'action': Tensor(shape=(8,), dtype=float32),
'discount': float32,
'is_first': bool,
'is_last': bool,
'is_terminal': bool,
'observation': Tensor(shape=(111,), dtype=float32),
'reward': float32,
}),
})
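The feature structure above stores each episode as a sequence of steps with `is_first`, `is_last`, and `is_terminal` flags. As an illustration, the sketch below pairs consecutive steps into transitions in plain Python. It assumes the usual RLDS convention that the step flagged `is_last` carries only the final observation; the helper `to_transitions` and the scalar values in `toy_episode` are invented for this example and are not taken from the dataset.

```python
from typing import Any, Dict, Iterable, List, Tuple

Step = Dict[str, Any]
Transition = Tuple[Any, Any, float, Any, bool]


def to_transitions(steps: Iterable[Step]) -> List[Transition]:
    """Pair consecutive steps into (obs, action, reward, next_obs, done) tuples.

    Assumption: the step with is_last=True only supplies the next observation,
    so it closes the previous transition and does not start a new one.
    """
    transitions: List[Transition] = []
    prev = None
    for step in steps:
        if prev is not None and not prev["is_last"]:
            transitions.append(
                (
                    prev["observation"],
                    prev["action"],
                    prev["reward"],
                    step["observation"],
                    step["is_terminal"],  # done if the next state is terminal
                )
            )
        prev = step
    return transitions


# Toy episode with scalar observations and actions (illustrative values only;
# the real dataset uses 111-dimensional observations and 8-dimensional actions).
toy_episode = [
    {"observation": 0.0, "action": 1.0, "reward": 1.0,
     "is_first": True, "is_last": False, "is_terminal": False},
    {"observation": 1.0, "action": -1.0, "reward": 0.5,
     "is_first": False, "is_last": False, "is_terminal": False},
    {"observation": 2.0, "action": 0.0, "reward": 0.0,
     "is_first": False, "is_last": True, "is_terminal": True},
]

transitions = to_transitions(toy_episode)
```

An episode that ends with `is_last=True` but `is_terminal=False` was truncated, for example by a time limit, so its final transition is not marked as done.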
- Feature documentation :
Feature | Class | Shape | Dtype | Description |
---|---|---|---|---|
FeaturesDict | | | | |
steps | Dataset | | | |
steps/action | Tensor | (8,) | float32 | |
steps/discount | Tensor | | float32 | |
steps/is_first | Tensor | | bool | |
steps/is_last | Tensor | | bool | |
steps/is_terminal | Tensor | | bool | |
steps/observation | Tensor | (111,) | float32 | |
steps/reward | Tensor | | float32 | |
- Examples ( tfds.as_dataframe ):
d4rl_mujoco_ant/v0-medium
Download size : 131.39 MiB
Dataset size : 464.78 MiB
Auto-cached ( documentation ): No
Splits :
Split | Examples |
---|---|
'train' | 1,122 |
- Feature structure :
FeaturesDict({
'steps': Dataset({
'action': Tensor(shape=(8,), dtype=float32),
'discount': float32,
'is_first': bool,
'is_last': bool,
'is_terminal': bool,
'observation': Tensor(shape=(111,), dtype=float32),
'reward': float32,
}),
})
- Feature documentation :
Feature | Class | Shape | Dtype | Description |
---|---|---|---|---|
FeaturesDict | | | | |
steps | Dataset | | | |
steps/action | Tensor | (8,) | float32 | |
steps/discount | Tensor | | float32 | |
steps/is_first | Tensor | | bool | |
steps/is_last | Tensor | | bool | |
steps/is_terminal | Tensor | | bool | |
steps/observation | Tensor | (111,) | float32 | |
steps/reward | Tensor | | float32 | |
- Examples ( tfds.as_dataframe ):
d4rl_mujoco_ant/v0-medium-expert
Download size : 262.73 MiB
Dataset size : 929.71 MiB
Auto-cached ( documentation ): No
Splits :
Split | Examples |
---|---|
'train' | 2,410 |
- Feature structure :
FeaturesDict({
'steps': Dataset({
'action': Tensor(shape=(8,), dtype=float32),
'discount': float32,
'is_first': bool,
'is_last': bool,
'is_terminal': bool,
'observation': Tensor(shape=(111,), dtype=float32),
'reward': float32,
}),
})
- Feature documentation :
Feature | Class | Shape | Dtype | Description |
---|---|---|---|---|
FeaturesDict | | | | |
steps | Dataset | | | |
steps/action | Tensor | (8,) | float32 | |
steps/discount | Tensor | | float32 | |
steps/is_first | Tensor | | bool | |
steps/is_last | Tensor | | bool | |
steps/is_terminal | Tensor | | bool | |
steps/observation | Tensor | (111,) | float32 | |
steps/reward | Tensor | | float32 | |
- Examples ( tfds.as_dataframe ):
d4rl_mujoco_ant/v0-mixed
Download size : 104.63 MiB
Dataset size : 464.93 MiB
Auto-cached ( documentation ): No
Splits :
Split | Examples |
---|---|
'train' | 1,320 |
- Feature structure :
FeaturesDict({
'steps': Dataset({
'action': Tensor(shape=(8,), dtype=float32),
'discount': float32,
'is_first': bool,
'is_last': bool,
'is_terminal': bool,
'observation': Tensor(shape=(111,), dtype=float32),
'reward': float32,
}),
})
- Feature documentation :
Feature | Class | Shape | Dtype | Description |
---|---|---|---|---|
FeaturesDict | | | | |
steps | Dataset | | | |
steps/action | Tensor | (8,) | float32 | |
steps/discount | Tensor | | float32 | |
steps/is_first | Tensor | | bool | |
steps/is_last | Tensor | | bool | |
steps/is_terminal | Tensor | | bool | |
steps/observation | Tensor | (111,) | float32 | |
steps/reward | Tensor | | float32 | |
- Examples ( tfds.as_dataframe ):
d4rl_mujoco_ant/v0-random
Download size : 139.50 MiB
Dataset size : 464.97 MiB
Auto-cached ( documentation ): No
Splits :
Split | Examples |
---|---|
'train' | 1,377 |
- Feature structure :
FeaturesDict({
'steps': Dataset({
'action': Tensor(shape=(8,), dtype=float32),
'discount': float32,
'is_first': bool,
'is_last': bool,
'is_terminal': bool,
'observation': Tensor(shape=(111,), dtype=float32),
'reward': float32,
}),
})
- Feature documentation :
Feature | Class | Shape | Dtype | Description |
---|---|---|---|---|
FeaturesDict | | | | |
steps | Dataset | | | |
steps/action | Tensor | (8,) | float32 | |
steps/discount | Tensor | | float32 | |
steps/is_first | Tensor | | bool | |
steps/is_last | Tensor | | bool | |
steps/is_terminal | Tensor | | bool | |
steps/observation | Tensor | (111,) | float32 | |
steps/reward | Tensor | | float32 | |
- Examples ( tfds.as_dataframe ):
d4rl_mujoco_ant/v1-expert
Download size : 220.72 MiB
Dataset size : 968.63 MiB
Auto-cached ( documentation ): No
Splits :
Split | Examples |
---|---|
'train' | 1,033 |
- Feature structure :
FeaturesDict({
'algorithm': string,
'iteration': int32,
'policy': FeaturesDict({
'fc0': FeaturesDict({
'bias': Tensor(shape=(256,), dtype=float32),
'weight': Tensor(shape=(256, 111), dtype=float32),
}),
'fc1': FeaturesDict({
'bias': Tensor(shape=(256,), dtype=float32),
'weight': Tensor(shape=(256, 256), dtype=float32),
}),
'last_fc': FeaturesDict({
'bias': Tensor(shape=(8,), dtype=float32),
'weight': Tensor(shape=(8, 256), dtype=float32),
}),
'last_fc_log_std': FeaturesDict({
'bias': Tensor(shape=(8,), dtype=float32),
'weight': Tensor(shape=(8, 256), dtype=float32),
}),
'nonlinearity': string,
'output_distribution': string,
}),
'steps': Dataset({
'action': Tensor(shape=(8,), dtype=float32),
'discount': float32,
'infos': FeaturesDict({
'action_log_probs': float32,
'qpos': Tensor(shape=(15,), dtype=float32),
'qvel': Tensor(shape=(14,), dtype=float32),
}),
'is_first': bool,
'is_last': bool,
'is_terminal': bool,
'observation': Tensor(shape=(111,), dtype=float32),
'reward': float32,
}),
})
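The `policy` entry above stores the weights of a two-hidden-layer network, and its shapes are consistent with an `(out_features, in_features)` weight layout. The sketch below shows one way such weights could be turned into a mean action in plain Python. It assumes ReLU hidden activations and a tanh-squashed mean; both are assumptions, so check the stored `nonlinearity` and `output_distribution` strings before relying on them. The helpers `linear`, `mean_action`, and `zero_policy` are hypothetical.

```python
import math
from typing import List, Sequence


def linear(layer: dict, x: Sequence[float]) -> List[float]:
    """Compute W @ x + b for a weight stored as (out_features, in_features)."""
    return [
        sum(w * v for w, v in zip(row, x)) + b
        for row, b in zip(layer["weight"], layer["bias"])
    ]


def mean_action(policy: dict, observation: Sequence[float]) -> List[float]:
    """Forward pass fc0 -> fc1 -> last_fc.

    Assumptions: ReLU hidden activations and a tanh-squashed mean.
    last_fc_log_std is ignored because only the mean is computed here.
    """
    hidden = [max(0.0, h) for h in linear(policy["fc0"], observation)]
    hidden = [max(0.0, h) for h in linear(policy["fc1"], hidden)]
    return [math.tanh(m) for m in linear(policy["last_fc"], hidden)]


def zero_policy(obs_dim: int = 111, hidden_dim: int = 256, act_dim: int = 8) -> dict:
    """All-zero placeholder weights with the shapes listed in the feature structure."""

    def layer(n_out: int, n_in: int) -> dict:
        return {"weight": [[0.0] * n_in for _ in range(n_out)], "bias": [0.0] * n_out}

    return {
        "fc0": layer(hidden_dim, obs_dim),
        "fc1": layer(hidden_dim, hidden_dim),
        "last_fc": layer(act_dim, hidden_dim),
        "last_fc_log_std": layer(act_dim, hidden_dim),
    }


action = mean_action(zero_policy(), [0.0] * 111)
```

With real data, the nested lists would be replaced by the `policy/*/weight` and `policy/*/bias` tensors of an episode, converted to Python lists or handled with an array library instead.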
- Feature documentation :
Feature | Class | Shape | Dtype | Description |
---|---|---|---|---|
FeaturesDict | | | | |
algorithm | Tensor | | string | |
iteration | Tensor | | int32 | |
policy | FeaturesDict | | | |
policy/fc0 | FeaturesDict | | | |
policy/fc0/bias | Tensor | (256,) | float32 | |
policy/fc0/weight | Tensor | (256, 111) | float32 | |
policy/fc1 | FeaturesDict | | | |
policy/fc1/bias | Tensor | (256,) | float32 | |
policy/fc1/weight | Tensor | (256, 256) | float32 | |
policy/last_fc | FeaturesDict | | | |
policy/last_fc/bias | Tensor | (8,) | float32 | |
policy/last_fc/weight | Tensor | (8, 256) | float32 | |
policy/last_fc_log_std | FeaturesDict | | | |
policy/last_fc_log_std/bias | Tensor | (8,) | float32 | |
policy/last_fc_log_std/weight | Tensor | (8, 256) | float32 | |
policy/nonlinearity | Tensor | | string | |
policy/output_distribution | Tensor | | string | |
steps | Dataset | | | |
steps/action | Tensor | (8,) | float32 | |
steps/discount | Tensor | | float32 | |
steps/infos | FeaturesDict | | | |
steps/infos/action_log_probs | Tensor | | float32 | |
steps/infos/qpos | Tensor | (15,) | float32 | |
steps/infos/qvel | Tensor | (14,) | float32 | |
steps/is_first | Tensor | | bool | |
steps/is_last | Tensor | | bool | |
steps/is_terminal | Tensor | | bool | |
steps/observation | Tensor | (111,) | float32 | |
steps/reward | Tensor | | float32 | |
- Examples ( tfds.as_dataframe ):
d4rl_mujoco_ant/v1-medium
Download size : 222.39 MiB
Dataset size : 1023.71 MiB
Auto-cached ( documentation ): No
Splits :
Split | Examples |
---|---|
'train' | 1,179 |
- Feature structure :
FeaturesDict({
'algorithm': string,
'iteration': int32,
'policy': FeaturesDict({
'fc0': FeaturesDict({
'bias': Tensor(shape=(256,), dtype=float32),
'weight': Tensor(shape=(256, 111), dtype=float32),
}),
'fc1': FeaturesDict({
'bias': Tensor(shape=(256,), dtype=float32),
'weight': Tensor(shape=(256, 256), dtype=float32),
}),
'last_fc': FeaturesDict({
'bias': Tensor(shape=(8,), dtype=float32),
'weight': Tensor(shape=(8, 256), dtype=float32),
}),
'last_fc_log_std': FeaturesDict({
'bias': Tensor(shape=(8,), dtype=float32),
'weight': Tensor(shape=(8, 256), dtype=float32),
}),
'nonlinearity': string,
'output_distribution': string,
}),
'steps': Dataset({
'action': Tensor(shape=(8,), dtype=float32),
'discount': float32,
'infos': FeaturesDict({
'action_log_probs': float32,
'qpos': Tensor(shape=(15,), dtype=float32),
'qvel': Tensor(shape=(14,), dtype=float32),
}),
'is_first': bool,
'is_last': bool,
'is_terminal': bool,
'observation': Tensor(shape=(111,), dtype=float32),
'reward': float32,
}),
})
- Feature documentation :
Feature | Class | Shape | Dtype | Description |
---|---|---|---|---|
FeaturesDict | | | | |
algorithm | Tensor | | string | |
iteration | Tensor | | int32 | |
policy | FeaturesDict | | | |
policy/fc0 | FeaturesDict | | | |
policy/fc0/bias | Tensor | (256,) | float32 | |
policy/fc0/weight | Tensor | (256, 111) | float32 | |
policy/fc1 | FeaturesDict | | | |
policy/fc1/bias | Tensor | (256,) | float32 | |
policy/fc1/weight | Tensor | (256, 256) | float32 | |
policy/last_fc | FeaturesDict | | | |
policy/last_fc/bias | Tensor | (8,) | float32 | |
policy/last_fc/weight | Tensor | (8, 256) | float32 | |
policy/last_fc_log_std | FeaturesDict | | | |
policy/last_fc_log_std/bias | Tensor | (8,) | float32 | |
policy/last_fc_log_std/weight | Tensor | (8, 256) | float32 | |
policy/nonlinearity | Tensor | | string | |
policy/output_distribution | Tensor | | string | |
steps | Dataset | | | |
steps/action | Tensor | (8,) | float32 | |
steps/discount | Tensor | | float32 | |
steps/infos | FeaturesDict | | | |
steps/infos/action_log_probs | Tensor | | float32 | |
steps/infos/qpos | Tensor | (15,) | float32 | |
steps/infos/qvel | Tensor | (14,) | float32 | |
steps/is_first | Tensor | | bool | |
steps/is_last | Tensor | | bool | |
steps/is_terminal | Tensor | | bool | |
steps/observation | Tensor | (111,) | float32 | |
steps/reward | Tensor | | float32 | |
- Examples ( tfds.as_dataframe ):
d4rl_mujoco_ant/v1-medium-expert
Download size : 442.25 MiB
Dataset size : 1.13 GiB
Auto-cached ( documentation ): No
Splits :
Split | Examples |
---|---|
'train' | 2,211 |
- Feature structure :
FeaturesDict({
'steps': Dataset({
'action': Tensor(shape=(8,), dtype=float32),
'discount': float32,
'infos': FeaturesDict({
'action_log_probs': float32,
'qpos': Tensor(shape=(15,), dtype=float32),
'qvel': Tensor(shape=(14,), dtype=float32),
}),
'is_first': bool,
'is_last': bool,
'is_terminal': bool,
'observation': Tensor(shape=(111,), dtype=float32),
'reward': float32,
}),
})
- Feature documentation :
Feature | Class | Shape | Dtype | Description |
---|---|---|---|---|
FeaturesDict | | | | |
steps | Dataset | | | |
steps/action | Tensor | (8,) | float32 | |
steps/discount | Tensor | | float32 | |
steps/infos | FeaturesDict | | | |
steps/infos/action_log_probs | Tensor | | float32 | |
steps/infos/qpos | Tensor | (15,) | float32 | |
steps/infos/qvel | Tensor | (14,) | float32 | |
steps/is_first | Tensor | | bool | |
steps/is_last | Tensor | | bool | |
steps/is_terminal | Tensor | | bool | |
steps/observation | Tensor | (111,) | float32 | |
steps/reward | Tensor | | float32 | |
- Examples ( tfds.as_dataframe ):
d4rl_mujoco_ant/v1-medium-replay
Download size : 132.05 MiB
Dataset size : 175.27 MiB
Auto-cached ( documentation ): Only when shuffle_files=False (train)
Splits :
Split | Examples |
---|---|
'train' | 485 |
- Feature structure :
FeaturesDict({
'algorithm': string,
'iteration': int32,
'steps': Dataset({
'action': Tensor(shape=(8,), dtype=float64),
'discount': float64,
'infos': FeaturesDict({
'action_log_probs': float64,
'qpos': Tensor(shape=(15,), dtype=float64),
'qvel': Tensor(shape=(14,), dtype=float64),
}),
'is_first': bool,
'is_last': bool,
'is_terminal': bool,
'observation': Tensor(shape=(111,), dtype=float64),
'reward': float64,
}),
})
- Feature documentation :
Feature | Class | Shape | Dtype | Description |
---|---|---|---|---|
FeaturesDict | | | | |
algorithm | Tensor | | string | |
iteration | Tensor | | int32 | |
steps | Dataset | | | |
steps/action | Tensor | (8,) | float64 | |
steps/discount | Tensor | | float64 | |
steps/infos | FeaturesDict | | | |
steps/infos/action_log_probs | Tensor | | float64 | |
steps/infos/qpos | Tensor | (15,) | float64 | |
steps/infos/qvel | Tensor | (14,) | float64 | |
steps/is_first | Tensor | | bool | |
steps/is_last | Tensor | | bool | |
steps/is_terminal | Tensor | | bool | |
steps/observation | Tensor | (111,) | float64 | |
steps/reward | Tensor | | float64 | |
- Examples ( tfds.as_dataframe ):
d4rl_mujoco_ant/v1-full-replay
Download size : 437.57 MiB
Dataset size : 580.09 MiB
Auto-cached ( documentation ): No
Splits :
Split | Examples |
---|---|
'train' | 1,319 |
- Feature structure :
FeaturesDict({
'algorithm': string,
'iteration': int32,
'steps': Dataset({
'action': Tensor(shape=(8,), dtype=float64),
'discount': float64,
'infos': FeaturesDict({
'action_log_probs': float64,
'qpos': Tensor(shape=(15,), dtype=float64),
'qvel': Tensor(shape=(14,), dtype=float64),
}),
'is_first': bool,
'is_last': bool,
'is_terminal': bool,
'observation': Tensor(shape=(111,), dtype=float64),
'reward': float64,
}),
})
- Feature documentation :
Feature | Class | Shape | Dtype | Description |
---|---|---|---|---|
FeaturesDict | | | | |
algorithm | Tensor | | string | |
iteration | Tensor | | int32 | |
steps | Dataset | | | |
steps/action | Tensor | (8,) | float64 | |
steps/discount | Tensor | | float64 | |
steps/infos | FeaturesDict | | | |
steps/infos/action_log_probs | Tensor | | float64 | |
steps/infos/qpos | Tensor | (15,) | float64 | |
steps/infos/qvel | Tensor | (14,) | float64 | |
steps/is_first | Tensor | | bool | |
steps/is_last | Tensor | | bool | |
steps/is_terminal | Tensor | | bool | |
steps/observation | Tensor | (111,) | float64 | |
steps/reward | Tensor | | float64 | |
- Examples ( tfds.as_dataframe ):
d4rl_mujoco_ant/v1-random
Download size : 225.18 MiB
Dataset size : 583.83 MiB
Auto-cached ( documentation ): No
Splits :
Split | Examples |
---|---|
'train' | 5,741 |
- Feature structure :
FeaturesDict({
'steps': Dataset({
'action': Tensor(shape=(8,), dtype=float32),
'discount': float32,
'infos': FeaturesDict({
'action_log_probs': float32,
'qpos': Tensor(shape=(15,), dtype=float32),
'qvel': Tensor(shape=(14,), dtype=float32),
}),
'is_first': bool,
'is_last': bool,
'is_terminal': bool,
'observation': Tensor(shape=(111,), dtype=float32),
'reward': float32,
}),
})
- Feature documentation :
Feature | Class | Shape | Dtype | Description |
---|---|---|---|---|
FeaturesDict | | | | |
steps | Dataset | | | |
steps/action | Tensor | (8,) | float32 | |
steps/discount | Tensor | | float32 | |
steps/infos | FeaturesDict | | | |
steps/infos/action_log_probs | Tensor | | float32 | |
steps/infos/qpos | Tensor | (15,) | float32 | |
steps/infos/qvel | Tensor | (14,) | float32 | |
steps/is_first | Tensor | | bool | |
steps/is_last | Tensor | | bool | |
steps/is_terminal | Tensor | | bool | |
steps/observation | Tensor | (111,) | float32 | |
steps/reward | Tensor | | float32 | |
- Examples ( tfds.as_dataframe ):
d4rl_mujoco_ant/v2-expert
Download size : 355.94 MiB
Dataset size : 969.38 MiB
Auto-cached ( documentation ): No
Splits :
Split | Examples |
---|---|
'train' | 1,035 |
- Feature structure :
FeaturesDict({
'algorithm': string,
'iteration': int32,
'policy': FeaturesDict({
'fc0': FeaturesDict({
'bias': Tensor(shape=(256,), dtype=float32),
'weight': Tensor(shape=(256, 111), dtype=float32),
}),
'fc1': FeaturesDict({
'bias': Tensor(shape=(256,), dtype=float32),
'weight': Tensor(shape=(256, 256), dtype=float32),
}),
'last_fc': FeaturesDict({
'bias': Tensor(shape=(8,), dtype=float32),
'weight': Tensor(shape=(8, 256), dtype=float32),
}),
'last_fc_log_std': FeaturesDict({
'bias': Tensor(shape=(8,), dtype=float32),
'weight': Tensor(shape=(8, 256), dtype=float32),
}),
'nonlinearity': string,
'output_distribution': string,
}),
'steps': Dataset({
'action': Tensor(shape=(8,), dtype=float32),
'discount': float32,
'infos': FeaturesDict({
'action_log_probs': float64,
'qpos': Tensor(shape=(15,), dtype=float64),
'qvel': Tensor(shape=(14,), dtype=float64),
}),
'is_first': bool,
'is_last': bool,
'is_terminal': bool,
'observation': Tensor(shape=(111,), dtype=float32),
'reward': float32,
}),
})
- Feature documentation :
Feature | Class | Shape | Dtype | Description |
---|---|---|---|---|
FeaturesDict | | | | |
algorithm | Tensor | | string | |
iteration | Tensor | | int32 | |
policy | FeaturesDict | | | |
policy/fc0 | FeaturesDict | | | |
policy/fc0/bias | Tensor | (256,) | float32 | |
policy/fc0/weight | Tensor | (256, 111) | float32 | |
policy/fc1 | FeaturesDict | | | |
policy/fc1/bias | Tensor | (256,) | float32 | |
policy/fc1/weight | Tensor | (256, 256) | float32 | |
policy/last_fc | FeaturesDict | | | |
policy/last_fc/bias | Tensor | (8,) | float32 | |
policy/last_fc/weight | Tensor | (8, 256) | float32 | |
policy/last_fc_log_std | FeaturesDict | | | |
policy/last_fc_log_std/bias | Tensor | (8,) | float32 | |
policy/last_fc_log_std/weight | Tensor | (8, 256) | float32 | |
policy/nonlinearity | Tensor | | string | |
policy/output_distribution | Tensor | | string | |
steps | Dataset | | | |
steps/action | Tensor | (8,) | float32 | |
steps/discount | Tensor | | float32 | |
steps/infos | FeaturesDict | | | |
steps/infos/action_log_probs | Tensor | | float64 | |
steps/infos/qpos | Tensor | (15,) | float64 | |
steps/infos/qvel | Tensor | (14,) | float64 | |
steps/is_first | Tensor | | bool | |
steps/is_last | Tensor | | bool | |
steps/is_terminal | Tensor | | bool | |
steps/observation | Tensor | (111,) | float32 | |
steps/reward | Tensor | | float32 | |
- Examples ( tfds.as_dataframe ):
d4rl_mujoco_ant/v2-full-replay
Download size : 428.57 MiB
Dataset size : 580.09 MiB
Auto-cached ( documentation ): No
Splits :
Split | Examples |
---|---|
'train' | 1,319 |
- Feature structure :
FeaturesDict({
'algorithm': string,
'iteration': int32,
'steps': Dataset({
'action': Tensor(shape=(8,), dtype=float32),
'discount': float32,
'infos': FeaturesDict({
'action_log_probs': float64,
'qpos': Tensor(shape=(15,), dtype=float64),
'qvel': Tensor(shape=(14,), dtype=float64),
}),
'is_first': bool,
'is_last': bool,
'is_terminal': bool,
'observation': Tensor(shape=(111,), dtype=float32),
'reward': float32,
}),
})
- Feature documentation :
Feature | Class | Shape | Dtype | Description |
---|---|---|---|---|
FeaturesDict | | | | |
algorithm | Tensor | | string | |
iteration | Tensor | | int32 | |
steps | Dataset | | | |
steps/action | Tensor | (8,) | float32 | |
steps/discount | Tensor | | float32 | |
steps/infos | FeaturesDict | | | |
steps/infos/action_log_probs | Tensor | | float64 | |
steps/infos/qpos | Tensor | (15,) | float64 | |
steps/infos/qvel | Tensor | (14,) | float64 | |
steps/is_first | Tensor | | bool | |
steps/is_last | Tensor | | bool | |
steps/is_terminal | Tensor | | bool | |
steps/observation | Tensor | (111,) | float32 | |
steps/reward | Tensor | | float32 | |
- Examples ( tfds.as_dataframe ):
d4rl_mujoco_ant/v2-medium
Download size : 358.81 MiB
Dataset size : 1.01 GiB
Auto-cached ( documentation ): No
Splits :
Split | Examples |
---|---|
'train' | 1,203 |
- Feature structure :
FeaturesDict({
'algorithm': string,
'iteration': int32,
'policy': FeaturesDict({
'fc0': FeaturesDict({
'bias': Tensor(shape=(256,), dtype=float32),
'weight': Tensor(shape=(256, 111), dtype=float32),
}),
'fc1': FeaturesDict({
'bias': Tensor(shape=(256,), dtype=float32),
'weight': Tensor(shape=(256, 256), dtype=float32),
}),
'last_fc': FeaturesDict({
'bias': Tensor(shape=(8,), dtype=float32),
'weight': Tensor(shape=(8, 256), dtype=float32),
}),
'last_fc_log_std': FeaturesDict({
'bias': Tensor(shape=(8,), dtype=float32),
'weight': Tensor(shape=(8, 256), dtype=float32),
}),
'nonlinearity': string,
'output_distribution': string,
}),
'steps': Dataset({
'action': Tensor(shape=(8,), dtype=float32),
'discount': float32,
'infos': FeaturesDict({
'action_log_probs': float64,
'qpos': Tensor(shape=(15,), dtype=float64),
'qvel': Tensor(shape=(14,), dtype=float64),
}),
'is_first': bool,
'is_last': bool,
'is_terminal': bool,
'observation': Tensor(shape=(111,), dtype=float32),
'reward': float32,
}),
})
- Feature documentation :
Feature | Class | Shape | Dtype | Description |
---|---|---|---|---|
FeaturesDict | | | | |
algorithm | Tensor | | string | |
iteration | Tensor | | int32 | |
policy | FeaturesDict | | | |
policy/fc0 | FeaturesDict | | | |
policy/fc0/bias | Tensor | (256,) | float32 | |
policy/fc0/weight | Tensor | (256, 111) | float32 | |
policy/fc1 | FeaturesDict | | | |
policy/fc1/bias | Tensor | (256,) | float32 | |
policy/fc1/weight | Tensor | (256, 256) | float32 | |
policy/last_fc | FeaturesDict | | | |
policy/last_fc/bias | Tensor | (8,) | float32 | |
policy/last_fc/weight | Tensor | (8, 256) | float32 | |
policy/last_fc_log_std | FeaturesDict | | | |
policy/last_fc_log_std/bias | Tensor | (8,) | float32 | |
policy/last_fc_log_std/weight | Tensor | (8, 256) | float32 | |
policy/nonlinearity | Tensor | | string | |
policy/output_distribution | Tensor | | string | |
steps | Dataset | | | |
steps/action | Tensor | (8,) | float32 | |
steps/discount | Tensor | | float32 | |
steps/infos | FeaturesDict | | | |
steps/infos/action_log_probs | Tensor | | float64 | |
steps/infos/qpos | Tensor | (15,) | float64 | |
steps/infos/qvel | Tensor | (14,) | float64 | |
steps/is_first | Tensor | | bool | |
steps/is_last | Tensor | | bool | |
steps/is_terminal | Tensor | | bool | |
steps/observation | Tensor | (111,) | float32 | |
steps/reward | Tensor | | float32 | |
- Examples ( tfds.as_dataframe ):
d4rl_mujoco_ant/v2-medium-expert
Download size : 713.67 MiB
Dataset size : 1.13 GiB
Auto-cached ( documentation ): No
Splits :
Split | Examples |
---|---|
'train' | 2,237 |
- Feature structure :
FeaturesDict({
'steps': Dataset({
'action': Tensor(shape=(8,), dtype=float32),
'discount': float32,
'infos': FeaturesDict({
'action_log_probs': float64,
'qpos': Tensor(shape=(15,), dtype=float64),
'qvel': Tensor(shape=(14,), dtype=float64),
}),
'is_first': bool,
'is_last': bool,
'is_terminal': bool,
'observation': Tensor(shape=(111,), dtype=float32),
'reward': float32,
}),
})
- Feature documentation :
Feature | Class | Shape | Dtype | Description |
---|---|---|---|---|
FeaturesDict | | | | |
steps | Dataset | | | |
steps/action | Tensor | (8,) | float32 | |
steps/discount | Tensor | | float32 | |
steps/infos | FeaturesDict | | | |
steps/infos/action_log_probs | Tensor | | float64 | |
steps/infos/qpos | Tensor | (15,) | float64 | |
steps/infos/qvel | Tensor | (14,) | float64 | |
steps/is_first | Tensor | | bool | |
steps/is_last | Tensor | | bool | |
steps/is_terminal | Tensor | | bool | |
steps/observation | Tensor | (111,) | float32 | |
steps/reward | Tensor | | float32 | |
- Examples ( tfds.as_dataframe ):
d4rl_mujoco_ant/v2-medium-replay
Download size : 130.16 MiB
Dataset size : 175.27 MiB
Auto-cached ( documentation ): Only when shuffle_files=False (train)
Splits :
Split | Examples |
---|---|
'train' | 485 |
- Feature structure :
FeaturesDict({
'algorithm': string,
'iteration': int32,
'steps': Dataset({
'action': Tensor(shape=(8,), dtype=float32),
'discount': float32,
'infos': FeaturesDict({
'action_log_probs': float64,
'qpos': Tensor(shape=(15,), dtype=float64),
'qvel': Tensor(shape=(14,), dtype=float64),
}),
'is_first': bool,
'is_last': bool,
'is_terminal': bool,
'observation': Tensor(shape=(111,), dtype=float32),
'reward': float32,
}),
})
- Feature documentation :
Feature | Class | Shape | Dtype | Description |
---|---|---|---|---|
FeaturesDict | | | | |
algorithm | Tensor | | string | |
iteration | Tensor | | int32 | |
steps | Dataset | | | |
steps/action | Tensor | (8,) | float32 | |
steps/discount | Tensor | | float32 | |
steps/infos | FeaturesDict | | | |
steps/infos/action_log_probs | Tensor | | float64 | |
steps/infos/qpos | Tensor | (15,) | float64 | |
steps/infos/qvel | Tensor | (14,) | float64 | |
steps/is_first | Tensor | | bool | |
steps/is_last | Tensor | | bool | |
steps/is_terminal | Tensor | | bool | |
steps/observation | Tensor | (111,) | float32 | |
steps/reward | Tensor | | float32 | |
- Examples ( tfds.as_dataframe ):
d4rl_mujoco_ant/v2-random
Download size : 366.66 MiB
Dataset size : 583.90 MiB
Auto-cached ( documentation ): No
Splits :
Split | Examples |
---|---|
'train' | 5,822 |
- Feature structure :
FeaturesDict({
'steps': Dataset({
'action': Tensor(shape=(8,), dtype=float32),
'discount': float32,
'infos': FeaturesDict({
'action_log_probs': float64,
'qpos': Tensor(shape=(15,), dtype=float64),
'qvel': Tensor(shape=(14,), dtype=float64),
}),
'is_first': bool,
'is_last': bool,
'is_terminal': bool,
'observation': Tensor(shape=(111,), dtype=float32),
'reward': float32,
}),
})
- Feature documentation :
Feature | Class | Shape | Dtype | Description |
---|---|---|---|---|
FeaturesDict | | | | |
steps | Dataset | | | |
steps/action | Tensor | (8,) | float32 | |
steps/discount | Tensor | | float32 | |
steps/infos | FeaturesDict | | | |
steps/infos/action_log_probs | Tensor | | float64 | |
steps/infos/qpos | Tensor | (15,) | float64 | |
steps/infos/qvel | Tensor | (14,) | float64 | |
steps/is_first | Tensor | | bool | |
steps/is_last | Tensor | | bool | |
steps/is_terminal | Tensor | | bool | |
steps/observation | Tensor | (111,) | float32 | |
steps/reward | Tensor | | float32 | |
- Examples ( tfds.as_dataframe ):