rlu_control_suite

Description:

RL Unplugged is a suite of benchmarks for offline reinforcement learning. RL Unplugged is designed around the following consideration: to facilitate ease of use, we provide the datasets with a unified API, which makes it easy for the practitioner to work with all data in the suite once a general pipeline has been established.

The datasets follow the RLDS format to represent steps and episodes.
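As an illustration of the step and episode layout, the sketch below summarizes one episode using only the `is_first`, `is_last` and `is_terminal` flags that appear in the feature structures on this page. The helper name `summarize_episode` is hypothetical, and treating a non-terminal last step as a truncated episode is an assumption based on the usual RLDS reading of `is_terminal`.

```python
def summarize_episode(episode):
    """Summarize one RLDS-style episode given as a nested dict.

    `episode` is assumed to look like {'episode_id': ..., 'steps': iterable of
    step dicts}, mirroring the feature structure listed on this page.
    """
    steps = list(episode["steps"])
    if not steps:
        raise ValueError("episode has no steps")

    first_flags = [bool(s["is_first"]) for s in steps]
    last_flags = [bool(s["is_last"]) for s in steps]

    # Exactly the first step should be marked is_first and exactly the
    # last step should be marked is_last.
    well_formed = (
        first_flags[0]
        and not any(first_flags[1:])
        and last_flags[-1]
        and not any(last_flags[:-1])
    )

    return {
        "episode_id": int(episode["episode_id"]),
        "num_steps": len(steps),
        "well_formed": well_formed,
        # Assumption: an episode whose last step is not terminal was cut off
        # (for example by a time limit) rather than ending in a final state.
        "truncated": not bool(steps[-1]["is_terminal"]),
    }
```

The same function should work on episodes converted with NumPy or iterated eagerly, since it only relies on indexing by key and on `bool()` and `int()` conversions of scalar values.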

DeepMind Control Suite (Tassa et al., 2018) is a set of control tasks implemented in MuJoCo (Todorov et al., 2012). We consider a subset of the tasks provided in the suite that cover a wide range of difficulties.

Most of the datasets in this domain are generated using D4PG. For the Manipulator insert ball and Manipulator insert peg environments we use V-MPO (Song et al., 2020) to generate the data, because D4PG is unable to solve these tasks. We release datasets for 9 control suite tasks. For details on how the dataset was generated, please refer to the paper.

DeepMind Control Suite is a traditional continuous-action RL benchmark. In particular, we recommend that you test your approach on DeepMind Control Suite if you are interested in comparing against other state-of-the-art offline RL methods.
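A minimal sketch of selecting one of the nine task configs and loading it through `tfds.load`. The config names are taken from the section headings on this page; the helper names `config_name` and `load_episodes` are hypothetical, and the loader assumes the `tensorflow-datasets` package is installed.

```python
# Config names as they appear in the section headings of this page.
TASKS = (
    "cartpole_swingup",
    "cheetah_run",
    "finger_turn_hard",
    "fish_swim",
    "humanoid_run",
    "manipulator_insert_ball",
    "manipulator_insert_peg",
    "walker_stand",
    "walker_walk",
)


def config_name(task):
    """Build the full TFDS name for a task, rejecting unknown tasks early."""
    if task not in TASKS:
        raise ValueError(f"unknown task {task!r}; expected one of {TASKS}")
    return f"rlu_control_suite/{task}"


def load_episodes(task, split="train"):
    """Load one config as a dataset of episodes (requires tensorflow-datasets)."""
    # Imported lazily so that config_name() can be used without TensorFlow.
    import tensorflow_datasets as tfds

    return tfds.load(config_name(task), split=split)
```

For example, `load_episodes("cartpole_swingup")` would request the default config; every config on this page lists a single `'train'` split.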

Homepage: https://github.com/deepmind/deepmind-research/tree/master/rl_unplugged
Source code: tfds.rl_unplugged.rlu_control_suite.RluControlSuite
Versions:
- 1.0.0 (default): Initial release.
Download size: Unknown size
Supervised keys (see as_supervised doc): None
Figure (tfds.show_examples): Not supported.
Citation:

@inproceedings{gulcehre2020rl,
 title = {RL Unplugged: A Suite of Benchmarks for Offline Reinforcement Learning},
 author = {Gulcehre, Caglar and Wang, Ziyu and Novikov, Alexander and Paine, Thomas and G\'{o}mez, Sergio and Zolna, Konrad and Agarwal, Rishabh and Merel, Josh S and Mankowitz, Daniel J and Paduraru, Cosmin and Dulac-Arnold, Gabriel and Li, Jerry and Norouzi, Mohammad and Hoffman, Matthew and Heess, Nicolas and de Freitas, Nando},
 booktitle = {Advances in Neural Information Processing Systems},
 pages = {7248--7259},
 volume = {33},
 year = {2020}
}

rlu_control_suite/cartpole_swingup (default config)

Dataset size: 2.12 MiB
Auto-cached (documentation): Yes
Splits:

Split	Examples
`'train'`	40

Feature structure:

FeaturesDict({
    'episode_id': int64,
    'steps': Dataset({
        'action': Tensor(shape=(1,), dtype=float32),
        'discount': float32,
        'is_first': bool,
        'is_last': bool,
        'is_terminal': bool,
        'observation': FeaturesDict({
            'position': Tensor(shape=(3,), dtype=float32),
            'velocity': Tensor(shape=(2,), dtype=float32),
        }),
        'reward': float32,
    }),
    'timestamp': int64,
})
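Many offline RL pipelines expect a single observation vector rather than a dictionary. The sketch below concatenates the observation entries in sorted key order; for this config that gives 3 position values followed by 2 velocity values. The helper name `flatten_observation` and the choice of sorted key order are assumptions made for illustration, not part of the dataset API.

```python
def flatten_observation(observation):
    """Concatenate all observation entries into one flat list of floats.

    Keys are visited in sorted order so the layout of the resulting vector
    is deterministic across steps and episodes.
    """
    flat = []
    for key in sorted(observation):
        # Each entry is assumed to be a 1-D sequence (list, tuple or array).
        flat.extend(float(x) for x in observation[key])
    return flat


# Hypothetical cartpole_swingup observation with the shapes listed above.
example = {"position": [1.0, 0.0, 2.0], "velocity": [0.5, -0.5]}
vector = flatten_observation(example)
```

Here `vector` has 5 entries, with the `position` values first because `'position'` sorts before `'velocity'`.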

Feature documentation:

Feature	Class	Shape	Dtype
	FeaturesDict
episode_id	Tensor		int64
steps	Dataset
steps/action	Tensor	(1,)	float32
steps/discount	Tensor		float32
steps/is_first	Tensor		bool
steps/is_last	Tensor		bool
steps/is_terminal	Tensor		bool
steps/observation	FeaturesDict
steps/observation/position	Tensor	(3,)	float32
steps/observation/velocity	Tensor	(2,)	float32
steps/reward	Tensor		float32
timestamp	Tensor		int64

Examples (tfds.as_dataframe):

rlu_control_suite/cheetah_run

Dataset size: 36.58 MiB
Auto-cached (documentation): Yes
Splits:

Split	Examples
`'train'`	300

Feature structure:

FeaturesDict({
    'episode_id': int64,
    'steps': Dataset({
        'action': Tensor(shape=(6,), dtype=float32),
        'discount': float32,
        'is_first': bool,
        'is_last': bool,
        'is_terminal': bool,
        'observation': FeaturesDict({
            'position': Tensor(shape=(8,), dtype=float32),
            'velocity': Tensor(shape=(9,), dtype=float32),
        }),
        'reward': float32,
    }),
    'timestamp': int64,
})

Feature documentation:

Feature	Class	Shape	Dtype
	FeaturesDict
episode_id	Tensor		int64
steps	Dataset
steps/action	Tensor	(6,)	float32
steps/discount	Tensor		float32
steps/is_first	Tensor		bool
steps/is_last	Tensor		bool
steps/is_terminal	Tensor		bool
steps/observation	FeaturesDict
steps/observation/position	Tensor	(8,)	float32
steps/observation/velocity	Tensor	(9,)	float32
steps/reward	Tensor		float32
timestamp	Tensor		int64

Examples (tfds.as_dataframe):

rlu_control_suite/finger_turn_hard

Dataset size: 47.61 MiB
Auto-cached (documentation): Yes
Splits:

Split	Examples
`'train'`	500

Feature structure:

FeaturesDict({
    'episode_id': int64,
    'steps': Dataset({
        'action': Tensor(shape=(2,), dtype=float32),
        'discount': float32,
        'is_first': bool,
        'is_last': bool,
        'is_terminal': bool,
        'observation': FeaturesDict({
            'dist_to_target': Tensor(shape=(1,), dtype=float32),
            'position': Tensor(shape=(4,), dtype=float32),
            'target_position': Tensor(shape=(2,), dtype=float32),
            'velocity': Tensor(shape=(3,), dtype=float32),
        }),
        'reward': float32,
    }),
    'timestamp': int64,
})

Feature documentation:

Feature	Class	Shape	Dtype
	FeaturesDict
episode_id	Tensor		int64
steps	Dataset
steps/action	Tensor	(2,)	float32
steps/discount	Tensor		float32
steps/is_first	Tensor		bool
steps/is_last	Tensor		bool
steps/is_terminal	Tensor		bool
steps/observation	FeaturesDict
steps/observation/dist_to_target	Tensor	(1,)	float32
steps/observation/position	Tensor	(4,)	float32
steps/observation/target_position	Tensor	(2,)	float32
steps/observation/velocity	Tensor	(3,)	float32
steps/reward	Tensor		float32
timestamp	Tensor		int64

Examples (tfds.as_dataframe):

rlu_control_suite/fish_swim

Dataset size: 32.81 MiB
Auto-cached (documentation): Yes
Splits:

Split	Examples
`'train'`	200

Feature structure:

FeaturesDict({
    'episode_id': int64,
    'steps': Dataset({
        'action': Tensor(shape=(5,), dtype=float32),
        'discount': float32,
        'is_first': bool,
        'is_last': bool,
        'is_terminal': bool,
        'observation': FeaturesDict({
            'joint_angles': Tensor(shape=(7,), dtype=float32),
            'target': Tensor(shape=(3,), dtype=float32),
            'upright': Tensor(shape=(1,), dtype=float32),
            'velocity': Tensor(shape=(13,), dtype=float32),
        }),
        'reward': float32,
    }),
    'timestamp': int64,
})

Feature documentation:

Feature	Class	Shape	Dtype
	FeaturesDict
episode_id	Tensor		int64
steps	Dataset
steps/action	Tensor	(5,)	float32
steps/discount	Tensor		float32
steps/is_first	Tensor		bool
steps/is_last	Tensor		bool
steps/is_terminal	Tensor		bool
steps/observation	FeaturesDict
steps/observation/joint_angles	Tensor	(7,)	float32
steps/observation/target	Tensor	(3,)	float32
steps/observation/upright	Tensor	(1,)	float32
steps/observation/velocity	Tensor	(13,)	float32
steps/reward	Tensor		float32
timestamp	Tensor		int64

Examples (tfds.as_dataframe):

rlu_control_suite/humanoid_run

Dataset size: 1.21 GiB
Auto-cached (documentation): No
Splits:

Split	Examples
`'train'`	3,000

Feature structure:

FeaturesDict({
    'episode_id': int64,
    'steps': Dataset({
        'action': Tensor(shape=(21,), dtype=float32),
        'discount': float32,
        'is_first': bool,
        'is_last': bool,
        'is_terminal': bool,
        'observation': FeaturesDict({
            'com_velocity': Tensor(shape=(3,), dtype=float32),
            'extremities': Tensor(shape=(12,), dtype=float32),
            'head_height': Tensor(shape=(1,), dtype=float32),
            'joint_angles': Tensor(shape=(21,), dtype=float32),
            'torso_vertical': Tensor(shape=(3,), dtype=float32),
            'velocity': Tensor(shape=(27,), dtype=float32),
        }),
        'reward': float32,
    }),
    'timestamp': int64,
})
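With six observation entries, it can help to check that a loaded step matches the listed shapes before feeding it to a model. The sketch below encodes the sizes from the structure above and reports any mismatch; `HUMANOID_RUN_OBSERVATION` and `check_observation` are hypothetical names introduced for illustration.

```python
# Entry sizes copied from the humanoid_run feature structure above.
HUMANOID_RUN_OBSERVATION = {
    "com_velocity": 3,
    "extremities": 12,
    "head_height": 1,
    "joint_angles": 21,
    "torso_vertical": 3,
    "velocity": 27,
}


def check_observation(observation, spec=HUMANOID_RUN_OBSERVATION):
    """Return a list of problems; an empty list means the shapes match."""
    problems = []
    for key, size in spec.items():
        if key not in observation:
            problems.append(f"missing {key}")
        elif len(observation[key]) != size:
            problems.append(
                f"{key}: expected {size} values, got {len(observation[key])}"
            )
    for key in observation:
        if key not in spec:
            problems.append(f"unexpected {key}")
    return problems


# Total size of the flattened observation implied by the spec.
TOTAL_DIM = sum(HUMANOID_RUN_OBSERVATION.values())
```

Summing the listed sizes gives a flattened observation of 67 values, alongside the 21-dimensional action.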

Feature documentation:

Feature	Class	Shape	Dtype
	FeaturesDict
episode_id	Tensor		int64
steps	Dataset
steps/action	Tensor	(21,)	float32
steps/discount	Tensor		float32
steps/is_first	Tensor		bool
steps/is_last	Tensor		bool
steps/is_terminal	Tensor		bool
steps/observation	FeaturesDict
steps/observation/com_velocity	Tensor	(3,)	float32
steps/observation/extremities	Tensor	(12,)	float32
steps/observation/head_height	Tensor	(1,)	float32
steps/observation/joint_angles	Tensor	(21,)	float32
steps/observation/torso_vertical	Tensor	(3,)	float32
steps/observation/velocity	Tensor	(27,)	float32
steps/reward	Tensor		float32
timestamp	Tensor		int64

Examples (tfds.as_dataframe):

rlu_control_suite/manipulator_insert_ball

Dataset size: 385.41 MiB
Auto-cached (documentation): No
Splits:

Split	Examples
`'train'`	1,500

Feature structure:

FeaturesDict({
    'episode_id': int64,
    'steps': Dataset({
        'action': Tensor(shape=(5,), dtype=float32),
        'discount': float32,
        'is_first': bool,
        'is_last': bool,
        'is_terminal': bool,
        'observation': FeaturesDict({
            'arm_pos': Tensor(shape=(16,), dtype=float32),
            'arm_vel': Tensor(shape=(8,), dtype=float32),
            'hand_pos': Tensor(shape=(4,), dtype=float32),
            'object_pos': Tensor(shape=(4,), dtype=float32),
            'object_vel': Tensor(shape=(3,), dtype=float32),
            'target_pos': Tensor(shape=(4,), dtype=float32),
            'touch': Tensor(shape=(5,), dtype=float32),
        }),
        'reward': float32,
    }),
    'timestamp': int64,
})

Feature documentation:

Feature	Class	Shape	Dtype
	FeaturesDict
episode_id	Tensor		int64
steps	Dataset
steps/action	Tensor	(5,)	float32
steps/discount	Tensor		float32
steps/is_first	Tensor		bool
steps/is_last	Tensor		bool
steps/is_terminal	Tensor		bool
steps/observation	FeaturesDict
steps/observation/arm_pos	Tensor	(16,)	float32
steps/observation/arm_vel	Tensor	(8,)	float32
steps/observation/hand_pos	Tensor	(4,)	float32
steps/observation/object_pos	Tensor	(4,)	float32
steps/observation/object_vel	Tensor	(3,)	float32
steps/observation/target_pos	Tensor	(4,)	float32
steps/observation/touch	Tensor	(5,)	float32
steps/reward	Tensor		float32
timestamp	Tensor		int64

Examples (tfds.as_dataframe):

rlu_control_suite/manipulator_insert_peg

Dataset size: 385.73 MiB
Auto-cached (documentation): No
Splits:

Split	Examples
`'train'`	1,500

Feature structure:

FeaturesDict({
    'episode_id': int64,
    'steps': Dataset({
        'action': Tensor(shape=(5,), dtype=float32),
        'discount': float32,
        'is_first': bool,
        'is_last': bool,
        'is_terminal': bool,
        'observation': FeaturesDict({
            'arm_pos': Tensor(shape=(16,), dtype=float32),
            'arm_vel': Tensor(shape=(8,), dtype=float32),
            'hand_pos': Tensor(shape=(4,), dtype=float32),
            'object_pos': Tensor(shape=(4,), dtype=float32),
            'object_vel': Tensor(shape=(3,), dtype=float32),
            'target_pos': Tensor(shape=(4,), dtype=float32),
            'touch': Tensor(shape=(5,), dtype=float32),
        }),
        'reward': float32,
    }),
    'timestamp': int64,
})

Feature documentation:

Feature	Class	Shape	Dtype
	FeaturesDict
episode_id	Tensor		int64
steps	Dataset
steps/action	Tensor	(5,)	float32
steps/discount	Tensor		float32
steps/is_first	Tensor		bool
steps/is_last	Tensor		bool
steps/is_terminal	Tensor		bool
steps/observation	FeaturesDict
steps/observation/arm_pos	Tensor	(16,)	float32
steps/observation/arm_vel	Tensor	(8,)	float32
steps/observation/hand_pos	Tensor	(4,)	float32
steps/observation/object_pos	Tensor	(4,)	float32
steps/observation/object_vel	Tensor	(3,)	float32
steps/observation/target_pos	Tensor	(4,)	float32
steps/observation/touch	Tensor	(5,)	float32
steps/reward	Tensor		float32
timestamp	Tensor		int64

Examples (tfds.as_dataframe):

rlu_control_suite/walker_stand

Dataset size: 31.78 MiB
Auto-cached (documentation): Yes
Splits:

Split	Examples
`'train'`	200

Feature structure:

FeaturesDict({
    'episode_id': int64,
    'steps': Dataset({
        'action': Tensor(shape=(6,), dtype=float32),
        'discount': float32,
        'is_first': bool,
        'is_last': bool,
        'is_terminal': bool,
        'observation': FeaturesDict({
            'height': Tensor(shape=(1,), dtype=float32),
            'orientations': Tensor(shape=(14,), dtype=float32),
            'velocity': Tensor(shape=(9,), dtype=float32),
        }),
        'reward': float32,
    }),
    'timestamp': int64,
})

Feature documentation:

Feature	Class	Shape	Dtype
	FeaturesDict
episode_id	Tensor		int64
steps	Dataset
steps/action	Tensor	(6,)	float32
steps/discount	Tensor		float32
steps/is_first	Tensor		bool
steps/is_last	Tensor		bool
steps/is_terminal	Tensor		bool
steps/observation	FeaturesDict
steps/observation/height	Tensor	(1,)	float32
steps/observation/orientations	Tensor	(14,)	float32
steps/observation/velocity	Tensor	(9,)	float32
steps/reward	Tensor		float32
timestamp	Tensor		int64

Examples (tfds.as_dataframe):

rlu_control_suite/walker_walk

Dataset size: 31.78 MiB
Auto-cached (documentation): Yes
Splits:

Split	Examples
`'train'`	200

Feature structure:

FeaturesDict({
    'episode_id': int64,
    'steps': Dataset({
        'action': Tensor(shape=(6,), dtype=float32),
        'discount': float32,
        'is_first': bool,
        'is_last': bool,
        'is_terminal': bool,
        'observation': FeaturesDict({
            'height': Tensor(shape=(1,), dtype=float32),
            'orientations': Tensor(shape=(14,), dtype=float32),
            'velocity': Tensor(shape=(9,), dtype=float32),
        }),
        'reward': float32,
    }),
    'timestamp': int64,
})

Feature documentation:

Feature	Class	Shape	Dtype
	FeaturesDict
episode_id	Tensor		int64
steps	Dataset
steps/action	Tensor	(6,)	float32
steps/discount	Tensor		float32
steps/is_first	Tensor		bool
steps/is_last	Tensor		bool
steps/is_terminal	Tensor		bool
steps/observation	FeaturesDict
steps/observation/height	Tensor	(1,)	float32
steps/observation/orientations	Tensor	(14,)	float32
steps/observation/velocity	Tensor	(9,)	float32
steps/reward	Tensor		float32
timestamp	Tensor		int64

Examples (tfds.as_dataframe):