d4rl_adroit_hammer

Description:

D4RL is an open-source benchmark for offline reinforcement learning. It provides standardized environments and datasets for training and benchmarking algorithms.

The datasets follow the RLDS format to represent steps and episodes.
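
As a minimal sketch of what the RLDS layout means in practice, the snippet below builds the full TFDS name for a config and walks the episodes and their nested steps. It assumes `tensorflow_datasets` is installed; the helper names `config_name` and `iter_episode_lengths` are illustrative and not part of TFDS.

```python
DATASET = "d4rl_adroit_hammer"
# Config names as listed on this page.
CONFIGS = (
    "v0-human", "v0-cloned", "v0-expert",
    "v1-human", "v1-cloned", "v1-expert",
)


def config_name(config: str = "v0-human") -> str:
    """Return the full TFDS name, e.g. 'd4rl_adroit_hammer/v0-human'."""
    if config not in CONFIGS:
        raise ValueError(f"unknown config {config!r}; expected one of {CONFIGS}")
    return f"{DATASET}/{config}"


def iter_episode_lengths(config: str = "v0-human"):
    """Yield the number of steps in each episode of the 'train' split."""
    # Imported lazily so config_name() works without TFDS installed.
    import tensorflow_datasets as tfds

    episodes = tfds.load(config_name(config), split="train")
    for episode in episodes:
        # In RLDS, 'steps' is itself a nested dataset of per-step dicts.
        yield sum(1 for _ in episode["steps"])
```

Calling `list(iter_episode_lengths("v0-human"))` downloads the data on first use, so the smaller human configs are a reasonable place to start.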

Config description: See more details about the task and its versions at https://github.com/rail-berkeley/d4rl/wiki/Tasks#adroit
Homepage: https://sites.google.com/view/d4rl-anonymous
Source code: tfds.d4rl.d4rl_adroit_hammer.D4rlAdroitHammer
Versions:
- 1.0.0: Initial release.
- 1.1.0 (default): Added is_last.
Supervised keys (see as_supervised doc): None
Figure (tfds.show_examples): Not supported.
Citation:

@misc{fu2020d4rl,
    title={D4RL: Datasets for Deep Data-Driven Reinforcement Learning},
    author={Justin Fu and Aviral Kumar and Ofir Nachum and George Tucker and Sergey Levine},
    year={2020},
    eprint={2004.07219},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

d4rl_adroit_hammer/v0-human (default config)

Download size: 5.33 MiB
Dataset size: 6.10 MiB
Auto-cached (documentation): Yes
Splits:

Split	Examples
`'train'`	70

Feature structure:

FeaturesDict({
    'steps': Dataset({
        'action': Tensor(shape=(26,), dtype=float32),
        'discount': float32,
        'infos': FeaturesDict({
            'qpos': Tensor(shape=(33,), dtype=float32),
            'qvel': Tensor(shape=(33,), dtype=float32),
        }),
        'is_first': bool,
        'is_last': bool,
        'is_terminal': bool,
        'observation': Tensor(shape=(46,), dtype=float32),
        'reward': float32,
    }),
})

Feature documentation:

Feature	Class	Shape	Dtype
	FeaturesDict
steps	Dataset
steps/action	Tensor	(26,)	float32
steps/discount	Tensor		float32
steps/infos	FeaturesDict
steps/infos/qpos	Tensor	(33,)	float32
steps/infos/qvel	Tensor	(33,)	float32
steps/is_first	Tensor		bool
steps/is_last	Tensor		bool
steps/is_terminal	Tensor		bool
steps/observation	Tensor	(46,)	float32
steps/reward	Tensor		float32
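
Offline RL code often consumes transitions rather than raw steps. The sketch below pairs consecutive steps into `(observation, action, reward, next_observation, terminal)` tuples using the `is_last` and `is_terminal` flags. This pairing is one common convention and an assumption here, not something this page specifies. `to_transitions` is a hypothetical helper, and `steps` is assumed to be a plain list of per-step dicts (for example after converting one episode to NumPy).

```python
def to_transitions(steps):
    """Pair consecutive steps into (obs, action, reward, next_obs, terminal).

    `steps` is a list of dicts using the keys from the feature structure.
    """
    transitions = []
    for current, following in zip(steps, steps[1:]):
        if current["is_last"]:
            # Never pair the final step of one episode with the next episode.
            continue
        transitions.append((
            current["observation"],
            current["action"],
            current["reward"],
            following["observation"],
            following["is_terminal"],
        ))
    return transitions


# Toy three-step episode with 1-dimensional observations and actions.
toy_steps = [
    {"observation": [0.0], "action": [1.0], "reward": 0.0,
     "is_last": False, "is_terminal": False},
    {"observation": [1.0], "action": [2.0], "reward": 1.0,
     "is_last": False, "is_terminal": False},
    {"observation": [2.0], "action": [0.0], "reward": 0.0,
     "is_last": True, "is_terminal": True},
]
toy_transitions = to_transitions(toy_steps)
```

The final step contributes only its observation, as the `next_observation` of the preceding transition.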

d4rl_adroit_hammer/v0-cloned

Download size: 644.69 MiB
Dataset size: 538.97 MiB
Auto-cached (documentation): No
Splits:

Split	Examples
`'train'`	5,594

Feature structure:

FeaturesDict({
    'steps': Dataset({
        'action': Tensor(shape=(26,), dtype=float32),
        'discount': float64,
        'infos': FeaturesDict({
            'qpos': Tensor(shape=(33,), dtype=float64),
            'qvel': Tensor(shape=(33,), dtype=float64),
        }),
        'is_first': bool,
        'is_last': bool,
        'is_terminal': bool,
        'observation': Tensor(shape=(46,), dtype=float64),
        'reward': float64,
    }),
})

Feature documentation:

Feature	Class	Shape	Dtype
	FeaturesDict
steps	Dataset
steps/action	Tensor	(26,)	float32
steps/discount	Tensor		float64
steps/infos	FeaturesDict
steps/infos/qpos	Tensor	(33,)	float64
steps/infos/qvel	Tensor	(33,)	float64
steps/is_first	Tensor		bool
steps/is_last	Tensor		bool
steps/is_terminal	Tensor		bool
steps/observation	Tensor	(46,)	float64
steps/reward	Tensor		float64
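
Unlike v0-human, this config stores most floating-point fields as float64 while `action` stays float32, so code that mixes configs may need to cast to a common dtype. The sketch below lists which fields differ. The dtype values are copied from the feature tables on this page, and `dtype_mismatches` is a hypothetical helper.

```python
# Step-level dtypes copied from the feature tables on this page.
V0_HUMAN = {
    "steps/action": "float32",
    "steps/discount": "float32",
    "steps/infos/qpos": "float32",
    "steps/infos/qvel": "float32",
    "steps/observation": "float32",
    "steps/reward": "float32",
}
V0_CLONED = {
    "steps/action": "float32",
    "steps/discount": "float64",
    "steps/infos/qpos": "float64",
    "steps/infos/qvel": "float64",
    "steps/observation": "float64",
    "steps/reward": "float64",
}


def dtype_mismatches(reference, other):
    """Return {feature: (reference_dtype, other_dtype)} where the dtypes differ."""
    return {
        name: (reference[name], other[name])
        for name in sorted(reference.keys() & other.keys())
        if reference[name] != other[name]
    }


# Fields that would need a cast to match the v0-human dtypes.
to_cast = dtype_mismatches(V0_HUMAN, V0_CLONED)
```

Here `to_cast` contains every float field except `steps/action`.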

d4rl_adroit_hammer/v0-expert

Download size: 529.91 MiB
Dataset size: 737.00 MiB
Auto-cached (documentation): No
Splits:

Split	Examples
`'train'`	5,000

Feature structure:

FeaturesDict({
    'steps': Dataset({
        'action': Tensor(shape=(26,), dtype=float32),
        'discount': float32,
        'infos': FeaturesDict({
            'action_logstd': Tensor(shape=(26,), dtype=float32),
            'action_mean': Tensor(shape=(26,), dtype=float32),
            'qpos': Tensor(shape=(33,), dtype=float32),
            'qvel': Tensor(shape=(33,), dtype=float32),
        }),
        'is_first': bool,
        'is_last': bool,
        'is_terminal': bool,
        'observation': Tensor(shape=(46,), dtype=float32),
        'reward': float32,
    }),
})

Feature documentation:

Feature	Class	Shape	Dtype
	FeaturesDict
steps	Dataset
steps/action	Tensor	(26,)	float32
steps/discount	Tensor		float32
steps/infos	FeaturesDict
steps/infos/action_logstd	Tensor	(26,)	float32
steps/infos/action_mean	Tensor	(26,)	float32
steps/infos/qpos	Tensor	(33,)	float32
steps/infos/qvel	Tensor	(33,)	float32
steps/is_first	Tensor		bool
steps/is_last	Tensor		bool
steps/is_terminal	Tensor		bool
steps/observation	Tensor	(46,)	float32
steps/reward	Tensor		float32

d4rl_adroit_hammer/v1-human

Download size: 5.35 MiB
Dataset size: 6.34 MiB
Auto-cached (documentation): Yes
Splits:

Split	Examples
`'train'`	25

Feature structure:

FeaturesDict({
    'steps': Dataset({
        'action': Tensor(shape=(26,), dtype=float32),
        'discount': float32,
        'infos': FeaturesDict({
            'board_pos': Tensor(shape=(3,), dtype=float32),
            'qpos': Tensor(shape=(33,), dtype=float32),
            'qvel': Tensor(shape=(33,), dtype=float32),
            'target_pos': Tensor(shape=(3,), dtype=float32),
        }),
        'is_first': bool,
        'is_last': bool,
        'is_terminal': bool,
        'observation': Tensor(shape=(46,), dtype=float32),
        'reward': float32,
    }),
})

Feature documentation:

Feature	Class	Shape	Dtype
	FeaturesDict
steps	Dataset
steps/action	Tensor	(26,)	float32
steps/discount	Tensor		float32
steps/infos	FeaturesDict
steps/infos/board_pos	Tensor	(3,)	float32
steps/infos/qpos	Tensor	(33,)	float32
steps/infos/qvel	Tensor	(33,)	float32
steps/infos/target_pos	Tensor	(3,)	float32
steps/is_first	Tensor		bool
steps/is_last	Tensor		bool
steps/is_terminal	Tensor		bool
steps/observation	Tensor	(46,)	float32
steps/reward	Tensor		float32

d4rl_adroit_hammer/v1-cloned

Download size: 425.93 MiB
Dataset size: 1.68 GiB
Auto-cached (documentation): No
Splits:

Split	Examples
`'train'`	3,606

Feature structure:

FeaturesDict({
    'algorithm': string,
    'policy': FeaturesDict({
        'fc0': FeaturesDict({
            'bias': Tensor(shape=(256,), dtype=float32),
            'weight': Tensor(shape=(46, 256), dtype=float32),
        }),
        'fc1': FeaturesDict({
            'bias': Tensor(shape=(256,), dtype=float32),
            'weight': Tensor(shape=(256, 256), dtype=float32),
        }),
        'last_fc': FeaturesDict({
            'bias': Tensor(shape=(26,), dtype=float32),
            'weight': Tensor(shape=(256, 26), dtype=float32),
        }),
        'nonlinearity': string,
        'output_distribution': string,
    }),
    'steps': Dataset({
        'action': Tensor(shape=(26,), dtype=float32),
        'discount': float32,
        'infos': FeaturesDict({
            'board_pos': Tensor(shape=(3,), dtype=float32),
            'qpos': Tensor(shape=(33,), dtype=float32),
            'qvel': Tensor(shape=(33,), dtype=float32),
            'target_pos': Tensor(shape=(3,), dtype=float32),
        }),
        'is_first': bool,
        'is_last': bool,
        'is_terminal': bool,
        'observation': Tensor(shape=(46,), dtype=float32),
        'reward': float32,
    }),
})

Feature documentation:

Feature	Class	Shape	Dtype
	FeaturesDict
algorithm	Tensor		string
policy	FeaturesDict
policy/fc0	FeaturesDict
policy/fc0/bias	Tensor	(256,)	float32
policy/fc0/weight	Tensor	(46, 256)	float32
policy/fc1	FeaturesDict
policy/fc1/bias	Tensor	(256,)	float32
policy/fc1/weight	Tensor	(256, 256)	float32
policy/last_fc	FeaturesDict
policy/last_fc/bias	Tensor	(26,)	float32
policy/last_fc/weight	Tensor	(256, 26)	float32
policy/nonlinearity	Tensor		string
policy/output_distribution	Tensor		string
steps	Dataset
steps/action	Tensor	(26,)	float32
steps/discount	Tensor		float32
steps/infos	FeaturesDict
steps/infos/board_pos	Tensor	(3,)	float32
steps/infos/qpos	Tensor	(33,)	float32
steps/infos/qvel	Tensor	(33,)	float32
steps/infos/target_pos	Tensor	(3,)	float32
steps/is_first	Tensor		bool
steps/is_last	Tensor		bool
steps/is_terminal	Tensor		bool
steps/observation	Tensor	(46,)	float32
steps/reward	Tensor		float32
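
The `policy` weights here are shaped (inputs, outputs): (46, 256), (256, 256) and (256, 26). This suggests a forward pass of the form `x @ weight + bias` from a 46-dimensional observation to a 26-dimensional action. The sketch below is a pure-Python illustration under several assumptions: the layers are applied in the order fc0, fc1, last_fc; ReLU stands in for whatever the `nonlinearity` string actually names; and any handling implied by `output_distribution` is ignored. Configs that store weights as (outputs, inputs), such as the (32, 46) shape listed for another config on this page, would need the weight transposed first.

```python
def affine(x, weight, bias):
    """Compute x @ weight + bias for a weight stored as (inputs, outputs)."""
    if len(weight) != len(x) or len(weight[0]) != len(bias):
        raise ValueError("shape mismatch between input, weight and bias")
    return [
        sum(x_i * row[j] for x_i, row in zip(x, weight)) + bias[j]
        for j in range(len(bias))
    ]


def relu(values):
    """Element-wise max(0, v); an assumed stand-in for the stored nonlinearity."""
    return [max(0.0, v) for v in values]


def policy_forward(observation, policy, activation=relu):
    """Apply fc0 -> activation -> fc1 -> activation -> last_fc (assumed order)."""
    hidden = activation(
        affine(observation, policy["fc0"]["weight"], policy["fc0"]["bias"]))
    hidden = activation(
        affine(hidden, policy["fc1"]["weight"], policy["fc1"]["bias"]))
    return affine(hidden, policy["last_fc"]["weight"], policy["last_fc"]["bias"])


def zeros(rows, cols):
    """Build a rows x cols matrix of zeros as nested lists."""
    return [[0.0] * cols for _ in range(rows)]


# Toy weights with the shapes from the feature structure (values are made up).
toy_policy = {
    "fc0": {"weight": zeros(46, 256), "bias": [0.0] * 256},
    "fc1": {"weight": zeros(256, 256), "bias": [0.0] * 256},
    "last_fc": {"weight": zeros(256, 26), "bias": [0.5] * 26},
}
toy_action = policy_forward([0.0] * 46, toy_policy)
```

With all-zero weights the output equals the final bias, which makes the toy run a quick check of the shapes rather than of the policy itself.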

d4rl_adroit_hammer/v1-expert

Download size: 531.24 MiB
Dataset size: 843.54 MiB
Auto-cached (documentation): No
Splits:

Split	Examples
`'train'`	5,000

Feature structure:

FeaturesDict({
    'algorithm': string,
    'policy': FeaturesDict({
        'fc0': FeaturesDict({
            'bias': Tensor(shape=(32,), dtype=float32),
            'weight': Tensor(shape=(32, 46), dtype=float32),
        }),
        'fc1': FeaturesDict({
            'bias': Tensor(shape=(32,), dtype=float32),
            'weight': Tensor(shape=(32, 32), dtype=float32),
        }),
        'last_fc': FeaturesDict({
            'bias': Tensor(shape=(26,), dtype=float32),
            'weight': Tensor(shape=(26, 32), dtype=float32),
        }),
        'last_fc_log_std': FeaturesDict({
            'bias': Tensor(shape=(26,), dtype=float32),
            'weight': Tensor(shape=(26, 32), dtype=float32),
        }),
        'nonlinearity': string,
        'output_distribution': string,
    }),
    'steps': Dataset({
        'action': Tensor(shape=(26,), dtype=float32),
        'discount': float32,
        'infos': FeaturesDict({
            'action_log_std': Tensor(shape=(26,), dtype=float32),
            'action_mean': Tensor(shape=(26,), dtype=float32),
            'board_pos': Tensor(shape=(3,), dtype=float32),
            'qpos': Tensor(shape=(33,), dtype=float32),
            'qvel': Tensor(shape=(33,), dtype=float32),
            'target_pos': Tensor(shape=(3,), dtype=float32),
        }),
        'is_first': bool,
        'is_last': bool,
        'is_terminal': bool,
        'observation': Tensor(shape=(46,), dtype=float32),
        'reward': float32,
    }),
})

Feature documentation:

Feature	Class	Shape	Dtype
	FeaturesDict
algorithm	Tensor		string
policy	FeaturesDict
policy/fc0	FeaturesDict
policy/fc0/bias	Tensor	(32,)	float32
policy/fc0/weight	Tensor	(32, 46)	float32
policy/fc1	FeaturesDict
policy/fc1/bias	Tensor	(32,)	float32
policy/fc1/weight	Tensor	(32, 32)	float32
policy/last_fc	FeaturesDict
policy/last_fc/bias	Tensor	(26,)	float32
policy/last_fc/weight	Tensor	(26, 32)	float32
policy/last_fc_log_std	FeaturesDict
policy/last_fc_log_std/bias	Tensor	(26,)	float32
policy/last_fc_log_std/weight	Tensor	(26, 32)	float32
policy/nonlinearity	Tensor		string
policy/output_distribution	Tensor		string
steps	Dataset
steps/action	Tensor	(26,)	float32
steps/discount	Tensor		float32
steps/infos	FeaturesDict
steps/infos/action_log_std	Tensor	(26,)	float32
steps/infos/action_mean	Tensor	(26,)	float32
steps/infos/board_pos	Tensor	(3,)	float32
steps/infos/qpos	Tensor	(33,)	float32
steps/infos/qvel	Tensor	(33,)	float32
steps/infos/target_pos	Tensor	(3,)	float32
steps/is_first	Tensor		bool
steps/is_last	Tensor		bool
steps/is_terminal	Tensor		bool
steps/observation	Tensor	(46,)	float32
steps/reward	Tensor		float32
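
Each step in this config carries `action_mean` and `action_log_std` next to the sampled `action`. If these are read as the parameters of a diagonal Gaussian (an assumption: the `output_distribution` string may name a different or squashed distribution, which would need an extra correction term), the log-probability of the logged action can be sketched as follows. `gaussian_log_prob` is a hypothetical helper.

```python
import math


def gaussian_log_prob(action, mean, log_std):
    """Log-density of `action` under a diagonal Gaussian N(mean, exp(log_std)**2)."""
    if not (len(action) == len(mean) == len(log_std)):
        raise ValueError("action, mean and log_std must have the same length")
    total = 0.0
    for a, m, s in zip(action, mean, log_std):
        z = (a - m) / math.exp(s)  # standardized residual for this dimension
        total += -0.5 * z * z - s - 0.5 * math.log(2.0 * math.pi)
    return total


# Toy 26-dimensional example: the action equals the mean, with unit std.
toy_log_prob = gaussian_log_prob([0.0] * 26, [0.0] * 26, [0.0] * 26)
```

Per-step log-probabilities of this kind are what behavior-regularized or importance-weighted offline methods typically need from the logged policy.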
