- Description:
Bot Adversarial Dialogue Dataset.
Dialogue datasets from the Bot Adversarial Dialogue task, labeled for offensiveness. The dialogues were collected by asking humans to talk adversarially to bots.
More details are in the paper.
Homepage: https://github.com/facebookresearch/ParlAI/tree/main/parlai/tasks/bot_adversarial_dialogue
Source code: tfds.datasets.bot_adversarial_dialogue.Builder
Versions:
1.0.0 (default): Initial release.
Auto-cached (documentation): Yes
Supervised keys (see as_supervised doc): None
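Because no supervised keys are defined, `as_supervised=True` is not available for this dataset, and (input, target) pairs have to be built by hand. A minimal sketch, assuming `text` is the input and `labels` the target (as the feature documentation suggests); `to_supervised` is a hypothetical helper, not part of TFDS.

```python
from typing import Any, Mapping, Tuple


def to_supervised(example: Mapping[str, Any]) -> Tuple[Any, Any]:
    """Turn one example into a (text, labels) pair.

    Assumption: 'text' is the model input and 'labels' the target; the
    dataset itself declares no supervised keys.
    """
    return example["text"], example["labels"]


if __name__ == "__main__":
    # Hand-written stand-in for one decoded example (values are made up).
    fake_example = {"id": "x", "text": "hello there", "labels": 0, "episode_done": True}
    print(to_supervised(fake_example))
```

Since a `tf.data` pipeline passes a dict element to its map function as a single argument, the same helper could be applied as `tfds.load("bot_adversarial_dialogue/dialogue_datasets", split="train").map(to_supervised)`.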
Figure (tfds.show_examples): Not supported.
Citation:
@misc{xu2021recipes,
title={Recipes for Safety in Open-domain Chatbots},
author={Jing Xu and Da Ju and Margaret Li and Y-Lan Boureau and Jason Weston and Emily Dinan},
year={2021},
eprint={2010.07079},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
bot_adversarial_dialogue/dialogue_datasets (default config)
Config description: The dialogue datasets, divided into train, validation and test splits.
Download size: 3.06 MiB
Dataset size: 23.38 MiB
Splits:
Split | Examples |
---|---|
'test' | 2,598 |
'train' | 69,274 |
'valid' | 7,002 |
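The split sizes above put roughly 88% of the 78,874 examples in train, 9% in valid and 3% in test; note that the validation split is named 'valid'. A small sketch that derives those shares from the table; `SPLIT_SIZES` and `split_fractions` are illustrative names, not part of TFDS.

```python
from typing import Dict

# Example counts copied from the splits table of the dialogue_datasets config.
SPLIT_SIZES: Dict[str, int] = {"train": 69_274, "valid": 7_002, "test": 2_598}


def split_fractions(sizes: Dict[str, int]) -> Dict[str, float]:
    """Return each split's share of the total number of examples."""
    total = sum(sizes.values())
    return {name: count / total for name, count in sizes.items()}


if __name__ == "__main__":
    print("total examples:", sum(SPLIT_SIZES.values()))
    for name, frac in split_fractions(SPLIT_SIZES).items():
        print(f"{name}: {frac:.1%}")
```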
- Feature structure:
FeaturesDict({
'bot_persona': Sequence(Text(shape=(), dtype=string)),
'dialogue_id': float32,
'episode_done': bool,
'id': Text(shape=(), dtype=string),
'labels': ClassLabel(shape=(), dtype=int64, num_classes=2),
'round_id': float32,
'speaker_to_eval': Text(shape=(), dtype=string),
'text': Text(shape=(), dtype=string),
})
- Feature documentation:
Feature | Class | Shape | Dtype | Description |
---|---|---|---|---|
FeaturesDict | | | | |
bot_persona | Sequence(Text) | (None,) | string | The persona impersonated by the bot. |
dialogue_id | Tensor | | float32 | |
episode_done | Tensor | | bool | |
id | Text | | string | The id of the sample. |
labels | ClassLabel | | int64 | |
round_id | Tensor | | float32 | |
speaker_to_eval | Text | | string | The speaker whose utterances are labeled. |
text | Text | | string | The utterance to classify. |
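When examples are read through `tfds.as_numpy`, Text features such as `id`, `text` and `speaker_to_eval` arrive as `bytes`, and `bot_persona` as a sequence of `bytes`. A minimal sketch that converts them to Python strings while passing numeric and boolean fields through unchanged; `decode_example` is a hypothetical helper, and UTF-8 encoding is an assumption.

```python
from typing import Any, Dict, Mapping


def decode_example(example: Mapping[str, Any]) -> Dict[str, Any]:
    """Decode bytes fields (and sequences of bytes) to str; keep other values as-is."""
    decoded: Dict[str, Any] = {}
    for key, value in example.items():
        if isinstance(value, bytes):
            # Scalar Text features, e.g. 'id', 'text', 'speaker_to_eval'.
            decoded[key] = value.decode("utf-8")
            continue
        if isinstance(value, str):
            decoded[key] = value
            continue
        try:
            items = list(value)
        except TypeError:
            # Non-iterable scalars, e.g. 'labels', 'round_id', 'episode_done'.
            decoded[key] = value
            continue
        # Sequence(Text) features, e.g. 'bot_persona'.
        decoded[key] = [
            item.decode("utf-8") if isinstance(item, bytes) else item for item in items
        ]
    return decoded
```

With TFDS installed, it could be applied as `for ex in tfds.as_numpy(ds): print(decode_example(ex))`, where `ds` comes from `tfds.load("bot_adversarial_dialogue/dialogue_datasets", split="valid")`.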
bot_adversarial_dialogue/human_nonadv_safety_eval
Config description: A human safety evaluation set, labeled for offensiveness by crowdsourced workers.
Download size: 10.57 KiB
Dataset size: 34.55 KiB
Splits:
Split | Examples |
---|---|
'test' | 180 |
- Feature structure:
FeaturesDict({
'episode_done': bool,
'id': Text(shape=(), dtype=string),
'labels': ClassLabel(shape=(), dtype=int64, num_classes=2),
'text': Text(shape=(), dtype=string),
})
- Feature documentation:
Feature | Class | Shape | Dtype | Description |
---|---|---|---|---|
FeaturesDict | | | | |
episode_done | Tensor | | bool | |
id | Text | | string | The id of the sample. |
labels | ClassLabel | | int64 | |
text | Text | | string | The utterance to classify. |
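This config has only a 180-example test split, so one plausible use is scoring an offensiveness classifier against its `labels`. A minimal sketch computing accuracy and the F1 score of one class; `accuracy_and_f1` is a hypothetical helper, and `positive_class=1` is an assumption (check the ClassLabel names to see which index marks an offensive utterance).

```python
from typing import Sequence, Tuple


def accuracy_and_f1(
    labels: Sequence[int], preds: Sequence[int], positive_class: int = 1
) -> Tuple[float, float]:
    """Return (accuracy, F1 of positive_class) for binary label predictions."""
    if len(labels) != len(preds):
        raise ValueError("labels and preds must have the same length")
    if not labels:
        raise ValueError("need at least one example")
    pairs = list(zip(labels, preds))
    correct = sum(l == p for l, p in pairs)
    tp = sum(l == positive_class and p == positive_class for l, p in pairs)
    fp = sum(l != positive_class and p == positive_class for l, p in pairs)
    fn = sum(l == positive_class and p != positive_class for l, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return correct / len(pairs), f1
```

Reporting the per-class F1 alongside accuracy is a design choice for imbalanced label sets, where accuracy alone can look high even if the minority class is mostly missed.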