
bot_adversarial_dialogue

Description:

Bot Adversarial Dialogue Dataset.

Dialogue datasets labeled for offensiveness, from the Bot Adversarial Dialogue task. The dialogues were collected by asking humans to talk adversarially to bots.

For more details, see the paper listed under Citation (Xu et al., 2021).

Homepage: https://github.com/facebookresearch/ParlAI/tree/main/parlai/tasks/bot_adversarial_dialogue
Source code: tfds.datasets.bot_adversarial_dialogue.Builder
Versions:
- 1.0.0 (default): Initial release.
Auto-cached (documentation): Yes
Supervised keys (See as_supervised doc): None
Figure (tfds.show_examples): Not supported.
Citation:

@misc{xu2021recipes,
      title={Recipes for Safety in Open-domain Chatbots},
      author={Jing Xu and Da Ju and Margaret Li and Y-Lan Boureau and Jason Weston and Emily Dinan},
      year={2021},
      eprint={2010.07079},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

bot_adversarial_dialogue/dialogue_datasets (default config)

Config description: The dialogue datasets, divided into train, validation, and test splits.
Download size: 3.06 MiB
Dataset size: 23.38 MiB
Splits:

Split	Examples
`'test'`	2,598
`'train'`	69,274
`'valid'`	7,002
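
As a minimal sketch of how the split sizes above could be checked after loading, the helper below compares the counts reported by the dataset metadata against the table. Note that the validation split is named 'valid', not 'validation'. The names EXPECTED_EXAMPLES, find_size_mismatches, and load_split_sizes are hypothetical and introduced only for illustration; the loader assumes the tensorflow-datasets package is installed and that the data can be downloaded.

```python
# Expected example counts, copied from the Splits table above.
EXPECTED_EXAMPLES = {"train": 69_274, "valid": 7_002, "test": 2_598}


def find_size_mismatches(actual, expected=EXPECTED_EXAMPLES):
    """Return {split: (expected, actual)} for every split whose size differs."""
    return {
        name: (count, actual.get(name))
        for name, count in expected.items()
        if actual.get(name) != count
    }


def load_split_sizes(name="bot_adversarial_dialogue/dialogue_datasets"):
    """Download and prepare the config, then read split sizes from its metadata."""
    # Imported lazily so the helper above works without TensorFlow installed.
    import tensorflow_datasets as tfds

    _, info = tfds.load(name, with_info=True)
    return {split: meta.num_examples for split, meta in info.splits.items()}
```

Calling find_size_mismatches(load_split_sizes()) should return an empty dict if the prepared data matches the table; any other result lists the splits that differ.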

Feature structure:

FeaturesDict({
    'bot_persona': Sequence(Text(shape=(), dtype=string)),
    'dialogue_id': float32,
    'episode_done': bool,
    'id': Text(shape=(), dtype=string),
    'labels': ClassLabel(shape=(), dtype=int64, num_classes=2),
    'round_id': float32,
    'speaker_to_eval': Text(shape=(), dtype=string),
    'text': Text(shape=(), dtype=string),
})

Feature documentation:

Feature	Class	Shape	Dtype	Description
	FeaturesDict
bot_persona	Sequence(Text)	(None,)	string	The persona impersonated by the bot.
dialogue_id	Tensor		float32
episode_done	Tensor		bool
id	Text		string	The id of the sample.
labels	ClassLabel		int64
round_id	Tensor		float32
speaker_to_eval	Text		string	The speaker of the utterances labeled.
text	Text		string	The utterance to classify.
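
The features above arrive as raw tensors, so strings are bytes and the two id fields are float32. The sketch below shows one way to turn an example into plain Python values. The names to_python and iter_examples are hypothetical, and casting dialogue_id and round_id to int is an assumption that these fields hold whole numbers; the page itself only states their dtype.

```python
def to_python(example):
    """Convert one example dict (as yielded by tfds.as_numpy) to plain Python types."""

    def text(value):
        # String features come back as UTF-8 encoded bytes.
        return value.decode("utf-8") if isinstance(value, bytes) else str(value)

    return {
        "id": text(example["id"]),
        "text": text(example["text"]),
        "speaker_to_eval": text(example["speaker_to_eval"]),
        "bot_persona": [text(line) for line in example["bot_persona"]],
        # Assumption: the float32 ids hold whole numbers.
        "dialogue_id": int(example["dialogue_id"]),
        "round_id": int(example["round_id"]),
        "episode_done": bool(example["episode_done"]),
        "label": int(example["labels"]),
    }


def iter_examples(split="train"):
    """Yield converted examples from the default config (downloads the data)."""
    # Imported lazily so to_python can be used without TensorFlow installed.
    import tensorflow_datasets as tfds

    ds = tfds.load("bot_adversarial_dialogue/dialogue_datasets", split=split)
    for example in tfds.as_numpy(ds):
        yield to_python(example)
```

For instance, next(iter_examples("valid")) would give the first validation example as a dict of str, int, bool, and list values.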


bot_adversarial_dialogue/human_nonadv_safety_eval

Config description: A human safety evaluation set, rated for offensiveness by crowdsourced workers.
Download size: 10.57 KiB
Dataset size: 34.55 KiB
Splits:

Split	Examples
`'test'`	180

Feature structure:

FeaturesDict({
    'episode_done': bool,
    'id': Text(shape=(), dtype=string),
    'labels': ClassLabel(shape=(), dtype=int64, num_classes=2),
    'text': Text(shape=(), dtype=string),
})

Feature documentation:

Feature	Class	Dtype	Description
	FeaturesDict
episode_done	Tensor	bool
id	Text	string	The id of the sample.
labels	ClassLabel	int64
text	Text	string	The utterance to classify.
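
Because this config is a small evaluation set with a binary labels feature, a common first step is to look at the class balance. The sketch below counts labels and maps the integer ids to their class names through the dataset metadata. The names label_counts and load_label_counts are hypothetical, and the class names themselves are not listed on this page, so they are read from the metadata rather than hard-coded.

```python
from collections import Counter


def label_counts(labels):
    """Count how often each integer class label occurs."""
    return Counter(int(label) for label in labels)


def load_label_counts(name="bot_adversarial_dialogue/human_nonadv_safety_eval"):
    """Load the 'test' split and return {class name: number of examples}."""
    # Imported lazily so label_counts can be used without TensorFlow installed.
    import tensorflow_datasets as tfds

    ds, info = tfds.load(name, split="test", with_info=True)
    # ClassLabel.int2str maps an integer id to its class name.
    to_name = info.features["labels"].int2str
    counts = label_counts(ex["labels"] for ex in tfds.as_numpy(ds))
    return {to_name(label): n for label, n in counts.items()}
```

The counts returned by load_label_counts() should sum to the 180 examples listed for the 'test' split.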
