bot_adversarial_dialogue

  • Description:

Bot Adversarial Dialogue Dataset.

Dialogue datasets labeled for offensiveness, from the Bot Adversarial Dialogue task. The dialogues were collected by asking humans to converse adversarially with bots.

More details are in the paper cited below.

@misc{xu2021recipes,
      title={Recipes for Safety in Open-domain Chatbots},
      author={Jing Xu and Da Ju and Margaret Li and Y-Lan Boureau and Jason Weston and Emily Dinan},
      year={2021},
      eprint={2010.07079},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

bot_adversarial_dialogue/dialogue_datasets (default config)

  • Config description: The dialogue datasets, divided into train, validation, and test splits.

  • Download size: 3.06 MiB

  • Dataset size: 23.38 MiB

  • Splits:

Split    Examples
'test'   2,598
'train'  69,274
'valid'  7,002
  • Feature structure:
FeaturesDict({
    'bot_persona': Sequence(Text(shape=(), dtype=string)),
    'dialogue_id': float32,
    'episode_done': bool,
    'id': Text(shape=(), dtype=string),
    'labels': ClassLabel(shape=(), dtype=int64, num_classes=2),
    'round_id': float32,
    'speaker_to_eval': Text(shape=(), dtype=string),
    'text': Text(shape=(), dtype=string),
})
  • Feature documentation:
Feature          Class           Shape    Dtype    Description
                 FeaturesDict
bot_persona      Sequence(Text)  (None,)  string   The persona impersonated by the bot.
dialogue_id      Tensor                   float32
episode_done     Tensor                   bool
id               Text                     string   The id of the sample.
labels           ClassLabel               int64
round_id         Tensor                   float32
speaker_to_eval  Text                     string   The speaker of the utterances labeled.
text             Text                     string   The utterance to classify.
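
The feature structure of the default config can be read with a short loading sketch like the one below. It assumes the `tensorflow_datasets` package is installed and that the dataset can be downloaded; the helper names `load_split` and `summarize_example` are hypothetical and exist only for illustration. This page does not name the two `labels` classes, so the sketch keeps the raw integer id.

```python
from typing import Any, Dict, Iterable, Mapping

# Feature keys of the default config, copied from the feature structure above.
DIALOGUE_FEATURES = (
    "bot_persona", "dialogue_id", "episode_done", "id",
    "labels", "round_id", "speaker_to_eval", "text",
)


def _decode(value: Any) -> Any:
    """Text features arrive as bytes when iterating NumPy examples."""
    return value.decode("utf-8") if isinstance(value, bytes) else value


def summarize_example(example: Mapping[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: check the keys and keep the fields used for classification."""
    missing = [key for key in DIALOGUE_FEATURES if key not in example]
    if missing:
        raise KeyError(f"example is missing features: {missing}")
    return {
        "id": _decode(example["id"]),
        "text": _decode(example["text"]),
        "label": int(example["labels"]),  # raw class id, 0 or 1
        "speaker_to_eval": _decode(example["speaker_to_eval"]),
    }


def load_split(split: str = "train") -> Iterable[Mapping[str, Any]]:
    """Hypothetical helper: load one split ('train', 'valid' or 'test')."""
    # Imported lazily so summarize_example can be used without TensorFlow.
    import tensorflow_datasets as tfds

    ds = tfds.load("bot_adversarial_dialogue/dialogue_datasets", split=split)
    return tfds.as_numpy(ds)


if __name__ == "__main__":
    for i, example in enumerate(load_split("valid")):
        print(summarize_example(example))
        if i == 2:  # print only the first three examples
            break
```

Passing `with_info=True` to `tfds.load` also returns the dataset metadata, which is one way to check how the two label ids map to class names.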

bot_adversarial_dialogue/human_nonadv_safety_eval

  • Config description: A human safety evaluation set, rated for offensiveness by crowdsourced workers.

  • Download size: 10.57 KiB

  • Dataset size: 34.55 KiB

  • Splits:

Split   Examples
'test'  180
  • Feature structure:
FeaturesDict({
    'episode_done': bool,
    'id': Text(shape=(), dtype=string),
    'labels': ClassLabel(shape=(), dtype=int64, num_classes=2),
    'text': Text(shape=(), dtype=string),
})
  • Feature documentation:
Feature       Class         Shape  Dtype   Description
              FeaturesDict
episode_done  Tensor               bool
id            Text                 string  The id of the sample.
labels        ClassLabel           int64
text          Text                 string  The utterance to classify.
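
Because this config has a single small `test` split, a common first step is to look at how its examples are distributed over the two label ids. The sketch below is one way to do that, again assuming `tensorflow_datasets` is installed; `label_counts` and `load_eval_examples` are hypothetical helper names, and the meaning of each label id is not stated on this page.

```python
from collections import Counter
from typing import Any, Dict, Iterable, Mapping


def label_counts(examples: Iterable[Mapping[str, Any]]) -> Dict[int, int]:
    """Hypothetical helper: count examples per raw label id."""
    return dict(Counter(int(example["labels"]) for example in examples))


def load_eval_examples() -> Iterable[Mapping[str, Any]]:
    """Hypothetical helper: load the only split of the evaluation config."""
    # Imported lazily so label_counts can be used without TensorFlow.
    import tensorflow_datasets as tfds

    ds = tfds.load(
        "bot_adversarial_dialogue/human_nonadv_safety_eval", split="test"
    )
    return tfds.as_numpy(ds)


if __name__ == "__main__":
    print(label_counts(load_eval_examples()))
```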