dices

  • Description:

The Diversity in Conversational AI Evaluation for Safety (DICES) dataset

Machine learning approaches are often trained and evaluated with datasets that require a clear separation between positive and negative examples. This approach oversimplifies the natural subjectivity present in many tasks and content items, and it obscures the inherent diversity in human perceptions and opinions. Annotation tasks that attempt to preserve the variance in content and the diversity of humans are often expensive and laborious. To fill this gap and facilitate more in-depth model performance analyses, we propose the DICES dataset: a unique dataset with diverse perspectives on the safety of AI-generated conversations. We focus on the task of safety evaluation of conversational AI systems. The DICES dataset contains detailed demographic information about each rater and extremely high replication of unique ratings per conversation to ensure the statistical significance of further analyses, and it encodes rater votes as distributions across different demographics to allow for in-depth explorations of different rating aggregation strategies.

This dataset is well suited for observing and measuring variance, ambiguity, and diversity in the context of conversational AI safety. The dataset is accompanied by a paper describing a set of metrics that show how rater diversity influences the safety perception of raters from different geographic regions, ethnic groups, age groups, and genders. The goal of the DICES dataset is to serve as a shared benchmark for the safety evaluation of conversational AI systems.

CONTENT WARNING: This dataset contains adversarial examples of conversations that may be offensive.

@article{aroyo2024dices,
  title={ {DICES} dataset: Diversity in conversational {AI} evaluation for safety},
  author={Aroyo, Lora and Taylor, Alex and Diaz, Mark and Homan, Christopher and Parrish, Alicia and Serapio-Garc{\'\i}a, Gregory and Prabhakaran, Vinodkumar and Wang, Ding},
  journal={Advances in Neural Information Processing Systems},
  volume={36},
  year={2024}
}
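
Both configs can be loaded through the TensorFlow Datasets API. A minimal sketch of loading the default config, assuming tensorflow_datasets is installed:

import tensorflow_datasets as tfds

# Load the default config; 'train' is the only split in both configs.
ds = tfds.load('dices/350', split='train')

# Each row is one (rater, conversation) pair: ClassLabel features are
# stored as int64 indices and Text features as byte strings.
for ex in ds.take(1):
    print(int(ex['item_id']), int(ex['rater_id']))
    print(ex['context'].numpy().decode('utf-8')[:200])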

dices/350 (default config)

  • Config description: Dataset 350 contains 350 conversations rated by a diverse pool of 123 unique raters. Each conversation is rated on five top-level safety categories and one overall comprehension question about the conversation. The rater pool was balanced by gender (man or woman) and race/ethnicity (White, Black, Latine, Asian, Multiracial), and each rater rated every conversation, so each conversation has 123 unique ratings. The total number of rows in this dataset is 43,050. A sketch of aggregating these ratings by rater demographics follows the feature documentation below.

  • Download size: 29.70 MiB

  • Dataset size: 74.43 MiB

  • Auto-cached (documentation): Yes

  • Splits:

Split Examples
'train' 43,050
  • Feature structure:
FeaturesDict({
    'Q1_whole_conversation_evaluation': ClassLabel(shape=(), dtype=int64, num_classes=4),
    'Q2_harmful_content_dangerous_content': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q2_harmful_content_derogation_of_boat': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q2_harmful_content_insensitive': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q2_harmful_content_medical_legal_financial_or_relationship_advice': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q2_harmful_content_obscene_and_profane': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q2_harmful_content_other': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q2_harmful_content_overall': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q2_harmful_content_personal_information': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q2_harmful_content_regulated_goods': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q2_harmful_content_sexually_suggestive_content': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q3_bias_other': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q3_bias_overall': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q3_bias_targeting_beliefs': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q3_bias_targeting_gender_sexual_orientation': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q3_bias_targeting_inherited_attributes': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q3_bias_targeting_status': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q4_misinformation': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q5_political_affiliation': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q6_policy_guidelines_making_endorsement': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q6_policy_guidelines_other': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q6_policy_guidelines_overall': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q6_policy_guidelines_polarizing_topics': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q_overall': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'answer_time_ms': Scalar(shape=(), dtype=int64),
    'answer_timestamp': Scalar(shape=(), dtype=int64),
    'context': Text(shape=(), dtype=string),
    'degree_of_harm': ClassLabel(shape=(), dtype=int64, num_classes=5),
    'harm_type': Sequence(ClassLabel(shape=(), dtype=int64, num_classes=32)),
    'id': Scalar(shape=(), dtype=int64),
    'item_id': Scalar(shape=(), dtype=int64),
    'phase': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'rater_age': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'rater_education': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'rater_gender': ClassLabel(shape=(), dtype=int64, num_classes=4),
    'rater_id': Scalar(shape=(), dtype=int64),
    'rater_race': ClassLabel(shape=(), dtype=int64, num_classes=6),
    'rater_raw_race': Text(shape=(), dtype=string),
    'response': Text(shape=(), dtype=string),
    'safety_gold': ClassLabel(shape=(), dtype=int64, num_classes=2),
    'safety_gold_reason': Text(shape=(), dtype=string),
})
  • Feature documentation:
Feature Class Shape Dtype Description
FeaturesDict
Q1_whole_conversation_evaluation ClassLabel int64 Rating about the understandability of a conversation.
Q2_harmful_content_dangerous_content ClassLabel int64
Q2_harmful_content_derogation_of_boat ClassLabel int64
Q2_harmful_content_insensitive ClassLabel int64
Q2_harmful_content_medical_legal_financial_or_relationship_advice ClassLabel int64
Q2_harmful_content_obscene_and_profane ClassLabel int64
Q2_harmful_content_other ClassLabel int64
Q2_harmful_content_overall ClassLabel int64
Q2_harmful_content_personal_information ClassLabel int64
Q2_harmful_content_regulated_goods ClassLabel int64
Q2_harmful_content_sexually_suggestive_content ClassLabel int64
Q3_bias_other ClassLabel int64
Q3_bias_overall ClassLabel int64
Q3_bias_targeting_beliefs ClassLabel int64
Q3_bias_targeting_gender_sexual_orientation ClassLabel int64
Q3_bias_targeting_inherited_attributes ClassLabel int64
Q3_bias_targeting_status ClassLabel int64
Q4_misinformation ClassLabel int64
Q5_political_affiliation ClassLabel int64
Q6_policy_guidelines_making_endorsement ClassLabel int64
Q6_policy_guidelines_other ClassLabel int64
Q6_policy_guidelines_overall ClassLabel int64
Q6_policy_guidelines_polarizing_topics ClassLabel int64
Q_overall ClassLabel int64
answer_time_ms Scalar int64 Amount of time spent by each rater on each safety annotation question.
answer_timestamp Scalar int64 Time when each conversation was rated by each rater.
context Text string The conversation turns before the final chatbot response.
degree_of_harm ClassLabel int64 Hand-annotated rating of severity of safety risk.
harm_type Sequence(ClassLabel) (None,) int64 Hand-annotated harm topic(s) of conversation.
id Scalar int64 Numerical identifier for each row, representing all ratings by a single rater to a single conversation.
item_id Scalar int64 Numerical identifier for each conversation.
phase ClassLabel int64 One of three distinct time periods.
rater_age ClassLabel int64 The age group of the rater.
rater_education ClassLabel int64 The education of the rater.
rater_gender ClassLabel int64 The gender of the rater.
rater_id Scalar int64 Numerical identifier for each rater.
rater_race ClassLabel int64 The race/ethnicity of the rater.
rater_raw_race Text string The self-reported raw race/ethnicity of the rater, before simplification to five categories.
response Text string The final chatbot response in the conversation.
safety_gold ClassLabel int64 The gold standard safety label provided by experts.
safety_gold_reason Text string The reason(s) (if given) for the gold safety label provided by experts.
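
As noted in the config description, every one of the 123 raters rated every conversation, so per-conversation vote distributions can be computed directly. A minimal sketch of aggregating the overall safety vote (Q_overall) by rater race/ethnicity, with label names read from the dataset metadata rather than hard-coded:

import collections

import tensorflow_datasets as tfds

ds, info = tfds.load('dices/350', split='train', with_info=True)

overall_feat = info.features['Q_overall']   # ClassLabel, 3 classes
race_feat = info.features['rater_race']     # ClassLabel, 6 classes

# counts[item_id][race][label] -> number of votes
counts = collections.defaultdict(
    lambda: collections.defaultdict(collections.Counter))

for ex in tfds.as_numpy(ds):
    item = int(ex['item_id'])
    race = race_feat.int2str(int(ex['rater_race']))
    label = overall_feat.int2str(int(ex['Q_overall']))
    counts[item][race][label] += 1

# Each conversation should have 123 votes in total across all groups.
some_item = next(iter(counts))
print(sum(sum(c.values()) for c in counts[some_item].values()))  # -> 123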

dices/990

  • Config description: Dataset 990 contains 990 conversations rated by a diverse pool of 173 unique raters. Each conversation is rated on three top-level safety categories and one overall comprehension question about the conversation. Raters were recruited so that the raters for each conversation were balanced by gender (man or woman) and locale (US or India). Each rater rated only a sample of the conversations, and each conversation has 60-70 unique ratings. The total number of rows in this dataset is 72,103. A sketch comparing vote shares across the two locales follows the feature documentation below.

  • Download size: 48.06 MiB

  • Dataset size: 150.38 MiB

  • Auto-cached (documentation): Only when shuffle_files=False (train)

  • Splits:

Split Examples
'train' 72,103
  • Feature structure:
FeaturesDict({
    'Q1_whole_conversation_evaluation': ClassLabel(shape=(), dtype=int64, num_classes=4),
    'Q2_harmful_content_explicitly_sexual': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q2_harmful_content_financial_advice': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q2_harmful_content_harm_to_people_or_animals': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q2_harmful_content_health_and_safety_claims': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q2_harmful_content_how_tos_about_harmful_activities': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q2_harmful_content_obscenities_or_profanities': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q2_harmful_content_other': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q2_harmful_content_overall': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q2_harmful_content_personal_information': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q2_harmful_content_promotes_or_condones_use_of_harmful_regulated_goods': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q2_harmful_content_promotes_or_condones_violence': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q2_harmful_content_violent_or_gory': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q3_3_Racial_religious_or_other_slurs_and_stereotypes_that_incites_or_promote_hatred_based_on_any_of_the_attributes_noted_above': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q3_bias_calling_individuals_or_groups_evil_corrupt_or_malicious': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q3_bias_incites_hatred': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q3_bias_insults_about_emotional_romantic_and_or_sexual_attraction': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q3_bias_other': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q3_bias_physically_or_mentally_inferior': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q3_bias_subhuman_comparisons': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q3_bias_superiority_or_subjugation_of_a_group': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q3_unfair_bias_overall': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q4_misinformation_conspiracy_theories': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q4_misinformation_contradicts_expert_consensus': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q4_misinformation_false_theories': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q4_misinformation_other': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q4_misinformation_overall': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q4_misinformation_political': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q_overall': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'context': Text(shape=(), dtype=string),
    'degree_of_harm': ClassLabel(shape=(), dtype=int64, num_classes=5),
    'harm_type': Sequence(ClassLabel(shape=(), dtype=int64, num_classes=32)),
    'id': Scalar(shape=(), dtype=int64),
    'item_id': Scalar(shape=(), dtype=int64),
    'phase': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'rater_age': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'rater_education': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'rater_gender': ClassLabel(shape=(), dtype=int64, num_classes=4),
    'rater_id': Scalar(shape=(), dtype=int64),
    'rater_locale': ClassLabel(shape=(), dtype=int64, num_classes=2),
    'rater_race': ClassLabel(shape=(), dtype=int64, num_classes=6),
    'rater_raw_race': Text(shape=(), dtype=string),
    'response': Text(shape=(), dtype=string),
})
  • Feature documentation:
Feature Class Shape Dtype Description
FeaturesDict
Q1_whole_conversation_evaluation ClassLabel int64 Rating about the understandability of a conversation.
Q2_harmful_content_explicitly_sexual ClassLabel int64
Q2_harmful_content_financial_advice ClassLabel int64
Q2_harmful_content_harm_to_people_or_animals ClassLabel int64
Q2_harmful_content_health_and_safety_claims ClassLabel int64
Q2_harmful_content_how_tos_about_harmful_activities ClassLabel int64
Q2_harmful_content_obscenities_or_profanities ClassLabel int64
Q2_harmful_content_other ClassLabel int64
Q2_harmful_content_overall ClassLabel int64
Q2_harmful_content_personal_information ClassLabel int64
Q2_harmful_content_promotes_or_condones_use_of_harmful_regulated_goods ClassLabel int64
Q2_harmful_content_promotes_or_condones_violence ClassLabel int64
Q2_harmful_content_violent_or_gory ClassLabel int64
Q3_3_Racial_religious_or_other_slurs_and_stereotypes_that_incites_or_promote_hatred_based_on_any_of_the_attributes_noted_above ClassLabel int64
Q3_bias_calling_individuals_or_groups_evil_corrupt_or_malicious ClassLabel int64
Q3_bias_incites_hatred ClassLabel int64
Q3_bias_insults_about_emotional_romantic_and_or_sexual_attraction ClassLabel int64
Q3_bias_other ClassLabel int64
Q3_bias_physically_or_mentally_inferior ClassLabel int64
Q3_bias_subhuman_comparisons ClassLabel int64
Q3_bias_superiority_or_subjugation_of_a_group ClassLabel int64
Q3_unfair_bias_overall ClassLabel int64
Q4_misinformation_conspiracy_theories ClassLabel int64
Q4_misinformation_contradicts_expert_consensus ClassLabel int64
Q4_misinformation_false_theories ClassLabel int64
Q4_misinformation_other ClassLabel int64
Q4_misinformation_overall ClassLabel int64
Q4_misinformation_political ClassLabel int64
Q_overall ClassLabel int64
context Text string The conversation turns before the final chatbot response.
degree_of_harm ClassLabel int64 Hand-annotated rating of severity of safety risk.
harm_type Sequence(ClassLabel) (None,) int64 Hand-annotated harm topic(s) of conversation.
id Scalar int64 Numerical identifier for each row, representing all ratings by a single rater to a single conversation.
item_id Scalar int64 Numerical identifier for each conversation.
phase ClassLabel int64 One of three distinct time periods.
rater_age ClassLabel int64 The age group of the rater.
rater_education ClassLabel int64 The education of the rater.
rater_gender ClassLabel int64 The gender of the rater.
rater_id Scalar int64 Numerical identifier for each rater.
rater_locale ClassLabel int64 The locale of the rater.
rater_race ClassLabel int64 The race/ethnicity of the rater.
rater_raw_race Text string The self-reported raw race/ethnicity of the rater, before simplification to five categories.
response Text string The final chatbot response in the conversation.
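
Since dices/990 adds a rater_locale feature, a natural first analysis is comparing overall safety vote shares between the two locales. A minimal sketch in the same style, again reading locale and label names from the dataset metadata:

import collections

import tensorflow_datasets as tfds

ds, info = tfds.load('dices/990', split='train', with_info=True)

locale_feat = info.features['rater_locale']  # ClassLabel, 2 classes
overall_feat = info.features['Q_overall']    # ClassLabel, 3 classes

votes = collections.defaultdict(collections.Counter)
for ex in tfds.as_numpy(ds):
    locale = locale_feat.int2str(int(ex['rater_locale']))
    label = overall_feat.int2str(int(ex['Q_overall']))
    votes[locale][label] += 1

# Print the share of each overall safety label per locale.
for locale, counter in sorted(votes.items()):
    total = sum(counter.values())
    print(locale, {k: round(v / total, 3) for k, v in counter.items()})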