real_toxicity_prompts

Description:

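RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models

RealToxicityPrompts is a corpus of English prompts (specifically, sentence beginnings) of varying toxicity. The prompts are meant to be given as context to an autoregressive language model (such as GPT-2), which then generates completions.

More details are presented in the original paper.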
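
The intended use described above, feeding each prompt to a language model and collecting its completion, can be sketched as follows. This is a minimal illustration, not the paper's evaluation code: `generate_continuations`, `generate_fn`, and `dummy_model` are hypothetical names, and the model is deliberately left as a caller-supplied function so that no particular library API is assumed.

```python
from typing import Callable, Dict, Iterable, List


def generate_continuations(
    prompts: Iterable[str],
    generate_fn: Callable[[str], str],
) -> List[Dict[str, str]]:
    """Pair each prompt with the text returned by a caller-supplied model."""
    results = []
    for prompt in prompts:
        # generate_fn stands in for any autoregressive model (hypothetical hook).
        results.append({"prompt": prompt, "generation": generate_fn(prompt)})
    return results


def dummy_model(prompt: str) -> str:
    """Stand-in "model" for demonstration only: appends a fixed marker."""
    return prompt + " [generated text]"


# The prompt below is invented for illustration; it is not from the dataset.
outputs = generate_continuations(["The weather today is"], dummy_model)
```

In practice `dummy_model` would be replaced by a call into a real model, and the generations would then be scored for toxicity separately.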

Homepage: https://github.com/allenai/real-toxicity-prompts
Source code: tfds.datasets.real_toxicity_prompts.Builder
Versions:
- 1.0.0 (default): Initial release.
Download size: 3.46 GiB
Dataset size: 81.22 MiB
Auto-cached (documentation): Yes
Splits:

Split	Examples
`'train'`	99,442
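
A minimal sketch of loading the single `'train'` split with `tfds.load`. The helper names `subsplit` and `load_prompts` are assumptions made for this example. The TFDS import is kept inside the function so that defining the helpers triggers neither the import nor the 3.46 GiB download.

```python
def subsplit(split: str = "train", limit=None) -> str:
    """Build a TFDS split string, e.g. 'train[:1000]' for the first 1000 examples."""
    return split if limit is None else f"{split}[:{limit}]"


def load_prompts(limit=None, data_dir=None):
    """Return (dataset, info) for real_toxicity_prompts.

    The first call downloads and prepares the dataset, which can take a while.
    """
    # Imported lazily so this module can be imported without TFDS installed.
    import tensorflow_datasets as tfds

    return tfds.load(
        "real_toxicity_prompts",
        split=subsplit("train", limit),
        data_dir=data_dir,
        with_info=True,
    )


# Example usage (triggers the download on first run):
# ds, info = load_prompts(limit=1000)
# print(info.splits["train"].num_examples)  # size of the full 'train' split
```

Slicing the split string keeps quick experiments small, although the full download and preparation still happen the first time the dataset is built.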

Feature structure:

FeaturesDict({
    'begin': int32,
    'challenging': bool,
    'continuation': FeaturesDict({
        'flirtation': float32,
        'identity_attack': float32,
        'insult': float32,
        'profanity': float32,
        'severe_toxicity': float32,
        'sexually_explicit': float32,
        'text': Text(shape=(), dtype=string),
        'threat': float32,
        'toxicity': float32,
    }),
    'end': int32,
    'filename': Text(shape=(), dtype=string),
    'prompt': FeaturesDict({
        'flirtation': float32,
        'identity_attack': float32,
        'insult': float32,
        'profanity': float32,
        'severe_toxicity': float32,
        'sexually_explicit': float32,
        'text': Text(shape=(), dtype=string),
        'threat': float32,
        'toxicity': float32,
    }),
})
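
Each example is a nested dictionary, with the `prompt` and `continuation` sub-dictionaries sharing the same score fields. The sketch below flattens such a record into slash-separated paths like `prompt/toxicity`. The `flatten_features` helper and the sample record are hypothetical: the values are invented for illustration, and several score fields are omitted for brevity.

```python
def flatten_features(example, parent=""):
    """Flatten a nested example dict into {'prompt/toxicity': value, ...}."""
    flat = {}
    for key, value in example.items():
        path = f"{parent}/{key}" if parent else key
        if isinstance(value, dict):
            # Recurse into nested FeaturesDict-style sub-dictionaries.
            flat.update(flatten_features(value, path))
        else:
            flat[path] = value
    return flat


# Hypothetical record shaped like the structure above (not real dataset values).
example = {
    "begin": 0,
    "end": 10,
    "challenging": False,
    "filename": "example.txt",
    "prompt": {"text": b"An invented prompt", "toxicity": 0.1},
    "continuation": {"text": b" and its continuation.", "toxicity": 0.2},
}

flat = flatten_features(example)
# Text fields are shown as bytes here, as they typically are when a TFDS
# dataset is iterated through tfds.as_numpy.
```

The same helper can be applied to records obtained by iterating `tfds.as_numpy(ds)`, assuming they arrive as nested Python dictionaries.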

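Feature documentation:

Feature	Class	Dtype
	FeaturesDict
begin	Tensor	int32
challenging	Tensor	bool
continuation	FeaturesDict
continuation/flirtation	Tensor	float32
continuation/identity_attack	Tensor	float32
continuation/insult	Tensor	float32
continuation/profanity	Tensor	float32
continuation/severe_toxicity	Tensor	float32
continuation/sexually_explicit	Tensor	float32
continuation/text	Text	string
continuation/threat	Tensor	float32
continuation/toxicity	Tensor	float32
end	Tensor	int32
filename	Text	string
prompt	FeaturesDict
prompt/flirtation	Tensor	float32
prompt/identity_attack	Tensor	float32
prompt/insult	Tensor	float32
prompt/profanity	Tensor	float32
prompt/severe_toxicity	Tensor	float32
prompt/sexually_explicit	Tensor	float32
prompt/text	Text	string
prompt/threat	Tensor	float32
prompt/toxicity	Tensor	float32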

Supervised keys (see as_supervised doc): None
Figure (tfds.show_examples): Not supported.
Examples (tfds.as_dataframe):
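
A few examples can be previewed as a pandas DataFrame with `tfds.as_dataframe`. The sketch below is one way to do that. `preview_dataframe` and `decode_text` are hypothetical helper names, and the TFDS import is deferred so that nothing is downloaded until the preview function is actually called.

```python
def decode_text(value):
    """Return str for bytes input; pass any other value through unchanged."""
    return value.decode("utf-8") if isinstance(value, bytes) else value


def preview_dataframe(n: int = 4):
    """Load the train split and return the first n examples as a DataFrame."""
    # Imported lazily; calling this function downloads the dataset if needed.
    import tensorflow_datasets as tfds

    ds, info = tfds.load("real_toxicity_prompts", split="train", with_info=True)
    return tfds.as_dataframe(ds.take(n), info)


# Example usage (requires the dataset to be downloaded):
# df = preview_dataframe(4)
# Assumption: nested features appear as slash-separated columns and text
# cells may be bytes, in which case they can be decoded per column:
# df["prompt/text"].map(decode_text)
```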

Citation:

@article{gehman2020realtoxicityprompts,
  title={Realtoxicityprompts: Evaluating neural toxic degeneration in language models},
  author={Gehman, Samuel and Gururangan, Suchin and Sap, Maarten and Choi, Yejin and Smith, Noah A},
  journal={arXiv preprint arXiv:2009.11462},
  year={2020}
}