xed_en_fi

참고자료:

en_annotated

TFDS에 이 데이터세트를 로드하려면 다음 명령어를 사용하세요.

ds = tfds.load('huggingface:xed_en_fi/en_annotated')
  • 설명 :
A multilingual fine-grained emotion dataset. The dataset consists of human annotated Finnish (25k) and English sentences (30k). Plutchik’s
core emotions are used to annotate the dataset with the addition of neutral to create a multilabel multiclass
dataset. The dataset is carefully evaluated using language-specific BERT models and SVMs to
show that XED performs on par with other similar datasets and is therefore a useful tool for
sentiment analysis and emotion detection.
  • 라이센스 : 라이센스: Creative Commons Attribution 4.0 국제 라이센스(CC-BY)
  • 버전 : 1.1.0
  • 분할 :
나뉘다
'train' 17528
  • 특징 :
{
    "sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "labels": {
        "feature": {
            "num_classes": 9,
            "names": [
                "neutral",
                "anger",
                "anticipation",
                "disgust",
                "fear",
                "joy",
                "sadness",
                "surprise",
                "trust"
            ],
            "names_file": null,
            "id": null,
            "_type": "ClassLabel"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    }
}

en_neutral

TFDS에 이 데이터세트를 로드하려면 다음 명령어를 사용하세요.

ds = tfds.load('huggingface:xed_en_fi/en_neutral')
  • 설명 :
A multilingual fine-grained emotion dataset. The dataset consists of human annotated Finnish (25k) and English sentences (30k). Plutchik’s
core emotions are used to annotate the dataset with the addition of neutral to create a multilabel multiclass
dataset. The dataset is carefully evaluated using language-specific BERT models and SVMs to
show that XED performs on par with other similar datasets and is therefore a useful tool for
sentiment analysis and emotion detection.
  • 라이센스 : 라이센스: Creative Commons Attribution 4.0 국제 라이센스(CC-BY)
  • 버전 : 1.1.0
  • 분할 :
나뉘다
'train' 9675
  • 특징 :
{
    "sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "labels": {
        "num_classes": 9,
        "names": [
            "neutral",
            "anger",
            "anticipation",
            "disgust",
            "fear",
            "joy",
            "sadness",
            "surprise",
            "trust"
        ],
        "names_file": null,
        "id": null,
        "_type": "ClassLabel"
    }
}

fi_annotated

TFDS에 이 데이터세트를 로드하려면 다음 명령어를 사용하세요.

ds = tfds.load('huggingface:xed_en_fi/fi_annotated')
  • 설명 :
A multilingual fine-grained emotion dataset. The dataset consists of human annotated Finnish (25k) and English sentences (30k). Plutchik’s
core emotions are used to annotate the dataset with the addition of neutral to create a multilabel multiclass
dataset. The dataset is carefully evaluated using language-specific BERT models and SVMs to
show that XED performs on par with other similar datasets and is therefore a useful tool for
sentiment analysis and emotion detection.
  • 라이센스 : 라이센스: Creative Commons Attribution 4.0 국제 라이센스(CC-BY)
  • 버전 : 1.1.0
  • 분할 :
나뉘다
'train' 14449
  • 특징 :
{
    "sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "labels": {
        "feature": {
            "num_classes": 9,
            "names": [
                "neutral",
                "anger",
                "anticipation",
                "disgust",
                "fear",
                "joy",
                "sadness",
                "surprise",
                "trust"
            ],
            "names_file": null,
            "id": null,
            "_type": "ClassLabel"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    }
}

완전 중립

TFDS에 이 데이터세트를 로드하려면 다음 명령어를 사용하세요.

ds = tfds.load('huggingface:xed_en_fi/fi_neutral')
  • 설명 :
A multilingual fine-grained emotion dataset. The dataset consists of human annotated Finnish (25k) and English sentences (30k). Plutchik’s
core emotions are used to annotate the dataset with the addition of neutral to create a multilabel multiclass
dataset. The dataset is carefully evaluated using language-specific BERT models and SVMs to
show that XED performs on par with other similar datasets and is therefore a useful tool for
sentiment analysis and emotion detection.
  • 라이센스 : 라이센스: Creative Commons Attribution 4.0 국제 라이센스(CC-BY)
  • 버전 : 1.1.0
  • 분할 :
나뉘다
'train' 10794
  • 특징 :
{
    "sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "labels": {
        "num_classes": 9,
        "names": [
            "neutral",
            "anger",
            "anticipation",
            "disgust",
            "fear",
            "joy",
            "sadness",
            "surprise",
            "trust"
        ],
        "names_file": null,
        "id": null,
        "_type": "ClassLabel"
    }
}