cmrc2018

참조:

다음 명령을 사용하여 TFDS에서 이 데이터세트를 로드합니다.

ds = tfds.load('huggingface:cmrc2018')
  • 설명 :
A Span-Extraction dataset for Chinese machine reading comprehension to add language
diversities in this area. The dataset is composed by near 20,000 real questions annotated
on Wikipedia paragraphs by human experts. We also annotated a challenge set which
contains the questions that need comprehensive understanding and multi-sentence
inference throughout the context.
  • 라이선스 : 알려진 라이선스 없음
  • 버전 : 0.1.0
  • 분할 :
나뉘다
'test' 1002
'train' 10142
'validation' 3219
  • 특징 :
{
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "context": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "question": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "answers": {
        "feature": {
            "text": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            },
            "answer_start": {
                "dtype": "int32",
                "id": null,
                "_type": "Value"
            }
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    }
}