Use the following command to load this dataset in TFDS:
import tensorflow_datasets as tfds
ds = tfds.load('huggingface:counter')
- Description:
The COrpus of Urdu News TExt Reuse (COUNTER) corpus contains 1200 documents with real examples of text reuse from the field of journalism. It has been manually annotated at document level with three levels of reuse: wholly derived, partially derived and non derived.
- License: The corpus is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'train' | 600 |
- Features:
{
    "source": {
        "filename": {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        },
        "headline": {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        },
        "body": {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        },
        "total_number_of_words": {
            "dtype": "int64",
            "id": null,
            "_type": "Value"
        },
        "total_number_of_sentences": {
            "dtype": "int64",
            "id": null,
            "_type": "Value"
        },
        "number_of_words_with_swr": {
            "dtype": "int64",
            "id": null,
            "_type": "Value"
        },
        "newspaper": {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        },
        "newsdate": {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        },
        "domain": {
            "num_classes": 5,
            "names": [
                "business",
                "sports",
                "national",
                "foreign",
                "showbiz"
            ],
            "names_file": null,
            "id": null,
            "_type": "ClassLabel"
        },
        "classification": {
            "num_classes": 3,
            "names": [
                "wholly_derived",
                "partially_derived",
                "not_derived"
            ],
            "names_file": null,
            "id": null,
            "_type": "ClassLabel"
        }
    },
    "derived": {
        "filename": {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        },
        "headline": {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        },
        "body": {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        },
        "total_number_of_words": {
            "dtype": "int64",
            "id": null,
            "_type": "Value"
        },
        "total_number_of_sentences": {
            "dtype": "int64",
            "id": null,
            "_type": "Value"
        },
        "number_of_words_with_swr": {
            "dtype": "int64",
            "id": null,
            "_type": "Value"
        },
        "newspaper": {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        },
        "newsdate": {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        },
        "domain": {
            "num_classes": 5,
            "names": [
                "business",
                "sports",
                "national",
                "foreign",
                "showbiz"
            ],
            "names_file": null,
            "id": null,
            "_type": "ClassLabel"
        },
        "classification": {
            "num_classes": 3,
            "names": [
                "wholly_derived",
                "partially_derived",
                "not_derived"
            ],
            "names_file": null,
            "id": null,
            "_type": "ClassLabel"
        }
    }
}
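
The `domain` and `classification` features are `ClassLabel` types, which are typically stored as integer indices into the `names` lists shown in the schema. The following is a minimal sketch of mapping those indices back to readable names. It assumes the labels arrive as plain integers; the `decode_labels` helper and the sample record are hypothetical and are not part of the dataset or of any library.

```python
# Label names copied from the feature schema above, in schema order.
DOMAIN_NAMES = ["business", "sports", "national", "foreign", "showbiz"]
CLASSIFICATION_NAMES = ["wholly_derived", "partially_derived", "not_derived"]


def decode_labels(document):
    """Return a copy of one document dict with integer labels replaced by names.

    Hypothetical helper: assumes `domain` and `classification` are stored
    as integer indices into the name lists above.
    """
    decoded = dict(document)
    decoded["domain"] = DOMAIN_NAMES[document["domain"]]
    decoded["classification"] = CLASSIFICATION_NAMES[document["classification"]]
    return decoded


# Invented sample record for illustration only (not real corpus data).
# Each example pairs a "source" document with a "derived" document.
sample = {
    "source": {"filename": "source_example.xml", "domain": 1, "classification": 0},
    "derived": {"filename": "derived_example.xml", "domain": 1, "classification": 0},
}

decoded_sample = {side: decode_labels(doc) for side, doc in sample.items()}
print(decoded_sample["derived"]["domain"])          # sports
print(decoded_sample["derived"]["classification"])  # wholly_derived
```

Because every example holds one `source` and one `derived` document, the helper is applied to each side separately; the other fields (`headline`, `body`, word and sentence counts, and so on) are passed through unchanged.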