References:
ttc4900
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:ttc4900/ttc4900')
- Description:
The data set is taken from kemik group
http://www.kemik.yildiz.edu.tr/
The data are pre-processed for the text categorization, collocations are found, character set is corrected, and so forth.
We named TTC4900 by mimicking the name convention of TTC 3600 dataset shared by the study http://journals.sagepub.com/doi/abs/10.1177/0165551515620551
If you use the dataset in a paper, please refer https://www.kaggle.com/savasy/ttc4900 as footnote and cite one of the papers as follows:
- A Comparison of Different Approaches to Document Representation in Turkish Language, SDU Journal of Natural and Applied Science, Vol 22, Issue 2, 2018
- A comparative analysis of text classification for Turkish language, Pamukkale University Journal of Engineering Science Volume 25 Issue 5, 2018
- A Knowledge-poor Approach to Turkish Text Categorization with a Comparative Analysis, Proceedings of CICLING 2014, Springer LNCS, Nepal, 2014.
- License: CC0: Public Domain
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'train' |
4900 |
- Features:
{
"category": {
"num_classes": 7,
"names": [
"siyaset",
"dunya",
"ekonomi",
"kultur",
"saglik",
"spor",
"teknoloji"
],
"names_file": null,
"id": null,
"_type": "ClassLabel"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}