indic_glue

References:

wnli.en

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:indic_glue/wnli.en')

Description:

IndicGLUE is a natural language understanding benchmark for Indian languages. It contains a wide
variety of tasks and covers 11 major Indian languages: as, bn, gu, hi, kn, ml, mr, or, pa, ta, te.


The Winograd Schema Challenge (Levesque et al., 2011) is a reading comprehension task
in which a system must read a sentence with a pronoun and select the referent of that pronoun from
a list of choices. The examples are manually constructed to foil simple statistical methods: Each
one is contingent on contextual information provided by a single word or phrase in the sentence.
To convert the problem into sentence pair classification, we construct sentence pairs by replacing
the ambiguous pronoun with each possible referent. The task is to predict if the sentence with the
pronoun substituted is entailed by the original sentence. We use a small evaluation set consisting of
new examples derived from fiction books that was shared privately by the authors of the original
corpus. While the included training set is balanced between two classes, the test set is imbalanced
between them (65% not entailment). Also, due to a data quirk, the development set is adversarial:
hypotheses are sometimes shared between training and development examples, so if a model memorizes the
training examples, it will predict the wrong label on the corresponding development set
example. As with QNLI, each example is evaluated separately, so there is no systematic correspondence
between a model's score on this task and its score on the unconverted original task. We
call the converted dataset WNLI (Winograd NLI). This dataset is translated and publicly released for 3
Indian languages by AI4Bharat.
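
The pronoun-substitution step described above can be sketched in a few lines of Python. This is a simplified illustration, not the procedure used to build the released data: the function name `make_sentence_pairs`, the English example sentence, and the whitespace tokenization are all assumptions made for the sketch.

```python
def make_sentence_pairs(sentence, pronoun, candidates, correct):
    """Build one premise/hypothesis pair per candidate referent.

    Hypothetical helper: replaces the first token equal to `pronoun`
    with each candidate and labels the pair by whether the candidate
    is the correct referent.
    """
    tokens = sentence.split()
    index = tokens.index(pronoun)  # first exact-token match, so "fit" is not hit
    pairs = []
    for candidate in candidates:
        hypothesis = " ".join(tokens[:index] + [candidate] + tokens[index + 1:])
        label = "entailment" if candidate == correct else "not_entailment"
        pairs.append({"premise": sentence, "hypothesis": hypothesis, "label": label})
    return pairs


pairs = make_sentence_pairs(
    "The trophy does not fit in the suitcase because it is too big.",
    pronoun="it",
    candidates=["the trophy", "the suitcase"],
    correct="the trophy",
)
for pair in pairs:
    print(pair["label"], "|", pair["hypothesis"])
```

Each candidate yields its own classification example, which is why a model's score on the converted task does not map directly onto its score on the original referent-selection task.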

License: No known license
Version: 1.0.0
Splits:

Split	Examples
`'test'`	146
`'train'`	635
`'validation'`	71

Features:

{
    "hypothesis": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "premise": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "label": {
        "num_classes": 3,
        "names": [
            "not_entailment",
            "entailment",
            "None"
        ],
        "names_file": null,
        "id": null,
        "_type": "ClassLabel"
    }
}
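
As a minimal sketch of how the `label` feature above can be decoded, the snippet below maps integer class ids to the listed names and back. It assumes the usual `ClassLabel` convention that an id is the position of the name in the `names` list; the helper names are hypothetical, and the page does not say when the third name, "None", is used.

```python
# Label names copied from the feature listing above, in the same order.
LABEL_NAMES = ["not_entailment", "entailment", "None"]


def label_to_name(label_id):
    """Return the class name for an integer class id (assumed to be a list index)."""
    return LABEL_NAMES[label_id]


def name_to_label(name):
    """Return the integer class id for a class name."""
    return LABEL_NAMES.index(name)


decoded = [label_to_name(label_id) for label_id in [1, 0, 0]]
print(decoded)
```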

wnli.hi

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:indic_glue/wnli.hi')

Description:

IndicGLUE is a natural language understanding benchmark for Indian languages. It contains a wide
variety of tasks and covers 11 major Indian languages: as, bn, gu, hi, kn, ml, mr, or, pa, ta, te.


The Winograd Schema Challenge (Levesque et al., 2011) is a reading comprehension task
in which a system must read a sentence with a pronoun and select the referent of that pronoun from
a list of choices. The examples are manually constructed to foil simple statistical methods: Each
one is contingent on contextual information provided by a single word or phrase in the sentence.
To convert the problem into sentence pair classification, we construct sentence pairs by replacing
the ambiguous pronoun with each possible referent. The task is to predict if the sentence with the
pronoun substituted is entailed by the original sentence. We use a small evaluation set consisting of
new examples derived from fiction books that was shared privately by the authors of the original
corpus. While the included training set is balanced between two classes, the test set is imbalanced
between them (65% not entailment). Also, due to a data quirk, the development set is adversarial:
hypotheses are sometimes shared between training and development examples, so if a model memorizes the
training examples, it will predict the wrong label on the corresponding development set
example. As with QNLI, each example is evaluated separately, so there is no systematic correspondence
between a model's score on this task and its score on the unconverted original task. We
call the converted dataset WNLI (Winograd NLI). This dataset is translated and publicly released for 3
Indian languages by AI4Bharat.

License: No known license
Version: 1.0.0
Splits:

Split	Examples
`'test'`	146
`'train'`	635
`'validation'`	71

Features:

{
    "hypothesis": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "premise": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "label": {
        "num_classes": 3,
        "names": [
            "not_entailment",
            "entailment",
            "None"
        ],
        "names_file": null,
        "id": null,
        "_type": "ClassLabel"
    }
}

wnli.gu

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:indic_glue/wnli.gu')

Description:

IndicGLUE is a natural language understanding benchmark for Indian languages. It contains a wide
variety of tasks and covers 11 major Indian languages: as, bn, gu, hi, kn, ml, mr, or, pa, ta, te.


The Winograd Schema Challenge (Levesque et al., 2011) is a reading comprehension task
in which a system must read a sentence with a pronoun and select the referent of that pronoun from
a list of choices. The examples are manually constructed to foil simple statistical methods: Each
one is contingent on contextual information provided by a single word or phrase in the sentence.
To convert the problem into sentence pair classification, we construct sentence pairs by replacing
the ambiguous pronoun with each possible referent. The task is to predict if the sentence with the
pronoun substituted is entailed by the original sentence. We use a small evaluation set consisting of
new examples derived from fiction books that was shared privately by the authors of the original
corpus. While the included training set is balanced between two classes, the test set is imbalanced
between them (65% not entailment). Also, due to a data quirk, the development set is adversarial:
hypotheses are sometimes shared between training and development examples, so if a model memorizes the
training examples, it will predict the wrong label on the corresponding development set
example. As with QNLI, each example is evaluated separately, so there is no systematic correspondence
between a model's score on this task and its score on the unconverted original task. We
call the converted dataset WNLI (Winograd NLI). This dataset is translated and publicly released for 3
Indian languages by AI4Bharat.

License: No known license
Version: 1.0.0
Splits:

Split	Examples
`'test'`	146
`'train'`	635
`'validation'`	71

Features:

{
    "hypothesis": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "premise": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "label": {
        "num_classes": 3,
        "names": [
            "not_entailment",
            "entailment",
            "None"
        ],
        "names_file": null,
        "id": null,
        "_type": "ClassLabel"
    }
}

wnli.mr

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:indic_glue/wnli.mr')

Description:

IndicGLUE is a natural language understanding benchmark for Indian languages. It contains a wide
variety of tasks and covers 11 major Indian languages: as, bn, gu, hi, kn, ml, mr, or, pa, ta, te.


The Winograd Schema Challenge (Levesque et al., 2011) is a reading comprehension task
in which a system must read a sentence with a pronoun and select the referent of that pronoun from
a list of choices. The examples are manually constructed to foil simple statistical methods: Each
one is contingent on contextual information provided by a single word or phrase in the sentence.
To convert the problem into sentence pair classification, we construct sentence pairs by replacing
the ambiguous pronoun with each possible referent. The task is to predict if the sentence with the
pronoun substituted is entailed by the original sentence. We use a small evaluation set consisting of
new examples derived from fiction books that was shared privately by the authors of the original
corpus. While the included training set is balanced between two classes, the test set is imbalanced
between them (65% not entailment). Also, due to a data quirk, the development set is adversarial:
hypotheses are sometimes shared between training and development examples, so if a model memorizes the
training examples, it will predict the wrong label on the corresponding development set
example. As with QNLI, each example is evaluated separately, so there is no systematic correspondence
between a model's score on this task and its score on the unconverted original task. We
call the converted dataset WNLI (Winograd NLI). This dataset is translated and publicly released for 3
Indian languages by AI4Bharat.

License: No known license
Version: 1.0.0
Splits:

Split	Examples
`'test'`	146
`'train'`	635
`'validation'`	71

Features:

{
    "hypothesis": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "premise": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "label": {
        "num_classes": 3,
        "names": [
            "not_entailment",
            "entailment",
            "None"
        ],
        "names_file": null,
        "id": null,
        "_type": "ClassLabel"
    }
}

copa.en

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:indic_glue/copa.en')

Description:

IndicGLUE is a natural language understanding benchmark for Indian languages. It contains a wide
variety of tasks and covers 11 major Indian languages: as, bn, gu, hi, kn, ml, mr, or, pa, ta, te.


The Choice Of Plausible Alternatives (COPA) evaluation provides researchers with a tool for assessing
progress in open-domain commonsense causal reasoning. COPA consists of 1000 questions, split equally
into development and test sets of 500 questions each. Each question is composed of a premise and two
alternatives, where the task is to select the alternative that more plausibly has a causal relation
with the premise. The correct alternative is randomized so that the expected performance of randomly
guessing is 50%. This dataset is translated and publicly released for 3 languages by AI4Bharat.
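
To make the two-alternative setup concrete, here is a minimal sketch of scoring a COPA-style predictor. The field names follow the feature listing for this config, but several things are assumptions made for illustration: that `label` is 0 when `choice1` is correct and 1 when `choice2` is correct, the two toy English examples, and the helper names `predict`, `accuracy`, and `always_first`.

```python
def predict(example, score):
    """Return 0 for choice1 or 1 for choice2, whichever the scorer rates higher."""
    first = score(example["premise"], example["choice1"], example["question"])
    second = score(example["premise"], example["choice2"], example["question"])
    return 0 if first >= second else 1  # ties go to choice1


def accuracy(examples, score):
    """Fraction of examples where the predicted alternative matches the label."""
    correct = sum(predict(example, score) == example["label"] for example in examples)
    return correct / len(examples)


def always_first(premise, choice, question):
    """Trivial scorer: every alternative gets the same score, so choice1 always wins."""
    return 0.0


# Hypothetical examples, written only for this sketch.
examples = [
    {"premise": "The ground was wet.", "choice1": "It had rained.",
     "choice2": "The sun was out.", "question": "cause", "label": 0},
    {"premise": "The man dropped the glass.", "choice1": "The glass floated.",
     "choice2": "The glass shattered.", "question": "effect", "label": 1},
]

baseline = accuracy(examples, always_first)
print(baseline)
```

Because the position of the correct alternative is randomized, a scorer that ignores the text, such as `always_first`, is expected to land near 50% on a large sample.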

License: No known license
Version: 1.0.0
Splits:

Split	Examples
`'test'`	500
`'train'`	400
`'validation'`	100

Features:

{
    "premise": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "choice1": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "choice2": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "question": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "label": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    }
}

copa.hi

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:indic_glue/copa.hi')

Description:

IndicGLUE is a natural language understanding benchmark for Indian languages. It contains a wide
variety of tasks and covers 11 major Indian languages: as, bn, gu, hi, kn, ml, mr, or, pa, ta, te.


The Choice Of Plausible Alternatives (COPA) evaluation provides researchers with a tool for assessing
progress in open-domain commonsense causal reasoning. COPA consists of 1000 questions, split equally
into development and test sets of 500 questions each. Each question is composed of a premise and two
alternatives, where the task is to select the alternative that more plausibly has a causal relation
with the premise. The correct alternative is randomized so that the expected performance of randomly
guessing is 50%. This dataset is translated and publicly released for 3 languages by AI4Bharat.

License: No known license
Version: 1.0.0
Splits:

Split	Examples
`'test'`	449
`'train'`	362
`'validation'`	88

Features:

{
    "premise": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "choice1": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "choice2": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "question": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "label": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    }
}

copa.gu

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:indic_glue/copa.gu')

Description:

IndicGLUE is a natural language understanding benchmark for Indian languages. It contains a wide
variety of tasks and covers 11 major Indian languages: as, bn, gu, hi, kn, ml, mr, or, pa, ta, te.


The Choice Of Plausible Alternatives (COPA) evaluation provides researchers with a tool for assessing
progress in open-domain commonsense causal reasoning. COPA consists of 1000 questions, split equally
into development and test sets of 500 questions each. Each question is composed of a premise and two
alternatives, where the task is to select the alternative that more plausibly has a causal relation
with the premise. The correct alternative is randomized so that the expected performance of randomly
guessing is 50%. This dataset is translated and publicly released for 3 languages by AI4Bharat.

License: No known license
Version: 1.0.0
Splits:

Split	Examples
`'test'`	448
`'train'`	362
`'validation'`	88

Features:

{
    "premise": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "choice1": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "choice2": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "question": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "label": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    }
}

copa.mr

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:indic_glue/copa.mr')

Description:

IndicGLUE is a natural language understanding benchmark for Indian languages. It contains a wide
variety of tasks and covers 11 major Indian languages: as, bn, gu, hi, kn, ml, mr, or, pa, ta, te.


The Choice Of Plausible Alternatives (COPA) evaluation provides researchers with a tool for assessing
progress in open-domain commonsense causal reasoning. COPA consists of 1000 questions, split equally
into development and test sets of 500 questions each. Each question is composed of a premise and two
alternatives, where the task is to select the alternative that more plausibly has a causal relation
with the premise. The correct alternative is randomized so that the expected performance of randomly
guessing is 50%. This dataset is translated and publicly released for 3 languages by AI4Bharat.

License: No known license
Version: 1.0.0
Splits:

Split	Examples
`'test'`	449
`'train'`	362
`'validation'`	88

Features:

{
    "premise": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "choice1": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "choice2": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "question": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "label": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    }
}

sna.bn

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:indic_glue/sna.bn')

Description:

IndicGLUE is a natural language understanding benchmark for Indian languages. It contains a wide
variety of tasks and covers 11 major Indian languages: as, bn, gu, hi, kn, ml, mr, or, pa, ta, te.


This dataset is a collection of Bengali news articles. The dataset is used for classifying articles into
6 different classes, namely national, international, state, kolkata, entertainment and sports.

License: No known license
Version: 1.0.0
Splits:

Split	Examples
`'test'`	1411
`'train'`	11284
`'validation'`	1411

Features:

{
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "label": {
        "num_classes": 6,
        "names": [
            "kolkata",
            "state",
            "national",
            "sports",
            "entertainment",
            "international"
        ],
        "names_file": null,
        "id": null,
        "_type": "ClassLabel"
    }
}

csqa.as

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:indic_glue/csqa.as')

Description:

IndicGLUE is a natural language understanding benchmark for Indian languages. It contains a wide
variety of tasks and covers 11 major Indian languages: as, bn, gu, hi, kn, ml, mr, or, pa, ta, te.


Given a text with an entity randomly masked, the task is to predict that masked entity from a list of 4
candidate entities. The dataset contains around 239k examples across 11 languages.

License: No known license
Version: 1.0.0
Splits:

Split	Examples
`'test'`	2942

Features:

{
    "question": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "answer": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "category": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "title": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "options": {
        "feature": {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    },
    "out_of_context_options": {
        "feature": {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    }
}
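
The masked-entity task described for this config can be pictured with a small sketch that fills each candidate into the masked question and checks a prediction against `answer`. The field names come from the feature listing above; the `<MASK>` placeholder string, the English toy example, and the helper names are assumptions made for illustration and may not match the released data.

```python
def fill_candidates(example, mask_token="<MASK>"):
    """Return one filled-in sentence per candidate entity.

    `mask_token` is an assumed placeholder; adjust it to whatever
    marker the actual questions use.
    """
    return [example["question"].replace(mask_token, option)
            for option in example["options"]]


def is_correct(example, predicted_index):
    """Check whether the option at `predicted_index` is the gold answer."""
    return example["options"][predicted_index] == example["answer"]


# Hypothetical example, written only for this sketch.
example = {
    "question": "<MASK> is the capital of India.",
    "answer": "New Delhi",
    "options": ["Mumbai", "New Delhi", "Chennai", "Kolkata"],
}

filled = fill_candidates(example)
for sentence in filled:
    print(sentence)
```

A model would typically score each filled-in sentence and pick the highest-scoring option; accuracy is then the share of examples for which `is_correct` returns `True`.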

csqa.bn

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:indic_glue/csqa.bn')

Description:

IndicGLUE is a natural language understanding benchmark for Indian languages. It contains a wide
variety of tasks and covers 11 major Indian languages: as, bn, gu, hi, kn, ml, mr, or, pa, ta, te.


Given a text with an entity randomly masked, the task is to predict that masked entity from a list of 4
candidate entities. The dataset contains around 239k examples across 11 languages.

License: No known license
Version: 1.0.0
Splits:

Split	Examples
`'test'`	38845

Features:

{
    "question": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "answer": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "category": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "title": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "options": {
        "feature": {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    },
    "out_of_context_options": {
        "feature": {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    }
}

csqa.gu

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:indic_glue/csqa.gu')

Description:

IndicGLUE is a natural language understanding benchmark for Indian languages. It contains a wide
variety of tasks and covers 11 major Indian languages: as, bn, gu, hi, kn, ml, mr, or, pa, ta, te.


Given a text with an entity randomly masked, the task is to predict that masked entity from a list of 4
candidate entities. The dataset contains around 239k examples across 11 languages.

License: No known license
Version: 1.0.0
Splits:

Split	Examples
`'test'`	22861

Features:

{
    "question": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "answer": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "category": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "title": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "options": {
        "feature": {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    },
    "out_of_context_options": {
        "feature": {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    }
}

csqa.hi

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:indic_glue/csqa.hi')

Description:

IndicGLUE is a natural language understanding benchmark for Indian languages. It contains a wide
variety of tasks and covers 11 major Indian languages: as, bn, gu, hi, kn, ml, mr, or, pa, ta, te.


Given a text with an entity randomly masked, the task is to predict that masked entity from a list of 4
candidate entities. The dataset contains around 239k examples across 11 languages.

License: No known license
Version: 1.0.0
Splits:

Split	Examples
`'test'`	35140

Features:

{
    "question": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "answer": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "category": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "title": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "options": {
        "feature": {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    },
    "out_of_context_options": {
        "feature": {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    }
}

csqa.kn

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:indic_glue/csqa.kn')

Description:

IndicGLUE is a natural language understanding benchmark for Indian languages. It contains a wide
variety of tasks and covers 11 major Indian languages: as, bn, gu, hi, kn, ml, mr, or, pa, ta, te.


Given a text with an entity randomly masked, the task is to predict that masked entity from a list of 4
candidate entities. The dataset contains around 239k examples across 11 languages.

License: No known license
Version: 1.0.0
Splits:

Split	Examples
`'test'`	13666

Features:

{
    "question": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "answer": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "category": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "title": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "options": {
        "feature": {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    },
    "out_of_context_options": {
        "feature": {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    }
}

csqa.ml

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:indic_glue/csqa.ml')

Description:

IndicGLUE is a natural language understanding benchmark for Indian languages. It contains a wide
variety of tasks and covers 11 major Indian languages: as, bn, gu, hi, kn, ml, mr, or, pa, ta, te.


Given a text with an entity randomly masked, the task is to predict that masked entity from a list of 4
candidate entities. The dataset contains around 239k examples across 11 languages.

License: No known license
Version: 1.0.0
Splits:

Split	Examples
`'test'`	26537

Features:

{
    "question": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "answer": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "category": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "title": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "options": {
        "feature": {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    },
    "out_of_context_options": {
        "feature": {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    }
}

csqa.mr

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:indic_glue/csqa.mr')

Description:

IndicGLUE is a natural language understanding benchmark for Indian languages. It contains a wide
variety of tasks and covers 11 major Indian languages: as, bn, gu, hi, kn, ml, mr, or, pa, ta, te.


Given a text with an entity randomly masked, the task is to predict that masked entity from a list of 4
candidate entities. The dataset contains around 239k examples across 11 languages.

License: No known license
Version: 1.0.0
Splits:

Split	Examples
`'test'`	11370

Features:

{
    "question": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "answer": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "category": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "title": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "options": {
        "feature": {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    },
    "out_of_context_options": {
        "feature": {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    }
}

csqa.or

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:indic_glue/csqa.or')

Description:

IndicGLUE is a natural language understanding benchmark for Indian languages. It contains a wide
variety of tasks and covers 11 major Indian languages: as, bn, gu, hi, kn, ml, mr, or, pa, ta, te.


Given a text with an entity randomly masked, the task is to predict that masked entity from a list of 4
candidate entities. The dataset contains around 239k examples across 11 languages.

License: No known license
Version: 1.0.0
Splits:

Split	Examples
`'test'`	1975

Features:

{
    "question": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "answer": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "category": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "title": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "options": {
        "feature": {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    },
    "out_of_context_options": {
        "feature": {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    }
}

csqa.pa

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:indic_glue/csqa.pa')

Description:

IndicGLUE is a natural language understanding benchmark for Indian languages. It contains a wide
variety of tasks and covers 11 major Indian languages: as, bn, gu, hi, kn, ml, mr, or, pa, ta, te.


Given a text with an entity randomly masked, the task is to predict that masked entity from a list of 4
candidate entities. The dataset contains around 239k examples across 11 languages.

License: No known license
Version: 1.0.0
Splits:

Split	Examples
`'test'`	5667

Features:

{
    "question": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "answer": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "category": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "title": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "options": {
        "feature": {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    },
    "out_of_context_options": {
        "feature": {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    }
}

csqa.ta

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:indic_glue/csqa.ta')

Description:

IndicGLUE is a natural language understanding benchmark for Indian languages. It contains a wide
variety of tasks and covers 11 major Indian languages: as, bn, gu, hi, kn, ml, mr, or, pa, ta, te.


Given a text with an entity randomly masked, the task is to predict that masked entity from a list of 4
candidate entities. The dataset contains around 239k examples across 11 languages.

License: No known license
Version: 1.0.0
Splits:

Split	Examples
`'test'`	38590

Features:

{
    "question": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "answer": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "category": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "title": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "options": {
        "feature": {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    },
    "out_of_context_options": {
        "feature": {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    }
}

csqa.te

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:indic_glue/csqa.te')

Description:

IndicGLUE is a natural language understanding benchmark for Indian languages. It contains a wide
variety of tasks and covers 11 major Indian languages: as, bn, gu, hi, kn, ml, mr, or, pa, ta, te.


Given a text with an entity randomly masked, the task is to predict that masked entity from a list of 4
candidate entities. The dataset contains around 239k examples across 11 languages.

License: No known license
Version: 1.0.0
Splits:

Split	Examples
`'test'`	41338

Features:

{
    "question": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "answer": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "category": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "title": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "options": {
        "feature": {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    },
    "out_of_context_options": {
        "feature": {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    }
}

wstp.as

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:indic_glue/wstp.as')

Description:

IndicGLUE is a natural language understanding benchmark for Indian languages. It contains a wide
    variety of tasks and covers 11 major Indian languages - as, bn, gu, hi, kn, ml, mr, or, pa, ta, te.


Predict the correct title for a Wikipedia section from a given list of four candidate titles.
The dataset has 400k examples across 11 Indian languages.

License: No known license
Version: 1.0.0
Splits:

Split	Examples
`'test'`	626
`'train'`	5000
`'validation'`	625

Features:

{
    "sectionText": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "correctTitle": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "titleA": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "titleB": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "titleC": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "titleD": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "url": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}
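
For multiple-choice training it is often convenient to turn a record with this schema into a candidate list plus the index of the correct title. The sketch below is one way to do that. It assumes, without confirmation from this page, that `correctTitle` holds either the title text itself or the name of one of the four title fields, and it handles both cases; the sample records are hypothetical.

```python
from typing import Dict, List, Tuple

TITLE_KEYS = ("titleA", "titleB", "titleC", "titleD")


def to_multiple_choice(example: Dict[str, str]) -> Tuple[str, List[str], int]:
    """Return (section text, candidate titles, index of the correct title)."""
    candidates = [example[key] for key in TITLE_KEYS]
    value = example["correctTitle"]
    if value in TITLE_KEYS:
        # Assumption 1: the field names one of the title keys.
        label = TITLE_KEYS.index(value)
    else:
        # Assumption 2: the field holds the title text; raises ValueError if absent.
        label = candidates.index(value)
    return example["sectionText"], candidates, label


# Hypothetical record; not a real dataset row.
record = {
    "sectionText": "Some section text.",
    "correctTitle": "History",
    "titleA": "Geography",
    "titleB": "History",
    "titleC": "Economy",
    "titleD": "Culture",
    "url": "",
}

text, titles, label = to_multiple_choice(record)
print(titles[label])  # History
```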

wstp.bn

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:indic_glue/wstp.bn')

Description:

IndicGLUE is a natural language understanding benchmark for Indian languages. It contains a wide
    variety of tasks and covers 11 major Indian languages - as, bn, gu, hi, kn, ml, mr, or, pa, ta, te.


Predict the correct title for a Wikipedia section from a given list of four candidate titles.
The dataset has 400k examples across 11 Indian languages.

License: No known license
Version: 1.0.0
Splits:

Split	Examples
`'test'`	5948
`'train'`	47580
`'validation'`	5947

Features:

{
    "sectionText": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "correctTitle": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "titleA": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "titleB": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "titleC": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "titleD": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "url": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

wstp.gu

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:indic_glue/wstp.gu')

Description:

IndicGLUE is a natural language understanding benchmark for Indian languages. It contains a wide
    variety of tasks and covers 11 major Indian languages - as, bn, gu, hi, kn, ml, mr, or, pa, ta, te.


Predict the correct title for a Wikipedia section from a given list of four candidate titles.
The dataset has 400k examples across 11 Indian languages.

License: No known license
Version: 1.0.0
Splits:

Split	Examples
`'test'`	1251
`'train'`	10004
`'validation'`	1251

Features:

{
    "sectionText": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "correctTitle": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "titleA": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "titleB": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "titleC": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "titleD": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "url": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

wstp.hi

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:indic_glue/wstp.hi')

Description:

IndicGLUE is a natural language understanding benchmark for Indian languages. It contains a wide
    variety of tasks and covers 11 major Indian languages - as, bn, gu, hi, kn, ml, mr, or, pa, ta, te.


Predict the correct title for a Wikipedia section from a given list of four candidate titles.
The dataset has 400k examples across 11 Indian languages.

License: No known license
Version: 1.0.0
Splits:

Split	Examples
`'test'`	5509
`'train'`	44069
`'validation'`	5509

Features:

{
    "sectionText": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "correctTitle": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "titleA": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "titleB": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "titleC": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "titleD": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "url": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

wstp.kn

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:indic_glue/wstp.kn')

Description:

IndicGLUE is a natural language understanding benchmark for Indian languages. It contains a wide
    variety of tasks and covers 11 major Indian languages - as, bn, gu, hi, kn, ml, mr, or, pa, ta, te.


Predict the correct title for a Wikipedia section from a given list of four candidate titles.
The dataset has 400k examples across 11 Indian languages.

License: No known license
Version: 1.0.0
Splits:

Split	Examples
`'test'`	4423
`'train'`	35379
`'validation'`	4422

Features:

{
    "sectionText": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "correctTitle": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "titleA": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "titleB": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "titleC": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "titleD": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "url": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

wstp.ml

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:indic_glue/wstp.ml')

Description:

IndicGLUE is a natural language understanding benchmark for Indian languages. It contains a wide
    variety of tasks and covers 11 major Indian languages - as, bn, gu, hi, kn, ml, mr, or, pa, ta, te.


Predict the correct title for a Wikipedia section from a given list of four candidate titles.
The dataset has 400k examples across 11 Indian languages.

License: No known license
Version: 1.0.0
Splits:

Split	Examples
`'test'`	3441
`'train'`	27527
`'validation'`	3441

Features:

{
    "sectionText": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "correctTitle": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "titleA": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "titleB": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "titleC": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "titleD": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "url": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

wstp.mr

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:indic_glue/wstp.mr')

Description:

IndicGLUE is a natural language understanding benchmark for Indian languages. It contains a wide
    variety of tasks and covers 11 major Indian languages - as, bn, gu, hi, kn, ml, mr, or, pa, ta, te.


Predict the correct title for a Wikipedia section from a given list of four candidate titles.
The dataset has 400k examples across 11 Indian languages.

License: No known license
Version: 1.0.0
Splits:

Split	Examples
`'test'`	1306
`'train'`	10446
`'validation'`	1306

Features:

{
    "sectionText": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "correctTitle": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "titleA": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "titleB": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "titleC": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "titleD": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "url": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

wstp.or

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:indic_glue/wstp.or')

Description:

IndicGLUE is a natural language understanding benchmark for Indian languages. It contains a wide
    variety of tasks and covers 11 major Indian languages - as, bn, gu, hi, kn, ml, mr, or, pa, ta, te.


Predict the correct title for a Wikipedia section from a given list of four candidate titles.
The dataset has 400k examples across 11 Indian languages.

License: No known license
Version: 1.0.0
Splits:

Split	Examples
`'test'`	502
`'train'`	4015
`'validation'`	502

Features:

{
    "sectionText": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "correctTitle": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "titleA": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "titleB": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "titleC": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "titleD": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "url": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

wstp.pa

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:indic_glue/wstp.pa')

Description:

IndicGLUE is a natural language understanding benchmark for Indian languages. It contains a wide
    variety of tasks and covers 11 major Indian languages - as, bn, gu, hi, kn, ml, mr, or, pa, ta, te.


Predict the correct title for a Wikipedia section from a given list of four candidate titles.
The dataset has 400k examples across 11 Indian languages.

License: No known license
Version: 1.0.0
Splits:

Split	Examples
`'test'`	1097
`'train'`	8772
`'validation'`	1097

Features:

{
    "sectionText": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "correctTitle": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "titleA": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "titleB": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "titleC": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "titleD": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "url": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

wstp.ta

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:indic_glue/wstp.ta')

Description:

IndicGLUE is a natural language understanding benchmark for Indian languages. It contains a wide
    variety of tasks and covers 11 major Indian languages - as, bn, gu, hi, kn, ml, mr, or, pa, ta, te.


Predict the correct title for a Wikipedia section from a given list of four candidate titles.
The dataset has 400k examples across 11 Indian languages.

License: No known license
Version: 1.0.0
Splits:

Split	Examples
`'test'`	6118
`'train'`	48940
`'validation'`	6117

Features:

{
    "sectionText": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "correctTitle": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "titleA": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "titleB": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "titleC": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "titleD": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "url": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

wstp.te

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:indic_glue/wstp.te')

Description:

IndicGLUE is a natural language understanding benchmark for Indian languages. It contains a wide
    variety of tasks and covers 11 major Indian languages - as, bn, gu, hi, kn, ml, mr, or, pa, ta, te.


Predict the correct title for a Wikipedia section from a given list of four candidate titles.
The dataset has 400k examples across 11 Indian languages.

License: No known license
Version: 1.0.0
Splits:

Split	Examples
`'test'`	10000
`'train'`	80000
`'validation'`	10000

Features:

{
    "sectionText": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "correctTitle": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "titleA": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "titleB": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "titleC": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "titleD": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "url": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

inltkh.gu

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:indic_glue/inltkh.gu')

Description:

IndicGLUE is a natural language understanding benchmark for Indian languages. It contains a wide
    variety of tasks and covers 11 major Indian languages - as, bn, gu, hi, kn, ml, mr, or, pa, ta, te.


Obtained from the iNLTK project. The corpus is a collection of headlines tagged with their news category.
Available for languages: gu, ml, mr and ta.

License: No known license
Version: 1.0.0
Splits:

Split	Examples
`'test'`	659
`'train'`	5269
`'validation'`	659

Features:

{
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "label": {
        "num_classes": 10,
        "names": [
            "entertainment",
            "business",
            "tech",
            "sports",
            "state",
            "spirituality",
            "tamil-cinema",
            "positive",
            "negative",
            "neutral"
        ],
        "names_file": null,
        "id": null,
        "_type": "ClassLabel"
    }
}
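
A `ClassLabel` feature stores each label as an integer index into the `names` list. The sketch below mirrors the list from the schema above in plain Python and converts in both directions; `label_to_name` and `name_to_label` are local helpers written for this example, not functions of TFDS or any other library.

```python
from typing import List

# Copied from the "names" list in the schema above; position == integer label.
LABEL_NAMES: List[str] = [
    "entertainment",
    "business",
    "tech",
    "sports",
    "state",
    "spirituality",
    "tamil-cinema",
    "positive",
    "negative",
    "neutral",
]


def label_to_name(label: int) -> str:
    """Map an integer class id to its string name."""
    return LABEL_NAMES[label]


def name_to_label(name: str) -> int:
    """Map a string name to its integer class id (ValueError if unknown)."""
    return LABEL_NAMES.index(name)


print(label_to_name(3))         # sports
print(name_to_label("tech"))    # 2
```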

inltkh.ml

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:indic_glue/inltkh.ml')

Description:

IndicGLUE is a natural language understanding benchmark for Indian languages. It contains a wide
    variety of tasks and covers 11 major Indian languages - as, bn, gu, hi, kn, ml, mr, or, pa, ta, te.


Obtained from the iNLTK project. The corpus is a collection of headlines tagged with their news category.
Available for languages: gu, ml, mr and ta.

License: No known license
Version: 1.0.0
Splits:

Split	Examples
`'test'`	630
`'train'`	5036
`'validation'`	630

Features:

{
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "label": {
        "num_classes": 10,
        "names": [
            "entertainment",
            "business",
            "tech",
            "sports",
            "state",
            "spirituality",
            "tamil-cinema",
            "positive",
            "negative",
            "neutral"
        ],
        "names_file": null,
        "id": null,
        "_type": "ClassLabel"
    }
}

inltkh.mr

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:indic_glue/inltkh.mr')

Description:

IndicGLUE is a natural language understanding benchmark for Indian languages. It contains a wide
    variety of tasks and covers 11 major Indian languages - as, bn, gu, hi, kn, ml, mr, or, pa, ta, te.


Obtained from the iNLTK project. The corpus is a collection of headlines tagged with their news category.
Available for languages: gu, ml, mr and ta.

License: No known license
Version: 1.0.0
Splits:

Split	Examples
`'test'`	1210
`'train'`	9672
`'validation'`	1210

Features:

{
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "label": {
        "num_classes": 10,
        "names": [
            "entertainment",
            "business",
            "tech",
            "sports",
            "state",
            "spirituality",
            "tamil-cinema",
            "positive",
            "negative",
            "neutral"
        ],
        "names_file": null,
        "id": null,
        "_type": "ClassLabel"
    }
}

inltkh.ta

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:indic_glue/inltkh.ta')

Description:

IndicGLUE is a natural language understanding benchmark for Indian languages. It contains a wide
    variety of tasks and covers 11 major Indian languages - as, bn, gu, hi, kn, ml, mr, or, pa, ta, te.


Obtained from the iNLTK project. The corpus is a collection of headlines tagged with their news category.
Available for languages: gu, ml, mr and ta.

License: No known license
Version: 1.0.0
Splits:

Split	Examples
`'test'`	669
`'train'`	5346
`'validation'`	669

Features:

{
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "label": {
        "num_classes": 10,
        "names": [
            "entertainment",
            "business",
            "tech",
            "sports",
            "state",
            "spirituality",
            "tamil-cinema",
            "positive",
            "negative",
            "neutral"
        ],
        "names_file": null,
        "id": null,
        "_type": "ClassLabel"
    }
}

inltkh.te

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:indic_glue/inltkh.te')

Description:

IndicGLUE is a natural language understanding benchmark for Indian languages. It contains a wide
    variety of tasks and covers 11 major Indian languages - as, bn, gu, hi, kn, ml, mr, or, pa, ta, te.


Obtained from the iNLTK project. The corpus is a collection of headlines tagged with their news category.
Available for languages: gu, ml, mr and ta.

License: No known license
Version: 1.0.0
Splits:

Split	Examples
`'test'`	541
`'train'`	4328
`'validation'`	541

Features:

{
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "label": {
        "num_classes": 10,
        "names": [
            "entertainment",
            "business",
            "tech",
            "sports",
            "state",
            "spirituality",
            "tamil-cinema",
            "positive",
            "negative",
            "neutral"
        ],
        "names_file": null,
        "id": null,
        "_type": "ClassLabel"
    }
}

bbca.hi

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:indic_glue/bbca.hi')

Description:

IndicGLUE is a natural language understanding benchmark for Indian languages. It contains a wide
    variety of tasks and covers 11 major Indian languages - as, bn, gu, hi, kn, ml, mr, or, pa, ta, te.


This release consists of 4335 Hindi documents with tags from the BBC Hindi News website.

License: No known license
Version: 1.0.0
Splits:

Split	Examples
`'test'`	866
`'train'`	3467

Features:

{
    "label": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

cvit-mkb-clsr.en-bn

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:indic_glue/cvit-mkb-clsr.en-bn')

Description:

IndicGLUE is a natural language understanding benchmark for Indian languages. It contains a wide
    variety of tasks and covers 11 major Indian languages - as, bn, gu, hi, kn, ml, mr, or, pa, ta, te.


CVIT Mann ki Baat Dataset - Given a sentence in language $L_1$, the task is to retrieve its translation
from a set of candidate sentences in language $L_2$.
The dataset contains around 39k parallel sentence pairs across 8 Indian languages.

License: No known license
Version: 1.0.0
Splits:

Split	Examples
`'test'`	5522

Features:

{
    "sentence1": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "sentence2": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}
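
The retrieval task described above can be scored with precision at 1: for each `sentence1`, rank every `sentence2` in the split and check whether the aligned one comes first. The sketch below shows that loop under stated assumptions. `shared_token_score` is a toy similarity invented for illustration (a real system would plug in a cross-lingual model), and the sample pairs are hypothetical placeholders rather than dataset rows.

```python
from typing import Callable, List, Tuple


def precision_at_1(pairs: List[Tuple[str, str]],
                   score: Callable[[str, str], float]) -> float:
    """Share of sentence1 entries whose top-scoring candidate is the aligned sentence2."""
    if not pairs:
        return 0.0
    candidates = [s2 for _, s2 in pairs]
    hits = 0
    for i, (s1, _) in enumerate(pairs):
        # Index of the best candidate; ties resolve to the earliest index.
        best = max(range(len(candidates)), key=lambda j: score(s1, candidates[j]))
        hits += int(best == i)
    return hits / len(pairs)


def shared_token_score(a: str, b: str) -> float:
    """Toy similarity (assumption): number of whitespace tokens the two strings share."""
    return float(len(set(a.lower().split()) & set(b.lower().split())))


# Hypothetical (sentence1, sentence2) pairs; numerals stand in for shared tokens.
pairs = [
    ("alpha 2020", "x 2020"),
    ("beta 1947", "y 1947"),
    ("gamma", "z"),
]

print(precision_at_1(pairs, shared_token_score))
```

With these placeholder pairs the first two queries find their match through the shared numeral, while the third has no overlap with any candidate and is counted as a miss.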

cvit-mkb-clsr.en-gu

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:indic_glue/cvit-mkb-clsr.en-gu')

Description:

IndicGLUE is a natural language understanding benchmark for Indian languages. It contains a wide
    variety of tasks and covers 11 major Indian languages - as, bn, gu, hi, kn, ml, mr, or, pa, ta, te.


CVIT Mann ki Baat Dataset - Given a sentence in language $L_1$, the task is to retrieve its translation
from a set of candidate sentences in language $L_2$.
The dataset contains around 39k parallel sentence pairs across 8 Indian languages.

License: No known license
Version: 1.0.0
Splits:

Split	Examples
`'test'`	6463

Features:

{
    "sentence1": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "sentence2": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

cvit-mkb-clsr.en-hi

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:indic_glue/cvit-mkb-clsr.en-hi')

Description:

IndicGLUE is a natural language understanding benchmark for Indian languages. It contains a wide
    variety of tasks and covers 11 major Indian languages - as, bn, gu, hi, kn, ml, mr, or, pa, ta, te.


CVIT Mann ki Baat Dataset - Given a sentence in language $L_1$, the task is to retrieve its translation
from a set of candidate sentences in language $L_2$.
The dataset contains around 39k parallel sentence pairs across 8 Indian languages.

License: No known license
Version: 1.0.0
Splits:

Split	Examples
`'test'`	5169

Features:

{
    "sentence1": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "sentence2": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

cvit-mkb-clsr.en-ml

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:indic_glue/cvit-mkb-clsr.en-ml')

Description:

IndicGLUE is a natural language understanding benchmark for Indian languages. It contains a wide
    variety of tasks and covers 11 major Indian languages - as, bn, gu, hi, kn, ml, mr, or, pa, ta, te.


CVIT Mann ki Baat Dataset - Given a sentence in language $L_1$, the task is to retrieve its translation
from a set of candidate sentences in language $L_2$.
The dataset contains around 39k parallel sentence pairs across 8 Indian languages.

License: No known license
Version: 1.0.0
Splits:

Split	Examples
`'test'`	4886

Features:

{
    "sentence1": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "sentence2": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

cvit-mkb-clsr.en-mr

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:indic_glue/cvit-mkb-clsr.en-mr')

Description:

IndicGLUE is a natural language understanding benchmark for Indian languages. It contains a wide
    variety of tasks and covers 11 major Indian languages - as, bn, gu, hi, kn, ml, mr, or, pa, ta, te.


CVIT Mann ki Baat Dataset - Given a sentence in language $L_1$, the task is to retrieve its translation
from a set of candidate sentences in language $L_2$.
The dataset contains around 39k parallel sentence pairs across 8 Indian languages.

License: No known license
Version: 1.0.0
Splits:

Split	Examples
`'test'`	5760

Features:

{
    "sentence1": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "sentence2": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

cvit-mkb-clsr.en-or

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:indic_glue/cvit-mkb-clsr.en-or')

Description:

IndicGLUE is a natural language understanding benchmark for Indian languages. It contains a wide
    variety of tasks and covers 11 major Indian languages - as, bn, gu, hi, kn, ml, mr, or, pa, ta, te.


CVIT Mann ki Baat Dataset - Given a sentence in language $L_1$, the task is to retrieve its translation
from a set of candidate sentences in language $L_2$.
The dataset contains around 39k parallel sentence pairs across 8 Indian languages.

License: No known license
Version: 1.0.0
Splits:

Split	Examples
`'test'`	752

Features:

{
    "sentence1": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "sentence2": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

cvit-mkb-clsr.en-ta

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:indic_glue/cvit-mkb-clsr.en-ta')

Description:

IndicGLUE is a natural language understanding benchmark for Indian languages. It contains a wide
    variety of tasks and covers 11 major Indian languages - as, bn, gu, hi, kn, ml, mr, or, pa, ta, te.


CVIT Mann ki Baat Dataset - Given a sentence in language $L_1$, the task is to retrieve its translation
from a set of candidate sentences in language $L_2$.
The dataset contains around 39k parallel sentence pairs across 8 Indian languages.

License: No known license
Version: 1.0.0
Splits:

Split	Examples
`'test'`	5637

Features:

{
    "sentence1": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "sentence2": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

cvit-mkb-clsr.en-te

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:indic_glue/cvit-mkb-clsr.en-te')

Description:

IndicGLUE is a natural language understanding benchmark for Indian languages. It contains a wide
    variety of tasks and covers 11 major Indian languages - as, bn, gu, hi, kn, ml, mr, or, pa, ta, te.


CVIT Mann ki Baat Dataset - Given a sentence in language $L_1$, the task is to retrieve its translation
from a set of candidate sentences in language $L_2$.
The dataset contains around 39k parallel sentence pairs across 8 Indian languages.

License: No known license
Version: 1.0.0
Splits:

Split	Examples
`'test'`	5049

Features:

{
    "sentence1": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "sentence2": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

cvit-mkb-clsr.en-ur

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:indic_glue/cvit-mkb-clsr.en-ur')

Description:

IndicGLUE is a natural language understanding benchmark for Indian languages. It contains a wide
    variety of tasks and covers 11 major Indian languages - as, bn, gu, hi, kn, ml, mr, or, pa, ta, te.


CVIT Mann ki Baat Dataset - Given a sentence in language $L_1$, the task is to retrieve its translation
from a set of candidate sentences in language $L_2$.
The dataset contains around 39k parallel sentence pairs across 8 Indian languages.

License: No known license
Version: 1.0.0
Splits:

Split	Examples
`'test'`	1006

Features:

{
    "sentence1": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "sentence2": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

iitp-mr.hi

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:indic_glue/iitp-mr.hi')

Description:

IndicGLUE is a natural language understanding benchmark for Indian languages. It contains a wide
    variety of tasks and covers 11 major Indian languages - as, bn, gu, hi, kn, ml, mr, or, pa, ta, te.


IIT Patna Movie Reviews: Sentiment analysis corpus for movie reviews posted in Hindi.

License: No known license
Version: 1.0.0
Splits:

Split	Examples
`'test'`	310
`'train'`	2480
`'validation'`	310

Features:

{
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "label": {
        "num_classes": 3,
        "names": [
            "negative",
            "neutral",
            "positive"
        ],
        "names_file": null,
        "id": null,
        "_type": "ClassLabel"
    }
}

iitp-pr.hi

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:indic_glue/iitp-pr.hi')

Description:

IndicGLUE is a natural language understanding benchmark for Indian languages. It contains a wide
    variety of tasks and covers 11 major Indian languages - as, bn, gu, hi, kn, ml, mr, or, pa, ta, te.


IIT Patna Product Reviews: Sentiment analysis corpus for product reviews posted in Hindi.

License: No known license
Version: 1.0.0
Splits:

Split	Examples
`'test'`	523
`'train'`	4182
`'validation'`	523

Features:

{
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "label": {
        "num_classes": 3,
        "names": [
            "negative",
            "neutral",
            "positive"
        ],
        "names_file": null,
        "id": null,
        "_type": "ClassLabel"
    }
}

actsa-sc.te

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:indic_glue/actsa-sc.te')

Description:

IndicGLUE is a natural language understanding benchmark for Indian languages. It contains a wide
    variety of tasks and covers 11 major Indian languages - as, bn, gu, hi, kn, ml, mr, or, pa, ta, te.


ACTSA Corpus: Sentiment analysis corpus for Telugu sentences.

Lisans : Bilinen lisans yok
Sürüm : 1.0.0
Bölünmeler :

Bölmek	Örnekler
`'test'`	541
`'train'`	4328
`'validation'`	541

Features:

{
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "label": {
        "num_classes": 2,
        "names": [
            "positive",
            "negative"
        ],
        "names_file": null,
        "id": null,
        "_type": "ClassLabel"
    }
}
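
Note that the `names` list here starts with "positive", whereas the three-class `iitp-*` specs on this page start with "negative", so the same integer id can mean different things across configs. A hedged sketch of translating ids by name rather than by position; `actsa_to_iitp` is a hypothetical helper, and it assumes ids follow the order of each `names` list:

```python
# Label orders copied from the feature specs on this page.
ACTSA_NAMES = ["positive", "negative"]             # actsa-sc.te
IITP_NAMES = ["negative", "neutral", "positive"]   # iitp-* configs


def actsa_to_iitp(label_id: int) -> int:
    """Translate an actsa-sc.te label id into the iitp-* id with the same name."""
    return IITP_NAMES.index(ACTSA_NAMES[label_id])


print([actsa_to_iitp(i) for i in range(len(ACTSA_NAMES))])
```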

md.hi

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:indic_glue/md.hi')

Description:

IndicGLUE is a natural language understanding benchmark for Indian languages. It contains a wide
    variety of tasks and covers 11 major Indian languages - as, bn, gu, hi, kn, ml, mr, or, pa, ta, te.


The Hindi Discourse Analysis dataset is a corpus for analyzing the discourse modes present in its sentences.
It contains sentences from stories written by 11 famous authors of the 20th century. Four to five public-domain
stories by each author were selected, resulting in a collection of 53 stories.
Most of these short stories were originally written in Hindi, but some were written in other Indian languages
and later translated into Hindi.

License: No known license
Version: 1.0.0
Splits:

Split	Examples
`'test'`	997
`'train'`	7974
`'validation'`	997

Features:

{
    "sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "discourse_mode": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "story_number": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "id": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    }
}
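
Because each record carries a `story_number`, sentences can be regrouped into their stories. The sketch below uses hypothetical records shaped like the spec above; the sentence text and `discourse_mode` strings are placeholders rather than real dataset values, and ordering by `id` within a story is an assumption, not something the spec guarantees:

```python
from collections import defaultdict

# Hypothetical records matching the feature spec; all values are placeholders.
records = [
    {"sentence": "s3", "discourse_mode": "mode_b", "story_number": 2, "id": 3},
    {"sentence": "s1", "discourse_mode": "mode_a", "story_number": 1, "id": 1},
    {"sentence": "s2", "discourse_mode": "mode_a", "story_number": 1, "id": 2},
]


def group_by_story(rows):
    """Map story_number -> list of sentences, ordered by id (assumed ordering)."""
    stories = defaultdict(list)
    for row in sorted(rows, key=lambda r: r["id"]):
        stories[row["story_number"]].append(row["sentence"])
    return dict(stories)


print(group_by_story(records))
```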

wiki-ner.as

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:indic_glue/wiki-ner.as')

Description:

IndicGLUE is a natural language understanding benchmark for Indian languages. It contains a wide
    variety of tasks and covers 11 major Indian languages - as, bn, gu, hi, kn, ml, mr, or, pa, ta, te.


The WikiANN dataset (Pan et al. 2017) is a dataset with NER annotations for PER, ORG and LOC. It has been constructed using
the linked entities in Wikipedia pages for 282 different languages including Danish.

License: No known license
Version: 1.0.0
Splits:

Split	Examples
`'test'`	160
`'train'`	1021
`'validation'`	157

Features:

{
    "tokens": {
        "feature": {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    },
    "ner_tags": {
        "feature": {
            "num_classes": 7,
            "names": [
                "B-LOC",
                "B-ORG",
                "B-PER",
                "I-LOC",
                "I-ORG",
                "I-PER",
                "O"
            ],
            "names_file": null,
            "id": null,
            "_type": "ClassLabel"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    },
    "additional_info": {
        "feature": {
            "feature": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            },
            "length": -1,
            "id": null,
            "_type": "Sequence"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    }
}
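
The `ner_tags` feature stores one class id per token, using the B-/I-/O names listed above. A minimal sketch of turning those ids back into entity spans, assuming ids follow the order of the `names` list; `extract_entities` is an illustrative helper and the sample tokens are made up for the example:

```python
# Tag names copied from the feature spec above; list order defines the integer ids.
NER_NAMES = ["B-LOC", "B-ORG", "B-PER", "I-LOC", "I-ORG", "I-PER", "O"]


def extract_entities(tokens, tag_ids):
    """Collect (entity_type, text) spans from BIO-tagged tokens."""
    entities = []
    current_type, current_tokens = None, []
    for token, tag_id in zip(tokens, tag_ids):
        tag = NER_NAMES[tag_id]
        if tag.startswith("B-"):
            # A B- tag always starts a new span, closing any open one.
            if current_tokens:
                entities.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = tag[2:], [token]
        elif tag.startswith("I-") and current_type == tag[2:]:
            current_tokens.append(token)
        else:
            # "O", or an I- tag that does not continue the open span: close it.
            if current_tokens:
                entities.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = None, []
    if current_tokens:
        entities.append((current_type, " ".join(current_tokens)))
    return entities


# Made-up example: B-PER, I-PER, O, B-LOC.
print(extract_entities(["Asha", "Devi", "visited", "Guwahati"], [2, 5, 6, 0]))
```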

wiki-ner.bn

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:indic_glue/wiki-ner.bn')

Description:

IndicGLUE is a natural language understanding benchmark for Indian languages. It contains a wide
    variety of tasks and covers 11 major Indian languages - as, bn, gu, hi, kn, ml, mr, or, pa, ta, te.


The WikiANN dataset (Pan et al. 2017) is a dataset with NER annotations for PER, ORG and LOC. It has been constructed using
the linked entities in Wikipedia pages for 282 different languages including Danish.

License: No known license
Version: 1.0.0
Splits:

Split	Examples
`'test'`	2690
`'train'`	20223
`'validation'`	2985

Features:

{
    "tokens": {
        "feature": {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    },
    "ner_tags": {
        "feature": {
            "num_classes": 7,
            "names": [
                "B-LOC",
                "B-ORG",
                "B-PER",
                "I-LOC",
                "I-ORG",
                "I-PER",
                "O"
            ],
            "names_file": null,
            "id": null,
            "_type": "ClassLabel"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    },
    "additional_info": {
        "feature": {
            "feature": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            },
            "length": -1,
            "id": null,
            "_type": "Sequence"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    }
}

wiki-ner.gu

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:indic_glue/wiki-ner.gu')

Description:

IndicGLUE is a natural language understanding benchmark for Indian languages. It contains a wide
    variety of tasks and covers 11 major Indian languages - as, bn, gu, hi, kn, ml, mr, or, pa, ta, te.


The WikiANN dataset (Pan et al. 2017) is a dataset with NER annotations for PER, ORG and LOC. It has been constructed using
the linked entities in Wikipedia pages for 282 different languages including Danish.

License: No known license
Version: 1.0.0
Splits:

Split	Examples
`'test'`	255
`'train'`	2343
`'validation'`	297

Features:

{
    "tokens": {
        "feature": {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    },
    "ner_tags": {
        "feature": {
            "num_classes": 7,
            "names": [
                "B-LOC",
                "B-ORG",
                "B-PER",
                "I-LOC",
                "I-ORG",
                "I-PER",
                "O"
            ],
            "names_file": null,
            "id": null,
            "_type": "ClassLabel"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    },
    "additional_info": {
        "feature": {
            "feature": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            },
            "length": -1,
            "id": null,
            "_type": "Sequence"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    }
}
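
In this spec, `"length": -1` on a `Sequence` indicates that the sequence length varies per example, so `tokens` and `ner_tags` usually need padding before they can be batched. A hedged sketch of one way to do that in plain Python; `PAD_TOKEN`, `PAD_TAG` and `pad_batch` are hypothetical choices made for this example, not values defined by the dataset:

```python
PAD_TOKEN = ""  # hypothetical padding token
PAD_TAG = -1    # hypothetical padding id, outside the valid 0..6 tag range


def pad_batch(batch):
    """Right-pad tokens and ner_tags so every example has the same length."""
    max_len = max(len(example["tokens"]) for example in batch)
    padded = []
    for example in batch:
        gap = max_len - len(example["tokens"])
        padded.append({
            "tokens": example["tokens"] + [PAD_TOKEN] * gap,
            "ner_tags": example["ner_tags"] + [PAD_TAG] * gap,
        })
    return padded


# Made-up examples of different lengths.
batch = [
    {"tokens": ["a", "b", "c"], "ner_tags": [2, 5, 6]},
    {"tokens": ["d"], "ner_tags": [0]},
]
print(pad_batch(batch))
```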

wiki-ner.hi

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:indic_glue/wiki-ner.hi')

Description:

IndicGLUE is a natural language understanding benchmark for Indian languages. It contains a wide
    variety of tasks and covers 11 major Indian languages - as, bn, gu, hi, kn, ml, mr, or, pa, ta, te.


The WikiANN dataset (Pan et al. 2017) is a dataset with NER annotations for PER, ORG and LOC. It has been constructed using
the linked entities in Wikipedia pages for 282 different languages including Danish.

License: No known license
Version: 1.0.0
Splits:

Split	Examples
`'test'`	1256
`'train'`	9463
`'validation'`	1114

Features:

{
    "tokens": {
        "feature": {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    },
    "ner_tags": {
        "feature": {
            "num_classes": 7,
            "names": [
                "B-LOC",
                "B-ORG",
                "B-PER",
                "I-LOC",
                "I-ORG",
                "I-PER",
                "O"
            ],
            "names_file": null,
            "id": null,
            "_type": "ClassLabel"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    },
    "additional_info": {
        "feature": {
            "feature": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            },
            "length": -1,
            "id": null,
            "_type": "Sequence"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    }
}

wiki-ner.kn

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:indic_glue/wiki-ner.kn')

Description:

IndicGLUE is a natural language understanding benchmark for Indian languages. It contains a wide
    variety of tasks and covers 11 major Indian languages - as, bn, gu, hi, kn, ml, mr, or, pa, ta, te.


The WikiANN dataset (Pan et al. 2017) is a dataset with NER annotations for PER, ORG and LOC. It has been constructed using
the linked entities in Wikipedia pages for 282 different languages including Danish.

License: No known license
Version: 1.0.0
Splits:

Split	Examples
`'test'`	476
`'train'`	2679
`'validation'`	412

Features:

{
    "tokens": {
        "feature": {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    },
    "ner_tags": {
        "feature": {
            "num_classes": 7,
            "names": [
                "B-LOC",
                "B-ORG",
                "B-PER",
                "I-LOC",
                "I-ORG",
                "I-PER",
                "O"
            ],
            "names_file": null,
            "id": null,
            "_type": "ClassLabel"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    },
    "additional_info": {
        "feature": {
            "feature": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            },
            "length": -1,
            "id": null,
            "_type": "Sequence"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    }
}

wiki-ner.ml

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:indic_glue/wiki-ner.ml')

Description:

IndicGLUE is a natural language understanding benchmark for Indian languages. It contains a wide
    variety of tasks and covers 11 major Indian languages - as, bn, gu, hi, kn, ml, mr, or, pa, ta, te.


The WikiANN dataset (Pan et al. 2017) is a dataset with NER annotations for PER, ORG and LOC. It has been constructed using
the linked entities in Wikipedia pages for 282 different languages including Danish.

License: No known license
Version: 1.0.0
Splits:

Split	Examples
`'test'`	2042
`'train'`	15620
`'validation'`	2067

Features:

{
    "tokens": {
        "feature": {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    },
    "ner_tags": {
        "feature": {
            "num_classes": 7,
            "names": [
                "B-LOC",
                "B-ORG",
                "B-PER",
                "I-LOC",
                "I-ORG",
                "I-PER",
                "O"
            ],
            "names_file": null,
            "id": null,
            "_type": "ClassLabel"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    },
    "additional_info": {
        "feature": {
            "feature": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            },
            "length": -1,
            "id": null,
            "_type": "Sequence"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    }
}

wiki-ner.mr

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:indic_glue/wiki-ner.mr')

Description:

IndicGLUE is a natural language understanding benchmark for Indian languages. It contains a wide
    variety of tasks and covers 11 major Indian languages - as, bn, gu, hi, kn, ml, mr, or, pa, ta, te.


The WikiANN dataset (Pan et al. 2017) is a dataset with NER annotations for PER, ORG and LOC. It has been constructed using
the linked entities in Wikipedia pages for 282 different languages including Danish.

License: No known license
Version: 1.0.0
Splits:

Split	Examples
`'test'`	1329
`'train'`	12151
`'validation'`	1498

Features:

{
    "tokens": {
        "feature": {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    },
    "ner_tags": {
        "feature": {
            "num_classes": 7,
            "names": [
                "B-LOC",
                "B-ORG",
                "B-PER",
                "I-LOC",
                "I-ORG",
                "I-PER",
                "O"
            ],
            "names_file": null,
            "id": null,
            "_type": "ClassLabel"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    },
    "additional_info": {
        "feature": {
            "feature": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            },
            "length": -1,
            "id": null,
            "_type": "Sequence"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    }
}

wiki-ner.or

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:indic_glue/wiki-ner.or')

Description:

IndicGLUE is a natural language understanding benchmark for Indian languages. It contains a wide
    variety of tasks and covers 11 major Indian languages - as, bn, gu, hi, kn, ml, mr, or, pa, ta, te.


The WikiANN dataset (Pan et al. 2017) is a dataset with NER annotations for PER, ORG and LOC. It has been constructed using
the linked entities in Wikipedia pages for 282 different languages including Danish.

License: No known license
Version: 1.0.0
Splits:

Split	Examples
`'test'`	153
`'train'`	1077
`'validation'`	132

Features:

{
    "tokens": {
        "feature": {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    },
    "ner_tags": {
        "feature": {
            "num_classes": 7,
            "names": [
                "B-LOC",
                "B-ORG",
                "B-PER",
                "I-LOC",
                "I-ORG",
                "I-PER",
                "O"
            ],
            "names_file": null,
            "id": null,
            "_type": "ClassLabel"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    },
    "additional_info": {
        "feature": {
            "feature": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            },
            "length": -1,
            "id": null,
            "_type": "Sequence"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    }
}

wiki-ner.pa

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:indic_glue/wiki-ner.pa')

Description:

IndicGLUE is a natural language understanding benchmark for Indian languages. It contains a wide
    variety of tasks and covers 11 major Indian languages - as, bn, gu, hi, kn, ml, mr, or, pa, ta, te.


The WikiANN dataset (Pan et al. 2017) is a dataset with NER annotations for PER, ORG and LOC. It has been constructed using
the linked entities in Wikipedia pages for 282 different languages including Danish.

License: No known license
Version: 1.0.0
Splits:

Split	Examples
`'test'`	179
`'train'`	1408
`'validation'`	186

Features:

{
    "tokens": {
        "feature": {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    },
    "ner_tags": {
        "feature": {
            "num_classes": 7,
            "names": [
                "B-LOC",
                "B-ORG",
                "B-PER",
                "I-LOC",
                "I-ORG",
                "I-PER",
                "O"
            ],
            "names_file": null,
            "id": null,
            "_type": "ClassLabel"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    },
    "additional_info": {
        "feature": {
            "feature": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            },
            "length": -1,
            "id": null,
            "_type": "Sequence"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    }
}

wiki-ner.ta

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:indic_glue/wiki-ner.ta')

Description:

IndicGLUE is a natural language understanding benchmark for Indian languages. It contains a wide
    variety of tasks and covers 11 major Indian languages - as, bn, gu, hi, kn, ml, mr, or, pa, ta, te.


The WikiANN dataset (Pan et al. 2017) is a dataset with NER annotations for PER, ORG and LOC. It has been constructed using
the linked entities in Wikipedia pages for 282 different languages including Danish.

License: No known license
Version: 1.0.0
Splits:

Split	Examples
`'test'`	2611
`'train'`	20466
`'validation'`	2586

Features:

{
    "tokens": {
        "feature": {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    },
    "ner_tags": {
        "feature": {
            "num_classes": 7,
            "names": [
                "B-LOC",
                "B-ORG",
                "B-PER",
                "I-LOC",
                "I-ORG",
                "I-PER",
                "O"
            ],
            "names_file": null,
            "id": null,
            "_type": "ClassLabel"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    },
    "additional_info": {
        "feature": {
            "feature": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            },
            "length": -1,
            "id": null,
            "_type": "Sequence"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    }
}

wiki-ner.te

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:indic_glue/wiki-ner.te')

Description:

IndicGLUE is a natural language understanding benchmark for Indian languages. It contains a wide
    variety of tasks and covers 11 major Indian languages - as, bn, gu, hi, kn, ml, mr, or, pa, ta, te.


The WikiANN dataset (Pan et al. 2017) is a dataset with NER annotations for PER, ORG and LOC. It has been constructed using
the linked entities in Wikipedia pages for 282 different languages including Danish.

License: No known license
Version: 1.0.0
Splits:

Split	Examples
`'test'`	1110
`'train'`	7978
`'validation'`	841

Features:

{
    "tokens": {
        "feature": {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    },
    "ner_tags": {
        "feature": {
            "num_classes": 7,
            "names": [
                "B-LOC",
                "B-ORG",
                "B-PER",
                "I-LOC",
                "I-ORG",
                "I-PER",
                "O"
            ],
            "names_file": null,
            "id": null,
            "_type": "ClassLabel"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    },
    "additional_info": {
        "feature": {
            "feature": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            },
            "length": -1,
            "id": null,
            "_type": "Sequence"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    }
}
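
The `wiki-ner` configs on this page all follow the pattern `wiki-ner.<language code>`, one per language in the IndicGLUE description. A small sketch that builds the corresponding `tfds.load` names without loading anything; `LANGS` and `wiki_ner_names` are illustrative names introduced for this example:

```python
# Language codes as listed in the IndicGLUE description.
LANGS = ["as", "bn", "gu", "hi", "kn", "ml", "mr", "or", "pa", "ta", "te"]


def wiki_ner_names(langs=LANGS):
    """Return the dataset name string for each wiki-ner language config."""
    return [f"huggingface:indic_glue/wiki-ner.{lang}" for lang in langs]


for name in wiki_ner_names():
    print(name)
```

Each resulting string can then be passed to `tfds.load`, as in the per-config commands above.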