lince

lid_spaeng

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:lince/lid_spaeng')

Description:

LinCE is a centralized Linguistic Code-switching Evaluation benchmark
(https://ritual.uh.edu/lince/) that contains data for training and evaluating
NLP systems on code-switching tasks.

License: No known license
Version: 1.0.0
Splits:

Split	Examples
`'test'`	8289
`'train'`	21030
`'validation'`	3332

Features:

{
    "idx": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "words": {
        "feature": {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    },
    "lid": {
        "feature": {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    }
}
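
The `words` and `lid` features above are parallel variable-length sequences, so each token should line up with one language-ID tag. The following is a minimal sketch of that pairing, not the loader's own code. The record is hand-written for illustration: its tokens and tag strings (`lang1`, `lang2`) are assumptions rather than values quoted from this page, and `tokens_with_lid` is a hypothetical helper.

```python
from typing import Dict, List, Tuple

# Hypothetical record shaped like the feature spec above. The tokens and
# the tag strings are invented for illustration, not read from the dataset.
example = {
    "idx": 0,
    "words": ["I", "love", "las", "playas"],
    "lid": ["lang1", "lang1", "lang2", "lang2"],
}


def tokens_with_lid(record: Dict) -> List[Tuple[str, str]]:
    """Pair each token with its language-ID tag, checking that lengths match."""
    words, tags = record["words"], record["lid"]
    if len(words) != len(tags):
        raise ValueError(
            f"record {record['idx']}: {len(words)} words but {len(tags)} tags"
        )
    return list(zip(words, tags))


pairs = tokens_with_lid(example)
for word, tag in pairs:
    print(f"{word}\t{tag}")
```

The length check is there because a mismatch between the two sequences would otherwise be silently truncated by `zip`.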

lid_hineng

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:lince/lid_hineng')

Description:

LinCE is a centralized Linguistic Code-switching Evaluation benchmark
(https://ritual.uh.edu/lince/) that contains data for training and evaluating
NLP systems on code-switching tasks.

License: No known license
Version: 1.0.0
Splits:

Split	Examples
`'test'`	1854
`'train'`	4823
`'validation'`	744

Features:

{
    "idx": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "words": {
        "feature": {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    },
    "lid": {
        "feature": {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    }
}

lid_msaea

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:lince/lid_msaea')

Description:

LinCE is a centralized Linguistic Code-switching Evaluation benchmark
(https://ritual.uh.edu/lince/) that contains data for training and evaluating
NLP systems on code-switching tasks.

License: No known license
Version: 1.0.0
Splits:

Split	Examples
`'test'`	1663
`'train'`	8464
`'validation'`	1116

Features:

{
    "idx": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "words": {
        "feature": {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    },
    "lid": {
        "feature": {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    }
}

lid_nepeng

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:lince/lid_nepeng')

Description:

LinCE is a centralized Linguistic Code-switching Evaluation benchmark
(https://ritual.uh.edu/lince/) that contains data for training and evaluating
NLP systems on code-switching tasks.

License: No known license
Version: 1.0.0
Splits:

Split	Examples
`'test'`	3228
`'train'`	8451
`'validation'`	1332

Features:

{
    "idx": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "words": {
        "feature": {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    },
    "lid": {
        "feature": {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    }
}

pos_spaeng

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:lince/pos_spaeng')

Description:

LinCE is a centralized Linguistic Code-switching Evaluation benchmark
(https://ritual.uh.edu/lince/) that contains data for training and evaluating
NLP systems on code-switching tasks.

License: No known license
Version: 1.0.0
Splits:

Split	Examples
`'test'`	10720
`'train'`	27893
`'validation'`	4298

Features:

{
    "idx": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "words": {
        "feature": {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    },
    "lid": {
        "feature": {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    },
    "pos": {
        "feature": {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    }
}
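
The POS configs carry three aligned sequences (`words`, `lid`, `pos`), which makes it possible to look at part-of-speech tags separately for each language. A rough sketch of that grouping follows. The sample record, its tag values, and the helper name `pos_counts_by_language` are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical record shaped like the feature spec above. The tag values
# are placeholders and may not match the dataset's actual tag inventory.
example = {
    "idx": 0,
    "words": ["I", "love", "las", "playas"],
    "lid": ["lang1", "lang1", "lang2", "lang2"],
    "pos": ["PRON", "VERB", "DET", "NOUN"],
}


def pos_counts_by_language(records):
    """Count POS tags separately for each language-ID tag."""
    counts = defaultdict(Counter)
    for record in records:
        for lid_tag, pos_tag in zip(record["lid"], record["pos"]):
            counts[lid_tag][pos_tag] += 1
    return counts


counts = pos_counts_by_language([example])
for lid_tag, tag_counts in counts.items():
    print(lid_tag, dict(tag_counts))
```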

pos_hineng

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:lince/pos_hineng')

Description:

LinCE is a centralized Linguistic Code-switching Evaluation benchmark
(https://ritual.uh.edu/lince/) that contains data for training and evaluating
NLP systems on code-switching tasks.

License: No known license
Version: 1.0.0
Splits:

Split	Examples
`'test'`	299
`'train'`	1030
`'validation'`	160

Features:

{
    "idx": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "words": {
        "feature": {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    },
    "lid": {
        "feature": {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    },
    "pos": {
        "feature": {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    }
}

ner_spaeng

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:lince/ner_spaeng')

Description:

LinCE is a centralized Linguistic Code-switching Evaluation benchmark
(https://ritual.uh.edu/lince/) that contains data for training and evaluating
NLP systems on code-switching tasks.

License: No known license
Version: 1.0.0
Splits:

Split	Examples
`'test'`	23527
`'train'`	33611
`'validation'`	10085

Features:

{
    "idx": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "words": {
        "feature": {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    },
    "lid": {
        "feature": {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    },
    "ner": {
        "feature": {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    }
}
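
This page only states that `ner` is a sequence of strings aligned with `words`; it does not describe the tag scheme. Assuming the tags follow the common BIO convention (`B-PER`, `I-PER`, `O`, and so on), entity spans could be recovered roughly as sketched below. The BIO assumption, the sample sentence, and the helper `bio_spans` are illustrative and should be checked against the actual data.

```python
def bio_spans(words, tags):
    """Group tokens into (entity_type, text) spans, assuming BIO-style tags."""
    spans = []
    current_type, current_tokens = None, []

    def close_span():
        # Emit the span collected so far, if any.
        if current_tokens:
            spans.append((current_type, " ".join(current_tokens)))

    for word, tag in zip(words, tags):
        prefix, _, label = tag.partition("-")
        if prefix == "B" or (prefix == "I" and label != current_type):
            # A "B-" tag, or an "I-" tag that does not continue the open
            # span, starts a new span.
            close_span()
            current_type, current_tokens = label, [word]
        elif prefix == "I":
            current_tokens.append(word)
        else:
            # "O" (or any unrecognized tag) ends the open span.
            close_span()
            current_type, current_tokens = None, []
    close_span()
    return spans


# Hypothetical sentence and tags, written by hand for this sketch.
words = ["Maria", "Lopez", "vive", "en", "New", "York"]
tags = ["B-PER", "I-PER", "O", "O", "B-LOC", "I-LOC"]
spans = bio_spans(words, tags)
print(spans)
```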

ner_msaea

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:lince/ner_msaea')

Description:

LinCE is a centralized Linguistic Code-switching Evaluation benchmark
(https://ritual.uh.edu/lince/) that contains data for training and evaluating
NLP systems on code-switching tasks.

License: No known license
Version: 1.0.0
Splits:

Split	Examples
`'test'`	1110
`'train'`	10103
`'validation'`	1122

Features:

{
    "idx": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "words": {
        "feature": {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    },
    "ner": {
        "feature": {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    }
}
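
Unlike the other NER configs on this page, the feature spec for `ner_msaea` lists no `lid` sequence. Code that processes several configs may therefore want to pick up only the tag columns a record actually has. One possible way to do that is sketched below, where `TOKEN_LEVEL_FIELDS`, `token_rows`, and the sample record are hypothetical names and data chosen for illustration.

```python
# Token-level sequence features named on this page. Not every config has all.
TOKEN_LEVEL_FIELDS = ("lid", "pos", "ner")


def token_rows(record):
    """Yield one dict per token with whichever tag columns the record carries."""
    present = [field for field in TOKEN_LEVEL_FIELDS if field in record]
    for i, word in enumerate(record["words"]):
        row = {"word": word}
        for field in present:
            row[field] = record[field][i]
        yield row


# Hypothetical record shaped like the ner_msaea spec above (no "lid" key).
msaea_like = {"idx": 0, "words": ["w1", "w2"], "ner": ["O", "O"]}
rows = list(token_rows(msaea_like))
print(rows)
```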

ner_hineng

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:lince/ner_hineng')

Description:

LinCE is a centralized Linguistic Code-switching Evaluation benchmark
(https://ritual.uh.edu/lince/) that contains data for training and evaluating
NLP systems on code-switching tasks.

License: No known license
Version: 1.0.0
Splits:

Split	Examples
`'test'`	522
`'train'`	1243
`'validation'`	314

Features:

{
    "idx": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "words": {
        "feature": {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    },
    "lid": {
        "feature": {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    },
    "ner": {
        "feature": {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    }
}

sa_spaeng

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:lince/sa_spaeng')

Description:

LinCE is a centralized Linguistic Code-switching Evaluation benchmark
(https://ritual.uh.edu/lince/) that contains data for training and evaluating
NLP systems on code-switching tasks.

License: No known license
Version: 1.0.0
Splits:

Split	Examples
`'test'`	4736
`'train'`	12194
`'validation'`	1859

Features:

{
    "idx": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "words": {
        "feature": {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    },
    "lid": {
        "feature": {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    },
    "sa": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}
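
In this config `sa` is a single string per example rather than a sequence, so it labels the whole sentence instead of individual tokens. A small sketch of tallying those sentence-level labels is given below. The records and the label strings (`positive`, `negative`) are made up for illustration, and the actual label set should be read from the data.

```python
from collections import Counter

# Hypothetical records shaped like the feature spec above. The label
# strings are placeholders, not the dataset's documented label set.
records = [
    {"idx": 0, "words": ["me", "encanta", "this"],
     "lid": ["lang2", "lang2", "lang1"], "sa": "positive"},
    {"idx": 1, "words": ["so", "aburrido"],
     "lid": ["lang1", "lang2"], "sa": "negative"},
    {"idx": 2, "words": ["que", "nice"],
     "lid": ["lang2", "lang1"], "sa": "positive"},
]


def label_distribution(examples):
    """Count sentence-level labels, checking that each label is a scalar string."""
    counts = Counter()
    for example in examples:
        label = example["sa"]
        if not isinstance(label, str):
            raise TypeError(f"record {example['idx']}: expected one string label")
        counts[label] += 1
    return counts


label_counts = label_distribution(records)
print(label_counts.most_common())
```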