large_spanish_corpus

References:

JRC

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:large_spanish_corpus/JRC')

Description:

The Large Spanish Corpus is a compilation of 15 unlabelled Spanish corpora spanning Wikipedia to European parliament notes. Each config contains the data corresponding to a different corpus. For example, "all_wiki" only includes examples from Spanish Wikipedia. By default, the config is set to "combined" which loads all the corpora; with this setting you can also specify the number of samples to return per corpus by configuring the "split" argument.
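The idea of capping how many examples each corpus contributes can be illustrated with a small, library-free sketch. It is a hypothetical illustration only: the helper name `sample_per_corpus` and the toy corpora are made up, and the sketch does not reproduce how the loader actually interprets the "split" argument.

```python
from itertools import islice
from typing import Dict, Iterable, List


def sample_per_corpus(corpora: Dict[str, Iterable[str]], n_per_corpus: int) -> List[dict]:
    """Hypothetical helper: keep at most `n_per_corpus` texts from each corpus,
    in order, and concatenate them into one list of {"text": ...} examples."""
    combined: List[dict] = []
    for texts in corpora.values():
        # islice stops after n_per_corpus items, so a large corpus is not read past the cap.
        combined.extend({"text": t} for t in islice(texts, n_per_corpus))
    return combined


# Toy stand-ins for two corpora (made-up sentences, not real dataset content).
toy = {
    "JRC": ["Reglamento 1.", "Reglamento 2.", "Reglamento 3."],
    "TED": ["Hola a todos."],
}
print(sample_per_corpus(toy, 2))
# [{'text': 'Reglamento 1.'}, {'text': 'Reglamento 2.'}, {'text': 'Hola a todos.'}]
```

A corpus with fewer examples than the cap (here the toy "TED" list) simply contributes everything it has.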

License: MIT
Version: 1.1.0
Splits:

Split	Examples
`'train'`	3410620

Features:

{
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}
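The feature dictionary says that every example holds a single field, `text`, of type string (`"id": null` simply means no id is set). As an illustration only, the hypothetical `matches_schema` helper below parses that JSON with the standard library and checks a plain Python example against it; it is not part of TFDS or Hugging Face `datasets`. It accepts `bytes` as well as `str`, on the assumption that string features read through TensorFlow as NumPy typically arrive as `bytes`.

```python
import json

# The feature schema exactly as printed above for this config.
FEATURES = json.loads("""
{
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}
""")


def matches_schema(example: dict, features: dict) -> bool:
    """Return True if `example` has exactly the schema's fields and every
    string-typed `Value` field holds text (str, or bytes)."""
    if set(example) != set(features):
        return False
    for name, spec in features.items():
        if spec["_type"] == "Value" and spec["dtype"] == "string":
            if not isinstance(example[name], (str, bytes)):
                return False
    return True


print(matches_schema({"text": "Hola mundo"}, FEATURES))  # True
print(matches_schema({"text": 42}, FEATURES))            # False
```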

EMEA

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:large_spanish_corpus/EMEA')

Description:

The Large Spanish Corpus is a compilation of 15 unlabelled Spanish corpora spanning Wikipedia to European parliament notes. Each config contains the data corresponding to a different corpus. For example, "all_wiki" only includes examples from Spanish Wikipedia. By default, the config is set to "combined" which loads all the corpora; with this setting you can also specify the number of samples to return per corpus by configuring the "split" argument.

License: MIT
Version: 1.1.0
Splits:

Split	Examples
`'train'`	1221233

Features:

{
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

GlobalVoices

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:large_spanish_corpus/GlobalVoices')

Description:

The Large Spanish Corpus is a compilation of 15 unlabelled Spanish corpora spanning Wikipedia to European parliament notes. Each config contains the data corresponding to a different corpus. For example, "all_wiki" only includes examples from Spanish Wikipedia. By default, the config is set to "combined" which loads all the corpora; with this setting you can also specify the number of samples to return per corpus by configuring the "split" argument.

License: MIT
Version: 1.1.0
Splits:

Split	Examples
`'train'`	897075

Features:

{
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

ECB

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:large_spanish_corpus/ECB')

Description:

The Large Spanish Corpus is a compilation of 15 unlabelled Spanish corpora spanning Wikipedia to European parliament notes. Each config contains the data corresponding to a different corpus. For example, "all_wiki" only includes examples from Spanish Wikipedia. By default, the config is set to "combined" which loads all the corpora; with this setting you can also specify the number of samples to return per corpus by configuring the "split" argument.

License: MIT
Version: 1.1.0
Splits:

Split	Examples
`'train'`	1875738

Features:

{
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

DOGC

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:large_spanish_corpus/DOGC')

Description:

The Large Spanish Corpus is a compilation of 15 unlabelled Spanish corpora spanning Wikipedia to European parliament notes. Each config contains the data corresponding to a different corpus. For example, "all_wiki" only includes examples from Spanish Wikipedia. By default, the config is set to "combined" which loads all the corpora; with this setting you can also specify the number of samples to return per corpus by configuring the "split" argument.

License: MIT
Version: 1.1.0
Splits:

Split	Examples
`'train'`	10917053

Features:

{
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

all_wikis

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:large_spanish_corpus/all_wikis')

Description:

The Large Spanish Corpus is a compilation of 15 unlabelled Spanish corpora spanning Wikipedia to European parliament notes. Each config contains the data corresponding to a different corpus. For example, "all_wiki" only includes examples from Spanish Wikipedia. By default, the config is set to "combined" which loads all the corpora; with this setting you can also specify the number of samples to return per corpus by configuring the "split" argument.

License: MIT
Version: 1.1.0
Splits:

Split	Examples
`'train'`	28109484

Features:

{
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

TED

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:large_spanish_corpus/TED')

Description:

The Large Spanish Corpus is a compilation of 15 unlabelled Spanish corpora spanning Wikipedia to European parliament notes. Each config contains the data corresponding to a different corpus. For example, "all_wiki" only includes examples from Spanish Wikipedia. By default, the config is set to "combined" which loads all the corpora; with this setting you can also specify the number of samples to return per corpus by configuring the "split" argument.

License: MIT
Version: 1.1.0
Splits:

Split	Examples
`'train'`	157910

Features:

{
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

multiUN

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:large_spanish_corpus/multiUN')

Description:

The Large Spanish Corpus is a compilation of 15 unlabelled Spanish corpora spanning Wikipedia to European parliament notes. Each config contains the data corresponding to a different corpus. For example, "all_wiki" only includes examples from Spanish Wikipedia. By default, the config is set to "combined" which loads all the corpora; with this setting you can also specify the number of samples to return per corpus by configuring the "split" argument.

License: MIT
Version: 1.1.0
Splits:

Split	Examples
`'train'`	13127490

Features:

{
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

Europarl

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:large_spanish_corpus/Europarl')

Description:

The Large Spanish Corpus is a compilation of 15 unlabelled Spanish corpora spanning Wikipedia to European parliament notes. Each config contains the data corresponding to a different corpus. For example, "all_wiki" only includes examples from Spanish Wikipedia. By default, the config is set to "combined" which loads all the corpora; with this setting you can also specify the number of samples to return per corpus by configuring the "split" argument.

License: MIT
Version: 1.1.0
Splits:

Split	Examples
`'train'`	2174141

Features:

{
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

NewsCommentary11

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:large_spanish_corpus/NewsCommentary11')

Description:

The Large Spanish Corpus is a compilation of 15 unlabelled Spanish corpora spanning Wikipedia to European parliament notes. Each config contains the data corresponding to a different corpus. For example, "all_wiki" only includes examples from Spanish Wikipedia. By default, the config is set to "combined" which loads all the corpora; with this setting you can also specify the number of samples to return per corpus by configuring the "split" argument.

License: MIT
Version: 1.1.0
Splits:

Split	Examples
`'train'`	288771

Features:

{
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

UN

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:large_spanish_corpus/UN')

Description:

The Large Spanish Corpus is a compilation of 15 unlabelled Spanish corpora spanning Wikipedia to European parliament notes. Each config contains the data corresponding to a different corpus. For example, "all_wiki" only includes examples from Spanish Wikipedia. By default, the config is set to "combined" which loads all the corpora; with this setting you can also specify the number of samples to return per corpus by configuring the "split" argument.

License: MIT
Version: 1.1.0
Splits:

Split	Examples
`'train'`	74067

Features:

{
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

EUBookShop

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:large_spanish_corpus/EUBookShop')

Description:

The Large Spanish Corpus is a compilation of 15 unlabelled Spanish corpora spanning Wikipedia to European parliament notes. Each config contains the data corresponding to a different corpus. For example, "all_wiki" only includes examples from Spanish Wikipedia. By default, the config is set to "combined" which loads all the corpora; with this setting you can also specify the number of samples to return per corpus by configuring the "split" argument.

License: MIT
Version: 1.1.0
Splits:

Split	Examples
`'train'`	8214959

Features:

{
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

ParaCrawl

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:large_spanish_corpus/ParaCrawl')

Description:

The Large Spanish Corpus is a compilation of 15 unlabelled Spanish corpora spanning Wikipedia to European parliament notes. Each config contains the data corresponding to a different corpus. For example, "all_wiki" only includes examples from Spanish Wikipedia. By default, the config is set to "combined" which loads all the corpora; with this setting you can also specify the number of samples to return per corpus by configuring the "split" argument.

License: MIT
Version: 1.1.0
Splits:

Split	Examples
`'train'`	15510649

Features:

{
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

OpenSubtitles2018

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:large_spanish_corpus/OpenSubtitles2018')

Description:

The Large Spanish Corpus is a compilation of 15 unlabelled Spanish corpora spanning Wikipedia to European parliament notes. Each config contains the data corresponding to a different corpus. For example, "all_wiki" only includes examples from Spanish Wikipedia. By default, the config is set to "combined" which loads all the corpora; with this setting you can also specify the number of samples to return per corpus by configuring the "split" argument.

License: MIT
Version: 1.1.0
Splits:

Split	Examples
`'train'`	213508602

Features:

{
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

DGT

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:large_spanish_corpus/DGT')

Description:

The Large Spanish Corpus is a compilation of 15 unlabelled Spanish corpora spanning Wikipedia to European parliament notes. Each config contains the data corresponding to a different corpus. For example, "all_wiki" only includes examples from Spanish Wikipedia. By default, the config is set to "combined" which loads all the corpora; with this setting you can also specify the number of samples to return per corpus by configuring the "split" argument.

License: MIT
Version: 1.1.0
Splits:

Split	Examples
`'train'`	3168368

Features:

{
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

combined

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:large_spanish_corpus/combined')

Description:

The Large Spanish Corpus is a compilation of 15 unlabelled Spanish corpora spanning Wikipedia to European parliament notes. Each config contains the data corresponding to a different corpus. For example, "all_wiki" only includes examples from Spanish Wikipedia. By default, the config is set to "combined" which loads all the corpora; with this setting you can also specify the number of samples to return per corpus by configuring the "split" argument.

License: MIT
Version: 1.1.0
Splits:

Split	Examples
`'train'`	302656160

Features:

{
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}
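The counts on this page are internally consistent: the `train` counts of the 15 individual configs add up exactly to the 302656160 examples listed for `combined`, which fits the description of `combined` as loading all the corpora. The short script below reproduces that arithmetic from the numbers in the tables (no download needed) and also shows how much of the combined split comes from the largest corpus. The names `TRAIN_COUNTS` and `COMBINED_TRAIN` are illustrative only.

```python
# 'train' example counts copied from the per-config tables on this page.
TRAIN_COUNTS = {
    "JRC": 3410620,
    "EMEA": 1221233,
    "GlobalVoices": 897075,
    "ECB": 1875738,
    "DOGC": 10917053,
    "all_wikis": 28109484,
    "TED": 157910,
    "multiUN": 13127490,
    "Europarl": 2174141,
    "NewsCommentary11": 288771,
    "UN": 74067,
    "EUBookShop": 8214959,
    "ParaCrawl": 15510649,
    "OpenSubtitles2018": 213508602,
    "DGT": 3168368,
}
COMBINED_TRAIN = 302656160  # 'train' count listed for the `combined` config

total = sum(TRAIN_COUNTS.values())
print(len(TRAIN_COUNTS), total, total == COMBINED_TRAIN)  # 15 302656160 True

# Percentage of `combined` contributed by the largest single corpus.
largest = max(TRAIN_COUNTS, key=TRAIN_COUNTS.get)
print(largest, round(100 * TRAIN_COUNTS[largest] / total, 1))  # OpenSubtitles2018 70.5
```

Because OpenSubtitles2018 accounts for roughly 70% of `combined`, uniform sampling from `combined` would draw most of its examples from subtitles; loading individual configs is one way to control the mix.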