wiki_lingua

arabic

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:wiki_lingua/arabic')
  • Description:
WikiLingua is a large-scale multilingual dataset for the evaluation of
crosslingual abstractive summarization systems. The dataset includes ~770k
article and summary pairs in 18 languages from WikiHow. The gold-standard
article-summary alignments across languages were done by aligning the images
that are used to describe each how-to step in an article.
  • License: CC BY-NC-SA 3.0
  • Version: 1.1.1
  • Splits:
Split     Examples
'train'   9995
  • Features:
{
    "url": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "article": {
        "feature": {
            "section_name": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            },
            "document": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            },
            "summary": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            },
            "english_url": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            },
            "english_section_name": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            }
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    }
}
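
The feature spec above describes one record per `url`, with `article` holding a variable-length sequence of sections. The sketch below shows one way to flatten such a record into per-section (section name, document, summary) triples. It assumes the sequence is materialised as a dict of parallel lists, one list per sub-feature; the `example` record and the `iter_pairs` helper are hypothetical, and the values are invented for illustration. Records loaded through TFDS would hold tensors rather than Python strings.

```python
from typing import Dict, Iterator, Tuple

# Hypothetical record shaped like the feature spec above.
# Assumption: the "article" Sequence is a dict of parallel lists.
# All values are made up and do not come from the dataset.
example: Dict = {
    "url": "https://example.org/article",
    "article": {
        "section_name": ["Step one", "Step two"],
        "document": ["Long text for step one.", "Long text for step two."],
        "summary": ["Short one.", "Short two."],
        "english_url": ["https://example.org/en", "https://example.org/en"],
        "english_section_name": ["Step one (en)", "Step two (en)"],
    },
}


def iter_pairs(record: Dict) -> Iterator[Tuple[str, str, str]]:
    """Yield (section_name, document, summary) for each section of a record."""
    article = record["article"]
    yield from zip(
        article["section_name"], article["document"], article["summary"]
    )


pairs = list(iter_pairs(example))
for name, document, summary in pairs:
    print(f"{name}: {len(document)} chars -> {len(summary)} chars")
```

Each triple is one article-summary pair, which is the unit a summarization model would typically be trained or evaluated on.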

chinese

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:wiki_lingua/chinese')
  • Description:
WikiLingua is a large-scale multilingual dataset for the evaluation of
crosslingual abstractive summarization systems. The dataset includes ~770k
article and summary pairs in 18 languages from WikiHow. The gold-standard
article-summary alignments across languages were done by aligning the images
that are used to describe each how-to step in an article.
  • License: CC BY-NC-SA 3.0
  • Version: 1.1.1
  • Splits:
Split     Examples
'train'   6541
  • Features:
{
    "url": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "article": {
        "feature": {
            "section_name": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            },
            "document": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            },
            "summary": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            },
            "english_url": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            },
            "english_section_name": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            }
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    }
}

czech

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:wiki_lingua/czech')
  • Description:
WikiLingua is a large-scale multilingual dataset for the evaluation of
crosslingual abstractive summarization systems. The dataset includes ~770k
article and summary pairs in 18 languages from WikiHow. The gold-standard
article-summary alignments across languages were done by aligning the images
that are used to describe each how-to step in an article.
  • License: CC BY-NC-SA 3.0
  • Version: 1.1.1
  • Splits:
Split     Examples
'train'   2520
  • Features:
{
    "url": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "article": {
        "feature": {
            "section_name": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            },
            "document": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            },
            "summary": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            },
            "english_url": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            },
            "english_section_name": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            }
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    }
}

dutch

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:wiki_lingua/dutch')
  • Description:
WikiLingua is a large-scale multilingual dataset for the evaluation of
crosslingual abstractive summarization systems. The dataset includes ~770k
article and summary pairs in 18 languages from WikiHow. The gold-standard
article-summary alignments across languages were done by aligning the images
that are used to describe each how-to step in an article.
  • License: CC BY-NC-SA 3.0
  • Version: 1.1.1
  • Splits:
Split     Examples
'train'   10862
  • Features:
{
    "url": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "article": {
        "feature": {
            "section_name": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            },
            "document": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            },
            "summary": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            },
            "english_url": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            },
            "english_section_name": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            }
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    }
}

english

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:wiki_lingua/english')
  • Description:
WikiLingua is a large-scale multilingual dataset for the evaluation of
crosslingual abstractive summarization systems. The dataset includes ~770k
article and summary pairs in 18 languages from WikiHow. The gold-standard
article-summary alignments across languages were done by aligning the images
that are used to describe each how-to step in an article.
  • License: CC BY-NC-SA 3.0
  • Version: 1.1.1
  • Splits:
Split     Examples
'train'   57945
  • Features:
{
    "url": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "article": {
        "feature": {
            "section_name": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            },
            "document": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            },
            "summary": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            }
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    }
}
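
Unlike the other configs, the english feature spec above has no `english_url` or `english_section_name` fields. A plausible reading, stated here as an assumption rather than documented behaviour, is that those two fields in the non-English configs point back to the `url` and `section_name` of the matching English section. Under that assumption, the sketch below pairs a non-English document with its English summary. The helpers `index_english` and `crosslingual_pairs` and all record values are hypothetical.

```python
from typing import Dict, Iterable, Iterator, Tuple

Key = Tuple[str, str]


def index_english(records: Iterable[Dict]) -> Dict[Key, str]:
    """Map (url, section_name) -> summary for english-config-shaped records."""
    index: Dict[Key, str] = {}
    for record in records:
        article = record["article"]
        for name, summary in zip(article["section_name"], article["summary"]):
            index[(record["url"], name)] = summary
    return index


def crosslingual_pairs(
    records: Iterable[Dict], english_index: Dict[Key, str]
) -> Iterator[Tuple[str, str]]:
    """Yield (document, english_summary) where an English counterpart exists."""
    for record in records:
        article = record["article"]
        for document, en_url, en_name in zip(
            article["document"],
            article["english_url"],
            article["english_section_name"],
        ):
            key = (en_url, en_name)
            if key in english_index:  # skip sections with no English match
                yield document, english_index[key]


# Made-up toy records, not taken from the dataset.
english_records = [{
    "url": "https://example.org/en/tea",
    "article": {
        "section_name": ["Boiling"],
        "document": ["Heat the water until it boils."],
        "summary": ["Boil water."],
    },
}]
french_records = [{
    "url": "https://example.org/fr/the",
    "article": {
        "section_name": ["Ébullition", "Infusion"],
        "document": ["Chauffez l'eau.", "Laissez infuser."],
        "summary": ["Faire bouillir.", "Infuser."],
        "english_url": ["https://example.org/en/tea"] * 2,
        "english_section_name": ["Boiling", "Steeping"],
    },
}]

aligned = list(crosslingual_pairs(french_records, index_english(english_records)))
print(aligned)
```

Sections without an English counterpart in the index (here, "Steeping") are dropped rather than paired with a placeholder.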

french

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:wiki_lingua/french')
  • Description:
WikiLingua is a large-scale multilingual dataset for the evaluation of
crosslingual abstractive summarization systems. The dataset includes ~770k
article and summary pairs in 18 languages from WikiHow. The gold-standard
article-summary alignments across languages were done by aligning the images
that are used to describe each how-to step in an article.
  • License: CC BY-NC-SA 3.0
  • Version: 1.1.1
  • Splits:
Split     Examples
'train'   21690
  • Features:
{
    "url": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "article": {
        "feature": {
            "section_name": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            },
            "document": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            },
            "summary": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            },
            "english_url": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            },
            "english_section_name": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            }
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    }
}

german

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:wiki_lingua/german')
  • Description:
WikiLingua is a large-scale multilingual dataset for the evaluation of
crosslingual abstractive summarization systems. The dataset includes ~770k
article and summary pairs in 18 languages from WikiHow. The gold-standard
article-summary alignments across languages were done by aligning the images
that are used to describe each how-to step in an article.
  • License: CC BY-NC-SA 3.0
  • Version: 1.1.1
  • Splits:
Split     Examples
'train'   20103
  • Features:
{
    "url": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "article": {
        "feature": {
            "section_name": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            },
            "document": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            },
            "summary": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            },
            "english_url": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            },
            "english_section_name": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            }
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    }
}

hindi

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:wiki_lingua/hindi')
  • Description:
WikiLingua is a large-scale multilingual dataset for the evaluation of
crosslingual abstractive summarization systems. The dataset includes ~770k
article and summary pairs in 18 languages from WikiHow. The gold-standard
article-summary alignments across languages were done by aligning the images
that are used to describe each how-to step in an article.
  • License: CC BY-NC-SA 3.0
  • Version: 1.1.1
  • Splits:
Split     Examples
'train'   3402
  • Features:
{
    "url": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "article": {
        "feature": {
            "section_name": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            },
            "document": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            },
            "summary": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            },
            "english_url": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            },
            "english_section_name": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            }
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    }
}

indonesian

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:wiki_lingua/indonesian')
  • Description:
WikiLingua is a large-scale multilingual dataset for the evaluation of
crosslingual abstractive summarization systems. The dataset includes ~770k
article and summary pairs in 18 languages from WikiHow. The gold-standard
article-summary alignments across languages were done by aligning the images
that are used to describe each how-to step in an article.
  • License: CC BY-NC-SA 3.0
  • Version: 1.1.1
  • Splits:
Split     Examples
'train'   16308
  • Features:
{
    "url": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "article": {
        "feature": {
            "section_name": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            },
            "document": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            },
            "summary": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            },
            "english_url": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            },
            "english_section_name": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            }
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    }
}

italian

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:wiki_lingua/italian')
  • Description:
WikiLingua is a large-scale multilingual dataset for the evaluation of
crosslingual abstractive summarization systems. The dataset includes ~770k
article and summary pairs in 18 languages from WikiHow. The gold-standard
article-summary alignments across languages were done by aligning the images
that are used to describe each how-to step in an article.
  • License: CC BY-NC-SA 3.0
  • Version: 1.1.1
  • Splits:
Split     Examples
'train'   17673
  • Features:
{
    "url": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "article": {
        "feature": {
            "section_name": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            },
            "document": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            },
            "summary": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            },
            "english_url": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            },
            "english_section_name": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            }
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    }
}

japanese

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:wiki_lingua/japanese')
  • Description:
WikiLingua is a large-scale multilingual dataset for the evaluation of
crosslingual abstractive summarization systems. The dataset includes ~770k
article and summary pairs in 18 languages from WikiHow. The gold-standard
article-summary alignments across languages were done by aligning the images
that are used to describe each how-to step in an article.
  • License: CC BY-NC-SA 3.0
  • Version: 1.1.1
  • Splits:
Split     Examples
'train'   4372
  • Features:
{
    "url": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "article": {
        "feature": {
            "section_name": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            },
            "document": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            },
            "summary": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            },
            "english_url": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            },
            "english_section_name": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            }
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    }
}

korean

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:wiki_lingua/korean')
  • Description:
WikiLingua is a large-scale multilingual dataset for the evaluation of
crosslingual abstractive summarization systems. The dataset includes ~770k
article and summary pairs in 18 languages from WikiHow. The gold-standard
article-summary alignments across languages were done by aligning the images
that are used to describe each how-to step in an article.
  • License: CC BY-NC-SA 3.0
  • Version: 1.1.1
  • Splits:
Split     Examples
'train'   4111
  • Features:
{
    "url": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "article": {
        "feature": {
            "section_name": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            },
            "document": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            },
            "summary": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            },
            "english_url": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            },
            "english_section_name": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            }
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    }
}

portuguese

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:wiki_lingua/portuguese')
  • Description:
WikiLingua is a large-scale multilingual dataset for the evaluation of
crosslingual abstractive summarization systems. The dataset includes ~770k
article and summary pairs in 18 languages from WikiHow. The gold-standard
article-summary alignments across languages were done by aligning the images
that are used to describe each how-to step in an article.
  • License: CC BY-NC-SA 3.0
  • Version: 1.1.1
  • Splits:
Split     Examples
'train'   28143
  • Features:
{
    "url": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "article": {
        "feature": {
            "section_name": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            },
            "document": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            },
            "summary": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            },
            "english_url": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            },
            "english_section_name": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            }
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    }
}

russian

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:wiki_lingua/russian')
  • Description:
WikiLingua is a large-scale multilingual dataset for the evaluation of
crosslingual abstractive summarization systems. The dataset includes ~770k
article and summary pairs in 18 languages from WikiHow. The gold-standard
article-summary alignments across languages were done by aligning the images
that are used to describe each how-to step in an article.
  • License: CC BY-NC-SA 3.0
  • Version: 1.1.1
  • Splits:
Split     Examples
'train'   18143
  • Features:
{
    "url": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "article": {
        "feature": {
            "section_name": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            },
            "document": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            },
            "summary": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            },
            "english_url": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            },
            "english_section_name": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            }
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    }
}

spanish

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:wiki_lingua/spanish')
  • Description:
WikiLingua is a large-scale multilingual dataset for the evaluation of
crosslingual abstractive summarization systems. The dataset includes ~770k
article and summary pairs in 18 languages from WikiHow. The gold-standard
article-summary alignments across languages were done by aligning the images
that are used to describe each how-to step in an article.
  • License: CC BY-NC-SA 3.0
  • Version: 1.1.1
  • Splits:
Split     Examples
'train'   38795
  • Features:
{
    "url": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "article": {
        "feature": {
            "section_name": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            },
            "document": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            },
            "summary": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            },
            "english_url": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            },
            "english_section_name": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            }
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    }
}

thai

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:wiki_lingua/thai')
  • Description:
WikiLingua is a large-scale multilingual dataset for the evaluation of
crosslingual abstractive summarization systems. The dataset includes ~770k
article and summary pairs in 18 languages from WikiHow. The gold-standard
article-summary alignments across languages were done by aligning the images
that are used to describe each how-to step in an article.
  • License: CC BY-NC-SA 3.0
  • Version: 1.1.1
  • Splits:
Split     Examples
'train'   5093
  • Features:
{
    "url": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "article": {
        "feature": {
            "section_name": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            },
            "document": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            },
            "summary": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            },
            "english_url": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            },
            "english_section_name": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            }
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    }
}

turkish

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:wiki_lingua/turkish')
  • Description:
WikiLingua is a large-scale multilingual dataset for the evaluation of
crosslingual abstractive summarization systems. The dataset includes ~770k
article and summary pairs in 18 languages from WikiHow. The gold-standard
article-summary alignments across languages were done by aligning the images
that are used to describe each how-to step in an article.
  • License: CC BY-NC-SA 3.0
  • Version: 1.1.1
  • Splits:
Split     Examples
'train'   1512
  • Features:
{
    "url": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "article": {
        "feature": {
            "section_name": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            },
            "document": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            },
            "summary": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            },
            "english_url": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            },
            "english_section_name": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            }
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    }
}

vietnamese

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:wiki_lingua/vietnamese')
  • Description:
WikiLingua is a large-scale multilingual dataset for the evaluation of
crosslingual abstractive summarization systems. The dataset includes ~770k
article and summary pairs in 18 languages from WikiHow. The gold-standard
article-summary alignments across languages were done by aligning the images
that are used to describe each how-to step in an article.
  • License: CC BY-NC-SA 3.0
  • Version: 1.1.1
  • Splits:
Split     Examples
'train'   6616
  • Features:
{
    "url": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "article": {
        "feature": {
            "section_name": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            },
            "document": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            },
            "summary": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            },
            "english_url": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            },
            "english_section_name": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            }
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    }
}
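
The per-config 'train' counts listed on this page can be gathered in one place to compare languages and to build the load strings programmatically. The sketch below copies the counts from the split tables above; the `tfds_name` helper and the variable names are hypothetical conveniences, not part of TFDS. Note that the counts are examples (one per `url`), each holding a sequence of sections, which may be why their sum is well below the ~770k article and summary pairs quoted in the description.

```python
# 'train' example counts per config, copied from the split tables on this page.
TRAIN_EXAMPLES = {
    "arabic": 9995, "chinese": 6541, "czech": 2520, "dutch": 10862,
    "english": 57945, "french": 21690, "german": 20103, "hindi": 3402,
    "indonesian": 16308, "italian": 17673, "japanese": 4372, "korean": 4111,
    "portuguese": 28143, "russian": 18143, "spanish": 38795, "thai": 5093,
    "turkish": 1512, "vietnamese": 6616,
}


def tfds_name(config: str) -> str:
    """Build the dataset string used in the tfds.load commands above."""
    if config not in TRAIN_EXAMPLES:
        raise ValueError(f"unknown wiki_lingua config: {config!r}")
    return f"huggingface:wiki_lingua/{config}"


total = sum(TRAIN_EXAMPLES.values())
largest = max(TRAIN_EXAMPLES, key=TRAIN_EXAMPLES.get)
smallest = min(TRAIN_EXAMPLES, key=TRAIN_EXAMPLES.get)

print(f"{len(TRAIN_EXAMPLES)} configs, {total} train examples in total")
print(f"largest: {largest}, smallest: {smallest}")

# Share of all examples per config, largest first.
for config, count in sorted(TRAIN_EXAMPLES.items(), key=lambda kv: -kv[1]):
    print(f"{tfds_name(config):40s} {count:6d}  {count / total:6.1%}")
```

Validating the config name before building the string gives an early, readable error instead of a failed dataset lookup.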