
flores

Description:

Evaluation datasets for low-resource machine translation: Nepali-English and Sinhala-English.

Additional Documentation: Explore on Papers With Code
Homepage: https://github.com/facebookresearch/flores/
Source code: tfds.translate.Flores
Versions:
- 1.2.0 (default): No release notes.
Download size: 1.47 MiB
Auto-cached (documentation): Yes
Figure (tfds.show_examples): Not supported.
Citation:

@misc{guzmn2019new,
    title={Two New Evaluation Datasets for Low-Resource Machine Translation: Nepali-English and Sinhala-English},
    author={Francisco Guzman and Peng-Jen Chen and Myle Ott and Juan Pino and Guillaume Lample and Philipp Koehn and Vishrav Chaudhary and Marc'Aurelio Ranzato},
    year={2019},
    eprint={1902.01382},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
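
A minimal sketch of loading this dataset by config name. The helper names `flores_name` and `load_flores` are hypothetical and not part of TFDS; the only TFDS call used is `tfds.load`, and the config names are the two listed on this page.

```python
# Config names as listed on this page, keyed by source-language code.
FLORES_CONFIGS = {"ne": "flores/neen", "si": "flores/sien"}


def flores_name(source_lang: str = "ne") -> str:
    """Return the TFDS dataset name for `source_lang` -> English (hypothetical helper)."""
    if source_lang not in FLORES_CONFIGS:
        raise ValueError(
            f"Unknown source language {source_lang!r}; "
            f"expected one of {sorted(FLORES_CONFIGS)}"
        )
    return FLORES_CONFIGS[source_lang]


def load_flores(source_lang: str = "ne", split: str = "test"):
    """Load one split as a tf.data.Dataset of feature dicts (hypothetical helper)."""
    # Imported lazily so that flores_name() can be used without TensorFlow installed.
    import tensorflow_datasets as tfds

    return tfds.load(flores_name(source_lang), split=split)
```

Calling `load_flores("si", "validation")` would request the `'validation'` split of `flores/sien`; the first call downloads and prepares the data.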

flores/neen (default config)

Config description: Translation dataset from ne to en.
Dataset size: 1.89 MiB
Splits:

Split	Examples
`'test'`	2,835
`'validation'`	2,559

Feature structure:

Translation({
    'en': Text(shape=(), dtype=string),
    'ne': Text(shape=(), dtype=string),
})

Feature documentation:

Feature	Class	Shape	Dtype	Description
	Translation
en	Text		string
ne	Text		string

Supervised keys (See as_supervised doc): ('ne', 'en')
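
The supervised keys give the (input, target) order used when loading with `as_supervised=True`. The sketch below illustrates that mapping on a plain feature dict; `to_supervised` and `load_neen_pairs` are hypothetical names introduced for illustration, not TFDS functions.

```python
SUPERVISED_KEYS = ("ne", "en")  # (input, target) for flores/neen


def to_supervised(example: dict, keys=SUPERVISED_KEYS) -> tuple:
    """Pick the (input, target) pair out of a feature dict (hypothetical helper)."""
    source_key, target_key = keys
    return example[source_key], example[target_key]


def load_neen_pairs(split: str = "validation"):
    """Load flores/neen as (ne, en) tuples instead of feature dicts."""
    # Imported lazily so that to_supervised() works without TensorFlow installed.
    import tensorflow_datasets as tfds

    # With as_supervised=True, TFDS yields tuples ordered by the supervised keys.
    return tfds.load("flores/neen", split=split, as_supervised=True)
```

Without `as_supervised=True`, each example is a dict keyed by `'en'` and `'ne'`, and `to_supervised` shows how such a dict corresponds to the tuple form.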

flores/sien

Config description: Translation dataset from si to en.
Dataset size: 2.05 MiB
Splits:

Split	Examples
`'test'`	2,766
`'validation'`	2,898
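
As a sanity check, the split sizes documented on this page can be compared against what the installed TFDS reports. This is a sketch only: `DOCUMENTED_SPLITS`, `split_mismatches`, and `builder_split_sizes` are hypothetical names, and it assumes the builder's `info.splits` metadata is available locally or can be fetched.

```python
# Example counts copied from the split tables on this page.
DOCUMENTED_SPLITS = {
    "flores/neen": {"test": 2835, "validation": 2559},
    "flores/sien": {"test": 2766, "validation": 2898},
}


def split_mismatches(config: str, actual: dict) -> dict:
    """Return {split: (documented, actual)} for every count that differs."""
    expected = DOCUMENTED_SPLITS[config]
    return {
        name: (count, actual.get(name))
        for name, count in expected.items()
        if actual.get(name) != count
    }


def builder_split_sizes(config: str) -> dict:
    """Read {split: num_examples} from the dataset metadata."""
    # Imported lazily so that split_mismatches() works without TensorFlow installed.
    import tensorflow_datasets as tfds

    info = tfds.builder(config).info
    return {name: split.num_examples for name, split in info.splits.items()}
```

An empty result from `split_mismatches("flores/sien", builder_split_sizes("flores/sien"))` would indicate that the local metadata agrees with the table above.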

Feature structure:

Translation({
    'en': Text(shape=(), dtype=string),
    'si': Text(shape=(), dtype=string),
})

Feature documentation:

Feature	Class	Shape	Dtype	Description
	Translation
en	Text		string
si	Text		string

Supervised keys (See as_supervised doc): ('si', 'en')
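
Since the splits are evaluation sets, one common need is two aligned lists: source sentences and English references. The sketch below assumes string features arrive from `tfds.as_numpy` as UTF-8 encoded bytes; `to_parallel_lists` and `load_sien_parallel` are hypothetical helper names.

```python
def to_parallel_lists(pairs):
    """Turn an iterable of (source_bytes, target_bytes) into two aligned lists of str."""
    sources, references = [], []
    for source, target in pairs:
        # Assumption: both fields are UTF-8 encoded bytes.
        sources.append(source.decode("utf-8"))
        references.append(target.decode("utf-8"))
    return sources, references


def load_sien_parallel(split: str = "test"):
    """Load flores/sien and return (si_sentences, en_references)."""
    # Imported lazily so that to_parallel_lists() works without TensorFlow installed.
    import tensorflow_datasets as tfds

    # as_supervised=True yields (si, en) tuples, following the supervised keys.
    ds = tfds.load("flores/sien", split=split, as_supervised=True)
    return to_parallel_lists(tfds.as_numpy(ds))
```

The two returned lists have equal length and share indices, so item `i` of the first list is the source for reference `i` of the second.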


Last updated 2022-12-06 UTC.