ted_hrlr_translate

  • Description:

Datasets derived from TED talk transcripts, intended for comparing similar language pairs in which one language is high-resource and the other is low-resource.

  • Features (shown for the default az_to_en config):

Translation({
    'az': Text(shape=(), dtype=tf.string),
    'en': Text(shape=(), dtype=tf.string),
})

  • Citation:

@inproceedings{Ye2018WordEmbeddings,
  author    = {Qi, Ye and Sachan, Devendra and Felix, Matthieu and Padmanabhan, Sarguna and Neubig, Graham},
  title     = {When and Why Are Pre-Trained Word Embeddings Useful for Neural Machine Translation?},
  booktitle = {HLT-NAACL},
  year      = {2018},
}

ted_hrlr_translate/az_to_en (default config)

  • Config description: Translation dataset from az to en in plain text.
  • Splits:
Split Examples
'test' 903
'train' 5,946
'validation' 671
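
A config such as the default one can typically be loaded by passing the name in "dataset/config" form to tfds.load. The following is a minimal sketch rather than an official recipe: the helper names (CONFIGS, dataset_name, load_split, preview) are hypothetical, the tensorflow_datasets package is assumed to be installed, and the 'az' and 'en' keys are taken from the feature listing for the default config.

```python
# Config names as listed on this page.
CONFIGS = (
    "az_to_en", "aztr_to_en", "be_to_en", "beru_to_en",
    "es_to_pt", "fr_to_pt", "gl_to_en", "glpt_to_en",
    "he_to_pt", "it_to_pt", "pt_to_en", "ru_to_en",
    "ru_to_pt", "tr_to_en",
)


def dataset_name(config: str) -> str:
    """Build the full 'dataset/config' name expected by tfds.load."""
    if config not in CONFIGS:
        raise ValueError(f"unknown config: {config!r}")
    return f"ted_hrlr_translate/{config}"


def load_split(config: str, split: str = "train"):
    """Load one split as a tf.data.Dataset of feature dictionaries."""
    # Imported lazily so the helpers above work without TensorFlow installed.
    import tensorflow_datasets as tfds

    return tfds.load(dataset_name(config), split=split)


def preview(config: str = "az_to_en", source: str = "az", target: str = "en", n: int = 1):
    """Print the first n sentence pairs of the validation split."""
    for example in load_split(config, "validation").take(n):
        print(example[source].numpy().decode("utf-8"))
        print(example[target].numpy().decode("utf-8"))
```

Calling preview() downloads and prepares the data on first use. For other configs, the source and target keys should be checked against the loaded dataset's element_spec instead of being assumed.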

ted_hrlr_translate/aztr_to_en

  • Config description: Translation dataset from az_tr to en in plain text.
  • Splits:
Split Examples
'test' 903
'train' 188,396
'validation' 671

ted_hrlr_translate/be_to_en

  • Config description: Translation dataset from be to en in plain text.
  • Splits:
Split Examples
'test' 664
'train' 4,509
'validation' 248

ted_hrlr_translate/beru_to_en

  • Config description: Translation dataset from be_ru to en in plain text.
  • Splits:
Split Examples
'test' 664
'train' 212,614
'validation' 248

ted_hrlr_translate/es_to_pt

  • Config description: Translation dataset from es to pt in plain text.
  • Splits:
Split Examples
'test' 1,763
'train' 44,938
'validation' 1,016

ted_hrlr_translate/fr_to_pt

  • Config description: Translation dataset from fr to pt in plain text.
  • Splits:
Split Examples
'test' 1,494
'train' 43,873
'validation' 1,131

ted_hrlr_translate/gl_to_en

  • Config description: Translation dataset from gl to en in plain text.
  • Splits:
Split Examples
'test' 1,007
'train' 10,017
'validation' 682

ted_hrlr_translate/glpt_to_en

  • Config description: Translation dataset from gl_pt to en in plain text.
  • Splits:
Split Examples
'test' 1,007
'train' 61,802
'validation' 682

ted_hrlr_translate/he_to_pt

  • Config description: Translation dataset from he to pt in plain text.
  • Splits:
Split Examples
'test' 1,623
'train' 48,511
'validation' 1,145

ted_hrlr_translate/it_to_pt

  • Config description: Translation dataset from it to pt in plain text.
  • Splits:
Split Examples
'test' 1,669
'train' 46,259
'validation' 1,162

ted_hrlr_translate/pt_to_en

  • Config description: Translation dataset from pt to en in plain text.
  • Splits:
Split Examples
'test' 1,803
'train' 51,785
'validation' 1,193

ted_hrlr_translate/ru_to_en

  • Config description: Translation dataset from ru to en in plain text.
  • Splits:
Split Examples
'test' 5,476
'train' 208,106
'validation' 4,805

ted_hrlr_translate/ru_to_pt

  • Config description: Translation dataset from ru to pt in plain text.
  • Splits:
Split Examples
'test' 1,588
'train' 47,278
'validation' 1,184

ted_hrlr_translate/tr_to_en

  • Config description: Translation dataset from tr to en in plain text.
  • Splits:
Split Examples
'test' 5,029
'train' 182,450
'validation' 4,045
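
The split counts above suggest, although the page does not state it, that each combined config (aztr_to_en, beru_to_en, glpt_to_en) trains on roughly the low-resource data plus the data of the related high-resource language, while keeping the low-resource test and validation splits. The arithmetic can be checked with a short sketch. The numbers are copied from the tables above; the names TRAIN_SIZES, TRIPLES, and size_gap are hypothetical.

```python
# Training-set sizes copied from the split tables on this page.
TRAIN_SIZES = {
    "az_to_en": 5_946, "tr_to_en": 182_450, "aztr_to_en": 188_396,
    "be_to_en": 4_509, "ru_to_en": 208_106, "beru_to_en": 212_614,
    "gl_to_en": 10_017, "pt_to_en": 51_785, "glpt_to_en": 61_802,
}

# (low-resource config, related high-resource config, combined config)
TRIPLES = (
    ("az_to_en", "tr_to_en", "aztr_to_en"),
    ("be_to_en", "ru_to_en", "beru_to_en"),
    ("gl_to_en", "pt_to_en", "glpt_to_en"),
)


def size_gap(low: str, high: str, combined: str) -> int:
    """Combined training size minus the sum of the two separate sizes."""
    return TRAIN_SIZES[combined] - (TRAIN_SIZES[low] + TRAIN_SIZES[high])


for low, high, combined in TRIPLES:
    ratio = TRAIN_SIZES[combined] / TRAIN_SIZES[low]
    print(f"{combined}: gap={size_gap(low, high, combined)}, "
          f"{ratio:.1f}x the size of {low}")
```

A gap of zero means the combined count equals the sum of the two separate counts. With the figures listed here, this holds exactly for aztr_to_en and glpt_to_en, and beru_to_en differs by one example.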