para_crawl

  • Description:

Web-Scale Parallel Corpora for Official European Languages.

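  • Features (shown for the default config; the other configs pair 'en' with their own target language code):

Translation({
    'bg': Text(shape=(), dtype=tf.string),
    'en': Text(shape=(), dtype=tf.string),
})

  • Citation:

@misc {paracrawl,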
    title  = "ParaCrawl",
    year   = "2018",
    url    = "http://paracrawl.eu/download.html"
}
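
Each example is a dictionary holding one string per language, as the Translation feature spec indicates. The following is a minimal sketch of turning such an example into Python strings. It assumes the examples are read through tfds.as_numpy, so the values arrive as UTF-8 encoded bytes. The names decode_pair and iter_pairs are hypothetical helpers for illustration, not part of TFDS.

```python
from typing import Dict, Iterator, Tuple


def decode_pair(example: Dict[str, bytes], target_lang: str = "bg") -> Tuple[str, str]:
    """Return (english, target) as str from one example dict of UTF-8 bytes."""
    return example["en"].decode("utf-8"), example[target_lang].decode("utf-8")


def iter_pairs(
    config: str = "para_crawl/enbg_plain_text", target_lang: str = "bg"
) -> Iterator[Tuple[str, str]]:
    """Yield decoded sentence pairs from the 'train' split.

    The first call downloads and prepares the data, which can take a while.
    """
    import tensorflow_datasets as tfds  # imported lazily: only needed when loading

    dataset = tfds.load(config, split="train")
    for example in tfds.as_numpy(dataset):
        yield decode_pair(example, target_lang)
```

decode_pair can be tried on a hand-built dictionary without downloading anything; iter_pairs needs tensorflow_datasets installed and a config name that the installed release actually provides.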

para_crawl/enbg_plain_text (default config)

  • Config description: Translation dataset from English to bg, uses encoder plain_text.
  • Splits:
Split Examples
'train' 1,039,885
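
Every config on this page ships a single 'train' split, so a validation set has to be carved out by the user. One possible approach, sketched below, uses the TFDS split slicing syntax to hold out the last few percent of 'train'. The helper names and the 1% default are assumptions made for illustration.

```python
from typing import List


def holdout_splits(holdout_percent: int = 1) -> List[str]:
    """Build two split strings: the leading part of 'train' and the held-out tail."""
    if not 0 < holdout_percent < 100:
        raise ValueError("holdout_percent must be between 1 and 99")
    cut = 100 - holdout_percent
    return [f"train[:{cut}%]", f"train[{cut}%:]"]


def load_train_and_holdout(
    config: str = "para_crawl/enbg_plain_text", holdout_percent: int = 1
):
    """Load the two slices as separate datasets (downloads data on first use)."""
    import tensorflow_datasets as tfds  # imported lazily: only needed when loading

    train_ds, holdout_ds = tfds.load(config, split=holdout_splits(holdout_percent))
    return train_ds, holdout_ds
```

The slices are taken in stored order, so shuffle the training portion afterwards if order matters.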

para_crawl/encs_plain_text

  • Config description: Translation dataset from English to cs, uses encoder plain_text.
  • Splits:
Split Examples
'train' 2,981,949

para_crawl/enda_plain_text

  • Config description: Translation dataset from English to da, uses encoder plain_text.
  • Splits:
Split Examples
'train' 2,414,895

para_crawl/ende_plain_text

  • Config description: Translation dataset from English to de, uses encoder plain_text.
  • Splits:
Split Examples
'train' 16,264,448

para_crawl/enel_plain_text

  • Config description: Translation dataset from English to el, uses encoder plain_text.
  • Splits:
Split Examples
'train' 1,985,233

para_crawl/enes_plain_text

  • Config description: Translation dataset from English to es, uses encoder plain_text.
  • Splits:
Split Examples
'train' 21,987,267

para_crawl/enet_plain_text

  • Config description: Translation dataset from English to et, uses encoder plain_text.
  • Splits:
Split Examples
'train' 853,422

para_crawl/enfi_plain_text

  • Config description: Translation dataset from English to fi, uses encoder plain_text.
  • Splits:
Split Examples
'train' 2,156,069

para_crawl/enfr_plain_text

  • Config description: Translation dataset from English to fr, uses encoder plain_text.
  • Splits:
Split Examples
'train' 31,374,161

para_crawl/enga_plain_text

  • Config description: Translation dataset from English to ga, uses encoder plain_text.
  • Splits:
Split Examples
'train' 357,399

para_crawl/enhr_plain_text

  • Config description: Translation dataset from English to hr, uses encoder plain_text.
  • Splits:
Split Examples
'train' 1,002,053

para_crawl/enhu_plain_text

  • Config description: Translation dataset from English to hu, uses encoder plain_text.
  • Splits:
Split Examples
'train' 1,901,342

para_crawl/enit_plain_text

  • Config description: Translation dataset from English to it, uses encoder plain_text.
  • Splits:
Split Examples
'train' 12,162,239

para_crawl/enlt_plain_text

  • Config description: Translation dataset from English to lt, uses encoder plain_text.
  • Splits:
Split Examples
'train' 844,643

para_crawl/enlv_plain_text

  • Config description: Translation dataset from English to lv, uses encoder plain_text.
  • Splits:
Split Examples
'train' 553,060

para_crawl/enmt_plain_text

  • Config description: Translation dataset from English to mt, uses encoder plain_text.
  • Splits:
Split Examples
'train' 195,502

para_crawl/ennl_plain_text

  • Config description: Translation dataset from English to nl, uses encoder plain_text.
  • Splits:
Split Examples
'train' 5,659,268

para_crawl/enpl_plain_text

  • Config description: Translation dataset from English to pl, uses encoder plain_text.
  • Splits:
Split Examples
'train' 3,503,276

para_crawl/enpt_plain_text

  • Config description: Translation dataset from English to pt, uses encoder plain_text.
  • Splits:
Split Examples
'train' 8,141,940

para_crawl/enro_plain_text

  • Config description: Translation dataset from English to ro, uses encoder plain_text.
  • Splits:
Split Examples
'train' 1,952,043

para_crawl/ensk_plain_text

  • Config description: Translation dataset from English to sk, uses encoder plain_text.
  • Splits:
Split Examples
'train' 1,591,831

para_crawl/ensl_plain_text

  • Config description: Translation dataset from English to sl, uses encoder plain_text.
  • Splits:
Split Examples
'train' 660,161

para_crawl/ensv_plain_text

  • Config description: Translation dataset from English to sv, uses encoder plain_text.
  • Splits:
Split Examples
'train' 3,476,729
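
The config names above follow one pattern, para_crawl/en<target>_plain_text, and the corpus sizes differ by more than two orders of magnitude. The sketch below collects the 'train' counts listed on this page and builds config names from a target language code. TRAIN_EXAMPLES, config_name and configs_by_size are hypothetical names, and the name pattern is taken from this page only; a given TFDS release may expose different config names.

```python
from typing import Dict, List, Tuple

# 'train' example counts per target language, copied from the listing above.
TRAIN_EXAMPLES: Dict[str, int] = {
    "bg": 1_039_885, "cs": 2_981_949, "da": 2_414_895, "de": 16_264_448,
    "el": 1_985_233, "es": 21_987_267, "et": 853_422, "fi": 2_156_069,
    "fr": 31_374_161, "ga": 357_399, "hr": 1_002_053, "hu": 1_901_342,
    "it": 12_162_239, "lt": 844_643, "lv": 553_060, "mt": 195_502,
    "nl": 5_659_268, "pl": 3_503_276, "pt": 8_141_940, "ro": 1_952_043,
    "sk": 1_591_831, "sl": 660_161, "sv": 3_476_729,
}


def config_name(target_lang: str) -> str:
    """Return the config name for English paired with target_lang."""
    if target_lang not in TRAIN_EXAMPLES:
        raise ValueError(f"no config listed for target language {target_lang!r}")
    return f"para_crawl/en{target_lang}_plain_text"


def configs_by_size() -> List[Tuple[str, int]]:
    """Return (config name, train examples) pairs, largest corpus first."""
    ordered = sorted(TRAIN_EXAMPLES.items(), key=lambda item: item[1], reverse=True)
    return [(config_name(lang), count) for lang, count in ordered]


if __name__ == "__main__":
    for name, count in configs_by_size():
        print(f"{name:32s} {count:>12,d}")
```

Sorting by size makes it easy to pick a small pair such as English to Maltese for a quick experiment before committing to the larger corpora.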