wiki_dialog

  • Description:

WikiDialog is a large dataset of synthetically generated information-seeking conversations. Each conversation in the dataset contains two speakers grounded in a passage from English Wikipedia: one speaker’s utterances consist of exact sentences from the passage; the other speaker is generated by a large language model.

  • Splits:

Split          Examples
'train'        11,264,129
'validation'   113,822
  • Feature structure:
FeaturesDict({
    'author_num': Sequence(int32),
    'passage': Text(shape=(), dtype=string),
    'pid': Text(shape=(), dtype=string),
    'sentences': Sequence(Text(shape=(), dtype=string)),
    'title': Text(shape=(), dtype=string),
    'utterances': Sequence(Text(shape=(), dtype=string)),
})
  • Feature documentation:
Feature      Class              Shape     Dtype    Description
             FeaturesDict
author_num   Sequence(Tensor)   (None,)   int32
passage      Text                         string
pid          Text                         string
sentences    Sequence(Text)     (None,)   string
title        Text                         string
utterances   Sequence(Text)     (None,)   string
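
The feature structure above could be consumed roughly as follows. This is a minimal sketch, not an official usage recipe: it assumes that `author_num[i]` identifies the speaker of `utterances[i]` (the page does not spell this out), and the helper names `pair_turns` and `iter_dialogs` as well as the sample record are hypothetical. The `tensorflow_datasets` import is deferred so that the pairing helper can be tried without downloading the dataset.

```python
def pair_turns(example):
    """Pair each utterance with its speaker index.

    Assumption: author_num and utterances are parallel sequences,
    so author_num[i] is the speaker of utterances[i].
    """
    turns = []
    for speaker, utterance in zip(example["author_num"], example["utterances"]):
        # tfds.as_numpy yields Text features as bytes; decode them if needed.
        if isinstance(utterance, bytes):
            utterance = utterance.decode("utf-8")
        turns.append((int(speaker), utterance))
    return turns


def iter_dialogs(split="validation", name="wiki_dialog/OQ"):
    """Yield (speaker, utterance) lists from a split (hypothetical helper)."""
    # Imported lazily: requires tensorflow_datasets and triggers a download.
    import tensorflow_datasets as tfds

    ds = tfds.load(name, split=split)
    for example in tfds.as_numpy(ds):
        yield pair_turns(example)


if __name__ == "__main__":
    # Made-up record in the shape of the FeaturesDict, for illustration only.
    sample = {
        "author_num": [0, 1],
        "utterances": [b"What is the passage about?", "It is about a topic."],
    }
    for speaker, utterance in pair_turns(sample):
        print(speaker, utterance)
```

The 'validation' split is used as the default here only because it is much smaller than 'train'; which speaker index corresponds to the model-generated side should be checked against actual records.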
  • Citation:
@inproceedings{dai2022dialoginpainting,
  title={Dialog Inpainting: Turning Documents to Dialogs},
  author={Dai, Zhuyun and Chaganty, Arun Tejasvi and Zhao, Vincent and Amini, Aida and Green, Mike and Rashid, Qazi and Guu, Kelvin},
  booktitle={International Conference on Machine Learning (ICML)},
  year={2022},
  organization={PMLR}
}

wiki_dialog/OQ (default config)