cs_restaurants

  • Description:

Czech data-to-text dataset in the restaurant domain. Each input meaning representation contains a dialogue act type (inform, confirm, etc.), slots (food, area, etc.), and their values. The dataset originated as a translation of the English San Francisco restaurant dataset of Wen et al. (2015).
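To make "dialogue act type plus slots and values" concrete, here is a minimal illustrative sketch of such a meaning representation as a Python dataclass. The class name, the `act(slot=value,...)` rendering, and the example values are assumptions for illustration only; they are not the dataset's own format or API.

```python
from dataclasses import dataclass, field


@dataclass
class MeaningRepresentation:
    """Illustrative container: one dialogue act type plus its slot-value pairs."""

    act: str  # dialogue act type, e.g. "inform" or "confirm"
    slots: dict[str, str] = field(default_factory=dict)  # slot name -> value

    def to_string(self) -> str:
        # Render as act(slot=value,...); this notation is an assumption,
        # not the dataset's stored format.
        args = ",".join(f"{slot}={value}" for slot, value in self.slots.items())
        return f"{self.act}({args})"


# Hypothetical MR; the slot values are made up for illustration.
mr = MeaningRepresentation("inform", {"food": "Czech", "area": "centre"})
print(mr.to_string())  # inform(food=Czech,area=centre)
```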

  • Splits:
Split         Examples
'test'        842
'train'       3,569
'validation'  781
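The three splits add up to 5,192 examples, roughly a 69/16/15 train/test/validation division. A quick check using only the counts listed above (nothing is loaded from the dataset):

```python
# Split sizes copied from the table above.
splits = {"test": 842, "train": 3_569, "validation": 781}

total = sum(splits.values())
print(f"total: {total:,}")  # total: 5,192
for name, n in splits.items():
    # Left-align the name, right-align the count, show the share as a percentage.
    print(f"{name:<10} {n:>5,} {n / total:6.1%}")
```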
  • Feature structure:
FeaturesDict({
    'delex_input_text': FeaturesDict({
        'table': Sequence({
            'column_header': string,
            'content': string,
            'row_number': int16,
        }),
    }),
    'delex_target_text': string,
    'input_text': FeaturesDict({
        'table': Sequence({
            'column_header': string,
            'content': string,
            'row_number': int16,
        }),
    }),
    'target_text': string,
})
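The nested structure above corresponds to the slash-separated feature paths used in the feature documentation. The sketch below mirrors it with plain Python dicts and dtype-name strings (an assumption for illustration: these are not real `tfds.features` objects, and the `Sequence` wrapper is collapsed into an ordinary dict) and walks it to produce those paths.

```python
# Plain-dict mirror of the FeaturesDict above; leaf values are dtype names.
# The Sequence wrapper around "table" is collapsed into an ordinary dict here.
TABLE = {"column_header": "string", "content": "string", "row_number": "int16"}
FEATURES = {
    "delex_input_text": {"table": TABLE},
    "delex_target_text": "string",
    "input_text": {"table": TABLE},
    "target_text": "string",
}


def leaf_paths(spec, prefix=""):
    """Yield (path, dtype) for every leaf, joining nested keys with '/'."""
    for key, value in spec.items():
        path = f"{prefix}/{key}" if prefix else key
        if isinstance(value, dict):
            yield from leaf_paths(value, path)
        else:
            yield path, value


for path, dtype in leaf_paths(FEATURES):
    print(f"{path:<38} {dtype}")
```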
  • Feature documentation:
Feature                               Class         Shape  Dtype   Description
                                      FeaturesDict
delex_input_text                      FeaturesDict
delex_input_text/table                Sequence
delex_input_text/table/column_header  Tensor               string
delex_input_text/table/content        Tensor               string
delex_input_text/table/row_number     Tensor               int16
delex_target_text                     Tensor               string
input_text                            FeaturesDict
input_text/table                      Sequence
input_text/table/column_header        Tensor               string
input_text/table/content              Tensor               string
input_text/table/row_number           Tensor               int16
target_text                           Tensor               string
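In TFDS, a `Sequence` wrapping a dict is returned as a dict of parallel sequences, so each `table` arrives as separate `column_header`, `content`, and `row_number` arrays. Below is a minimal sketch of zipping them back into rows and linearizing them into a single input string, for example for a sequence-to-sequence model. The function names, the `header: content` format, and the example values are assumptions for illustration, not the dataset's own preprocessing. Strings obtained through NumPy conversion typically arrive as UTF-8 `bytes`, so they are decoded first.

```python
def _text(value):
    """Decode UTF-8 bytes (as NumPy-converted TFDS strings usually are); pass str through."""
    return value.decode("utf-8") if isinstance(value, bytes) else value


def table_rows(table):
    """Zip the parallel lists into (row_number, column_header, content) tuples."""
    rows = zip(table["row_number"], table["column_header"], table["content"])
    # Sort by row_number only; sorted() is stable, so original order breaks ties.
    return sorted(rows, key=lambda row: row[0])


def linearize(table, sep=" | "):
    """Render a table as 'header: content' pairs (an illustrative format choice)."""
    return sep.join(
        f"{_text(header)}: {_text(content)}" for _, header, content in table_rows(table)
    )


# Hypothetical table in the dict-of-lists form; values are made up for illustration.
example_table = {
    "column_header": ["food", "area"],
    "content": ["Czech", "centre"],
    "row_number": [1, 1],
}
print(linearize(example_table))  # food: Czech | area: centre
```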
  • Citation:
@inproceedings{dusek_neural_2019,
        author = {Dušek, Ondřej and Jurčíček, Filip},
        title = {Neural {Generation} for {Czech}: {Data} and {Baselines}},
        shorttitle = {Neural {Generation} for {Czech}},
        url = {https://www.aclweb.org/anthology/W19-8670/},
        urldate = {2019-10-18},
        booktitle = {Proceedings of the 12th {International} {Conference} on {Natural} {Language} {Generation} ({INLG} 2019)},
        month = oct,
        address = {Tokyo, Japan},
        year = {2019},
        pages = {563--574},
        abstract = {We present the first dataset targeted at end-to-end NLG in Czech in the restaurant domain, along with several strong baseline models using the sequence-to-sequence approach. While non-English NLG is under-explored in general, Czech, as a morphologically rich language, makes the task even harder: Since Czech requires inflecting named entities, delexicalization or copy mechanisms do not work out-of-the-box and lexicalizing the generated outputs is non-trivial. In our experiments, we present two different approaches to this problem: (1) using a neural language model to select the correct inflected form while lexicalizing, (2) a two-step generation setup: our sequence-to-sequence model generates an interleaved sequence of lemmas and morphological tags, which are then inflected by a morphological generator.},
}