glue

Description:

GLUE, the General Language Understanding Evaluation benchmark (https://gluebenchmark.com/), is a collection of resources for training, evaluating, and analyzing natural language understanding systems.

Additional documentation: Explore on Papers With Code
Source code: tfds.text.Glue
Versions:
- 1.0.0: New split API (https://tensorflow.org/datasets/splits)
- 1.0.1: Update dead URL links.
- 2.0.0 (default): Update data source for glue/qqp.
Auto-cached (documentation): Yes
Supervised keys (see as_supervised doc): None
Figure (tfds.show_examples): Not supported.
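
As a minimal sketch of how one of the configs on this page might be loaded, the helper below builds the `glue/<config>` dataset name and passes it to `tfds.load`. The helper names `glue_dataset_name` and `load_glue` are hypothetical and exist only for illustration; the config list is copied from the section headings of this page. The `tensorflow_datasets` import is deferred into the function so that the name validation can be used without the library installed.

```python
# Config names as they appear in the section headings of this page.
GLUE_CONFIGS = (
    "cola", "sst2", "mrpc", "qqp", "stsb", "mnli",
    "mnli_mismatched", "mnli_matched", "qnli", "rte", "wnli", "ax",
)


def glue_dataset_name(config: str) -> str:
    """Return the TFDS dataset name 'glue/<config>' after validating it."""
    if config not in GLUE_CONFIGS:
        raise ValueError(f"unknown GLUE config: {config!r}")
    return f"glue/{config}"


def load_glue(config: str, split: str = "train"):
    """Load one split of a GLUE config (hypothetical helper).

    Returns a (tf.data.Dataset, DatasetInfo) tuple. The first call
    downloads and prepares the data, so it needs network access.
    """
    import tensorflow_datasets as tfds  # deferred: heavy optional dependency

    return tfds.load(glue_dataset_name(config), split=split, with_info=True)
```

Calling `load_glue("cola", "validation")` would return the dataset and its `DatasetInfo`; each element is a dict with the keys shown in the feature structure of the chosen config.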

glue/cola (default config)

Config description: The Corpus of Linguistic Acceptability consists of English acceptability judgments drawn from books and journal articles on linguistic theory. Each example is a sequence of words annotated with whether it is a grammatical English sentence.
Homepage: https://nyu-mll.github.io/CoLA/
Download size: 368.14 KiB
Dataset size: 965.49 KiB
Splits:

Split	Examples
`'test'`	1,063
`'train'`	8,551
`'validation'`	1,043

Feature structure:

FeaturesDict({
    'idx': int32,
    'label': ClassLabel(shape=(), dtype=int64, num_classes=2),
    'sentence': Text(shape=(), dtype=string),
})

Feature documentation:

Feature	Class	Dtype
	FeaturesDict
idx	Tensor	int32
label	ClassLabel	int64
sentence	Text	string

Examples (tfds.as_dataframe):

Citation:

@article{warstadt2018neural,
  title={Neural Network Acceptability Judgments},
  author={Warstadt, Alex and Singh, Amanpreet and Bowman, Samuel R},
  journal={arXiv preprint arXiv:1805.12471},
  year={2018}
}
@inproceedings{wang2019glue,
  title={ {GLUE}: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding},
  author={Wang, Alex and Singh, Amanpreet and Michael, Julian and Hill, Felix and Levy, Omer and Bowman, Samuel R.},
  note={In the Proceedings of ICLR.},
  year={2019}
}

Note that each GLUE dataset has its own citation. Please see the source to see
the correct citation for each contained dataset.

glue/sst2

Config description: The Stanford Sentiment Treebank consists of sentences from movie reviews and human annotations of their sentiment. The task is to predict the sentiment of a given sentence. We use the two-way (positive/negative) class split and use only sentence-level labels.
Homepage: https://nlp.stanford.edu/sentiment/index.html
Download size: 7.09 MiB
Dataset size: 7.22 MiB
Splits:

Split	Examples
`'test'`	1,821
`'train'`	67,349
`'validation'`	872

Feature structure:

FeaturesDict({
    'idx': int32,
    'label': ClassLabel(shape=(), dtype=int64, num_classes=2),
    'sentence': Text(shape=(), dtype=string),
})

Feature documentation:

Feature	Class	Dtype
	FeaturesDict
idx	Tensor	int32
label	ClassLabel	int64
sentence	Text	string

Examples (tfds.as_dataframe):

Citation:

@inproceedings{socher2013recursive,
  title={Recursive deep models for semantic compositionality over a sentiment treebank},
  author={Socher, Richard and Perelygin, Alex and Wu, Jean and Chuang, Jason and Manning, Christopher D and Ng, Andrew and Potts, Christopher},
  booktitle={Proceedings of the 2013 conference on empirical methods in natural language processing},
  pages={1631--1642},
  year={2013}
}
@inproceedings{wang2019glue,
  title={ {GLUE}: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding},
  author={Wang, Alex and Singh, Amanpreet and Michael, Julian and Hill, Felix and Levy, Omer and Bowman, Samuel R.},
  note={In the Proceedings of ICLR.},
  year={2019}
}

Note that each GLUE dataset has its own citation. Please see the source to see
the correct citation for each contained dataset.

glue/mrpc

Config description: The Microsoft Research Paraphrase Corpus (Dolan & Brockett, 2005) is a corpus of sentence pairs automatically extracted from online news sources, with human annotations for whether the sentences in the pair are semantically equivalent.
Homepage: https://www.microsoft.com/en-us/download/details.aspx?id=52398
Download size: 1.43 MiB
Dataset size: 1.74 MiB
Splits:

Split	Examples
`'test'`	1,725
`'train'`	3,668
`'validation'`	408

Feature structure:

FeaturesDict({
    'idx': int32,
    'label': ClassLabel(shape=(), dtype=int64, num_classes=2),
    'sentence1': Text(shape=(), dtype=string),
    'sentence2': Text(shape=(), dtype=string),
})

Feature documentation:

Feature	Class	Dtype
	FeaturesDict
idx	Tensor	int32
label	ClassLabel	int64
sentence1	Text	string
sentence2	Text	string

Examples (tfds.as_dataframe):

Citation:

@inproceedings{dolan2005automatically,
  title={Automatically constructing a corpus of sentential paraphrases},
  author={Dolan, William B and Brockett, Chris},
  booktitle={Proceedings of the Third International Workshop on Paraphrasing (IWP2005)},
  year={2005}
}
@inproceedings{wang2019glue,
  title={ {GLUE}: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding},
  author={Wang, Alex and Singh, Amanpreet and Michael, Julian and Hill, Felix and Levy, Omer and Bowman, Samuel R.},
  note={In the Proceedings of ICLR.},
  year={2019}
}

Note that each GLUE dataset has its own citation. Please see the source to see
the correct citation for each contained dataset.

glue/qqp

Config description: The Quora Question Pairs dataset is a collection of question pairs from the community question-answering website Quora. The task is to determine whether a pair of questions are semantically equivalent.
Homepage: https://data.quora.com/First-Quora-Dataset-Release-Question-Pairs
Download size: 39.76 MiB
Dataset size: 150.37 MiB
Splits:

Split	Examples
`'test'`	390,965
`'train'`	363,846
`'validation'`	40,430

Feature structure:

FeaturesDict({
    'idx': int32,
    'label': ClassLabel(shape=(), dtype=int64, num_classes=2),
    'question1': Text(shape=(), dtype=string),
    'question2': Text(shape=(), dtype=string),
})

Feature documentation:

Feature	Class	Dtype
	FeaturesDict
idx	Tensor	int32
label	ClassLabel	int64
question1	Text	string
question2	Text	string

Examples (tfds.as_dataframe):

Citation:

@online{WinNT,
  author = {Iyer, Shankar and Dandekar, Nikhil and Csernai, Kornel},
  title = {First Quora Dataset Release: Question Pairs},
  year = 2017,
  url = {https://data.quora.com/First-Quora-Dataset-Release-Question-Pairs},
  urldate = {2019-04-03}
}
@inproceedings{wang2019glue,
  title={ {GLUE}: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding},
  author={Wang, Alex and Singh, Amanpreet and Michael, Julian and Hill, Felix and Levy, Omer and Bowman, Samuel R.},
  note={In the Proceedings of ICLR.},
  year={2019}
}

Note that each GLUE dataset has its own citation. Please see the source to see
the correct citation for each contained dataset.

glue/stsb

Config description: The Semantic Textual Similarity Benchmark (Cer et al., 2017) is a collection of sentence pairs drawn from news headlines, video and image captions, and natural language inference data. Each pair is human-annotated with a similarity score from 0 to 5.
Homepage: http://ixa2.si.ehu.es/stswiki/index.php/STSbenchmark
Download size: 784.05 KiB
Dataset size: 1.58 MiB
Splits:

Split	Examples
`'test'`	1,379
`'train'`	5,749
`'validation'`	1,500

Feature structure:

FeaturesDict({
    'idx': int32,
    'label': float32,
    'sentence1': Text(shape=(), dtype=string),
    'sentence2': Text(shape=(), dtype=string),
})

Feature documentation:

Feature	Class	Dtype
	FeaturesDict
idx	Tensor	int32
label	Tensor	float32
sentence1	Text	string
sentence2	Text	string

Examples (tfds.as_dataframe):

Citation:

@article{cer2017semeval,
  title={Semeval-2017 task 1: Semantic textual similarity-multilingual and cross-lingual focused evaluation},
  author={Cer, Daniel and Diab, Mona and Agirre, Eneko and Lopez-Gazpio, Inigo and Specia, Lucia},
  journal={arXiv preprint arXiv:1708.00055},
  year={2017}
}
@inproceedings{wang2019glue,
  title={ {GLUE}: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding},
  author={Wang, Alex and Singh, Amanpreet and Michael, Julian and Hill, Felix and Levy, Omer and Bowman, Samuel R.},
  note={In the Proceedings of ICLR.},
  year={2019}
}

Note that each GLUE dataset has its own citation. Please see the source to see
the correct citation for each contained dataset.

glue/mnli

Config description: The Multi-Genre Natural Language Inference Corpus is a crowdsourced collection of sentence pairs with textual entailment annotations. Given a premise sentence and a hypothesis sentence, the task is to predict whether the premise entails the hypothesis (entailment), contradicts the hypothesis (contradiction), or neither (neutral). The premise sentences are gathered from ten different sources, including transcribed speech, fiction, and government reports. We use the standard test set, for which we obtained private labels from the authors, and evaluate on both the matched (in-domain) and mismatched (cross-domain) sections. We also use and recommend the SNLI corpus as 550k examples of auxiliary training data.
Homepage: http://www.nyu.edu/projects/bowman/multinli/
Download size: 298.29 MiB
Dataset size: 100.56 MiB
Splits:

Split	Examples
`'test_matched'`	9,796
`'test_mismatched'`	9,847
`'train'`	392,702
`'validation_matched'`	9,815
`'validation_mismatched'`	9,832

Feature structure:

FeaturesDict({
    'hypothesis': Text(shape=(), dtype=string),
    'idx': int32,
    'label': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'premise': Text(shape=(), dtype=string),
})

Feature documentation:

Feature	Class	Dtype
	FeaturesDict
hypothesis	Text	string
idx	Tensor	int32
label	ClassLabel	int64
premise	Text	string

Examples (tfds.as_dataframe):

Citation:

@InProceedings{N18-1101,
  author = "Williams, Adina
            and Nangia, Nikita
            and Bowman, Samuel",
  title = "A Broad-Coverage Challenge Corpus for
           Sentence Understanding through Inference",
  booktitle = "Proceedings of the 2018 Conference of
               the North American Chapter of the
               Association for Computational Linguistics:
               Human Language Technologies, Volume 1 (Long
               Papers)",
  year = "2018",
  publisher = "Association for Computational Linguistics",
  pages = "1112--1122",
  location = "New Orleans, Louisiana",
  url = "http://aclweb.org/anthology/N18-1101"
}
@article{bowman2015large,
  title={A large annotated corpus for learning natural language inference},
  author={Bowman, Samuel R and Angeli, Gabor and Potts, Christopher and Manning, Christopher D},
  journal={arXiv preprint arXiv:1508.05326},
  year={2015}
}
@inproceedings{wang2019glue,
  title={ {GLUE}: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding},
  author={Wang, Alex and Singh, Amanpreet and Michael, Julian and Hill, Felix and Levy, Omer and Bowman, Samuel R.},
  note={In the Proceedings of ICLR.},
  year={2019}
}

Note that each GLUE dataset has its own citation. Please see the source to see
the correct citation for each contained dataset.

glue/mnli_mismatched

Config description: The mismatched validation and test splits from MNLI. See the "mnli" BuilderConfig for additional information.
Homepage: http://www.nyu.edu/projects/bowman/multinli/
Download size: 298.29 MiB
Dataset size: 4.79 MiB
Splits:

Split	Examples
`'test'`	9,847
`'validation'`	9,832

Feature structure:

FeaturesDict({
    'hypothesis': Text(shape=(), dtype=string),
    'idx': int32,
    'label': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'premise': Text(shape=(), dtype=string),
})

Feature documentation:

Feature	Class	Dtype
	FeaturesDict
hypothesis	Text	string
idx	Tensor	int32
label	ClassLabel	int64
premise	Text	string

Examples (tfds.as_dataframe):

Citation:

@InProceedings{N18-1101,
  author = "Williams, Adina
            and Nangia, Nikita
            and Bowman, Samuel",
  title = "A Broad-Coverage Challenge Corpus for
           Sentence Understanding through Inference",
  booktitle = "Proceedings of the 2018 Conference of
               the North American Chapter of the
               Association for Computational Linguistics:
               Human Language Technologies, Volume 1 (Long
               Papers)",
  year = "2018",
  publisher = "Association for Computational Linguistics",
  pages = "1112--1122",
  location = "New Orleans, Louisiana",
  url = "http://aclweb.org/anthology/N18-1101"
}
@article{bowman2015large,
  title={A large annotated corpus for learning natural language inference},
  author={Bowman, Samuel R and Angeli, Gabor and Potts, Christopher and Manning, Christopher D},
  journal={arXiv preprint arXiv:1508.05326},
  year={2015}
}
@inproceedings{wang2019glue,
  title={ {GLUE}: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding},
  author={Wang, Alex and Singh, Amanpreet and Michael, Julian and Hill, Felix and Levy, Omer and Bowman, Samuel R.},
  note={In the Proceedings of ICLR.},
  year={2019}
}

Note that each GLUE dataset has its own citation. Please see the source to see
the correct citation for each contained dataset.

glue/mnli_matched

Config description: The matched validation and test splits from MNLI. See the "mnli" BuilderConfig for additional information.
Homepage: http://www.nyu.edu/projects/bowman/multinli/
Download size: 298.29 MiB
Dataset size: 4.58 MiB
Splits:

Split	Examples
`'test'`	9,796
`'validation'`	9,815

Feature structure:

FeaturesDict({
    'hypothesis': Text(shape=(), dtype=string),
    'idx': int32,
    'label': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'premise': Text(shape=(), dtype=string),
})

Feature documentation:

Feature	Class	Dtype
	FeaturesDict
hypothesis	Text	string
idx	Tensor	int32
label	ClassLabel	int64
premise	Text	string

Examples (tfds.as_dataframe):

Citation:

@InProceedings{N18-1101,
  author = "Williams, Adina
            and Nangia, Nikita
            and Bowman, Samuel",
  title = "A Broad-Coverage Challenge Corpus for
           Sentence Understanding through Inference",
  booktitle = "Proceedings of the 2018 Conference of
               the North American Chapter of the
               Association for Computational Linguistics:
               Human Language Technologies, Volume 1 (Long
               Papers)",
  year = "2018",
  publisher = "Association for Computational Linguistics",
  pages = "1112--1122",
  location = "New Orleans, Louisiana",
  url = "http://aclweb.org/anthology/N18-1101"
}
@article{bowman2015large,
  title={A large annotated corpus for learning natural language inference},
  author={Bowman, Samuel R and Angeli, Gabor and Potts, Christopher and Manning, Christopher D},
  journal={arXiv preprint arXiv:1508.05326},
  year={2015}
}
@inproceedings{wang2019glue,
  title={ {GLUE}: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding},
  author={Wang, Alex and Singh, Amanpreet and Michael, Julian and Hill, Felix and Levy, Omer and Bowman, Samuel R.},
  note={In the Proceedings of ICLR.},
  year={2019}
}

Note that each GLUE dataset has its own citation. Please see the source to see
the correct citation for each contained dataset.
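
The split tables suggest that the `mnli_matched` and `mnli_mismatched` configs expose the same examples as the suffixed splits of the combined `mnli` config, since the example counts agree. The sketch below encodes that reading; the mapping function `resolve_mnli_split` is a hypothetical helper, and the correspondence itself is an inference from the counts on this page rather than something the page states outright.

```python
# Example counts copied from the split tables on this page.
MNLI_SPLITS = {
    "test_matched": 9796,
    "test_mismatched": 9847,
    "train": 392702,
    "validation_matched": 9815,
    "validation_mismatched": 9832,
}
MNLI_MATCHED_SPLITS = {"test": 9796, "validation": 9815}
MNLI_MISMATCHED_SPLITS = {"test": 9847, "validation": 9832}


def resolve_mnli_split(config: str, split: str) -> str:
    """Map a (config, split) pair to the split name in the 'mnli' config."""
    if config == "mnli":
        return split
    # Assumed naming rule: '<split>_<matched|mismatched>'.
    suffix = {"mnli_matched": "matched", "mnli_mismatched": "mismatched"}[config]
    return f"{split}_{suffix}"


def counts_agree() -> bool:
    """Check that every per-config count equals its counterpart in 'mnli'."""
    per_config = {
        "mnli_matched": MNLI_MATCHED_SPLITS,
        "mnli_mismatched": MNLI_MISMATCHED_SPLITS,
    }
    return all(
        MNLI_SPLITS[resolve_mnli_split(config, split)] == count
        for config, splits in per_config.items()
        for split, count in splits.items()
    )
```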

glue/qnli

Config description: The Stanford Question Answering Dataset is a question-answering dataset consisting of question-paragraph pairs, where one of the sentences in the paragraph (drawn from Wikipedia) contains the answer to the corresponding question (written by an annotator). We convert the task into sentence pair classification by forming a pair between each question and each sentence in the corresponding context, and filtering out pairs with low lexical overlap between the question and the context sentence. The task is to determine whether the context sentence contains the answer to the question. This modified version of the original task removes the requirement that the model select the exact answer, but also removes the simplifying assumptions that the answer is always present in the input and that lexical overlap is a reliable cue.
Homepage: https://rajpurkar.github.io/SQuAD-explorer/
Download size: 10.14 MiB
Dataset size: 32.99 MiB
Splits:

Split	Examples
`'test'`	5,463
`'train'`	104,743
`'validation'`	5,463

Feature structure:

FeaturesDict({
    'idx': int32,
    'label': ClassLabel(shape=(), dtype=int64, num_classes=2),
    'question': Text(shape=(), dtype=string),
    'sentence': Text(shape=(), dtype=string),
})

Feature documentation:

Feature	Class	Dtype
	FeaturesDict
idx	Tensor	int32
label	ClassLabel	int64
question	Text	string
sentence	Text	string

Examples (tfds.as_dataframe):

Citation:

@article{rajpurkar2016squad,
  title={Squad: 100,000+ questions for machine comprehension of text},
  author={Rajpurkar, Pranav and Zhang, Jian and Lopyrev, Konstantin and Liang, Percy},
  journal={arXiv preprint arXiv:1606.05250},
  year={2016}
}
@inproceedings{wang2019glue,
  title={ {GLUE}: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding},
  author={Wang, Alex and Singh, Amanpreet and Michael, Julian and Hill, Felix and Levy, Omer and Bowman, Samuel R.},
  note={In the Proceedings of ICLR.},
  year={2019}
}

Note that each GLUE dataset has its own citation. Please see the source to see
the correct citation for each contained dataset.
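
The conversion described in the qnli config description (pair each question with each context sentence, then drop pairs with low lexical overlap) might be sketched as follows. This is an illustration only, not the procedure the GLUE authors used: the sentence splitter, the word-overlap measure, the `min_overlap` threshold, and the `contains_answer` field are all assumptions made for the example.

```python
import re


def tokens(text: str) -> set:
    """Lowercased alphanumeric word set (assumed overlap unit)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def split_sentences(paragraph: str) -> list:
    """Naive splitter on sentence-final punctuation (assumption)."""
    parts = re.split(r"(?<=[.!?])\s+", paragraph)
    return [part.strip() for part in parts if part.strip()]


def make_pairs(question: str, paragraph: str, answer_index: int,
               min_overlap: int = 1) -> list:
    """Pair the question with every context sentence that shares at least
    `min_overlap` words with it; mark the sentence holding the answer."""
    question_words = tokens(question)
    pairs = []
    for i, sentence in enumerate(split_sentences(paragraph)):
        overlap = len(question_words & tokens(sentence))
        if overlap < min_overlap:
            continue  # filtered out: too little lexical overlap
        pairs.append({
            "question": question,
            "sentence": sentence,
            "contains_answer": i == answer_index,
        })
    return pairs
```

Because low-overlap sentences are removed, a question can end up with no positive pair at all, which is one way to read the remark that the answer is no longer guaranteed to be present in the input.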

glue/rte

Config description: The Recognizing Textual Entailment (RTE) datasets come from a series of annual textual entailment challenges. We combine the data from RTE1 (Dagan et al., 2006), RTE2 (Bar Haim et al., 2006), RTE3 (Giampiccolo et al., 2007), and RTE5 (Bentivogli et al., 2009). Examples are constructed based on news and Wikipedia text. We convert all datasets to a two-class split, where for three-class datasets we collapse neutral and contradiction into not entailment, for consistency.
Homepage: https://aclweb.org/aclwiki/Recognizing_Textual_Entailment
Download size: 680.81 KiB
Dataset size: 2.15 MiB
Splits:

Split	Examples
`'test'`	3,000
`'train'`	2,490
`'validation'`	277

Feature structure:

FeaturesDict({
    'idx': int32,
    'label': ClassLabel(shape=(), dtype=int64, num_classes=2),
    'sentence1': Text(shape=(), dtype=string),
    'sentence2': Text(shape=(), dtype=string),
})

Feature documentation:

Feature	Class	Dtype
	FeaturesDict
idx	Tensor	int32
label	ClassLabel	int64
sentence1	Text	string
sentence2	Text	string

Examples (tfds.as_dataframe):

Citation:

@inproceedings{dagan2005pascal,
  title={The PASCAL recognising textual entailment challenge},
  author={Dagan, Ido and Glickman, Oren and Magnini, Bernardo},
  booktitle={Machine Learning Challenges Workshop},
  pages={177--190},
  year={2005},
  organization={Springer}
}
@inproceedings{bar2006second,
  title={The second pascal recognising textual entailment challenge},
  author={Bar-Haim, Roy and Dagan, Ido and Dolan, Bill and Ferro, Lisa and Giampiccolo, Danilo and Magnini, Bernardo and Szpektor, Idan},
  booktitle={Proceedings of the second PASCAL challenges workshop on recognising textual entailment},
  volume={6},
  number={1},
  pages={6--4},
  year={2006},
  organization={Venice}
}
@inproceedings{giampiccolo2007third,
  title={The third pascal recognizing textual entailment challenge},
  author={Giampiccolo, Danilo and Magnini, Bernardo and Dagan, Ido and Dolan, Bill},
  booktitle={Proceedings of the ACL-PASCAL workshop on textual entailment and paraphrasing},
  pages={1--9},
  year={2007},
  organization={Association for Computational Linguistics}
}
@inproceedings{bentivogli2009fifth,
  title={The Fifth PASCAL Recognizing Textual Entailment Challenge.},
  author={Bentivogli, Luisa and Clark, Peter and Dagan, Ido and Giampiccolo, Danilo},
  booktitle={TAC},
  year={2009}
}
@inproceedings{wang2019glue,
  title={ {GLUE}: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding},
  author={Wang, Alex and Singh, Amanpreet and Michael, Julian and Hill, Felix and Levy, Omer and Bowman, Samuel R.},
  note={In the Proceedings of ICLR.},
  year={2019}
}

Note that each GLUE dataset has its own citation. Please see the source to see
the correct citation for each contained dataset.
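
The two-class conversion mentioned in the rte config description, where neutral and contradiction are collapsed into not entailment, can be illustrated with a small mapping. The string label names used here are assumptions chosen for readability; the dataset itself stores the label as an integer `ClassLabel`.

```python
def collapse_to_two_class(label: str) -> str:
    """Map a three-class NLI label onto the two-class RTE scheme."""
    if label == "entailment":
        return "entailment"
    if label in ("neutral", "contradiction"):
        return "not_entailment"
    raise ValueError(f"unexpected label: {label!r}")


def collapse_all(labels: list) -> list:
    """Apply the collapse to a list of three-class labels."""
    return [collapse_to_two_class(label) for label in labels]
```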

glue/wnli

Config description: The Winograd Schema Challenge (Levesque et al., 2011) is a reading comprehension task in which a system must read a sentence with a pronoun and select the referent of that pronoun from a list of choices. The examples are manually constructed to foil simple statistical methods: each one is contingent on contextual information provided by a single word or phrase in the sentence. To convert the problem into sentence pair classification, we construct sentence pairs by replacing the ambiguous pronoun with each possible referent. The task is to predict whether the sentence with the pronoun substituted is entailed by the original sentence. We use a small evaluation set consisting of new examples derived from fiction books that was shared privately by the authors of the original corpus. While the included training set is balanced between two classes, the test set is imbalanced between them (65% not entailment). Also, due to a data quirk, the development set is adversarial: hypotheses are sometimes shared between training and development examples, so a model that memorizes the training examples will predict the wrong label on the corresponding development set example. As with QNLI, each example is evaluated separately, so there is no systematic correspondence between a model's score on this task and its score on the unconverted original task. We call the converted dataset WNLI (Winograd NLI).
Homepage: https://cs.nyu.edu/faculty/davise/papers/WinogradSchemas/WS.html
Download size: 28.32 KiB
Dataset size: 198.88 KiB
Splits:

Split	Examples
`'test'`	146
`'train'`	635
`'validation'`	71

Feature structure:

FeaturesDict({
    'idx': int32,
    'label': ClassLabel(shape=(), dtype=int64, num_classes=2),
    'sentence1': Text(shape=(), dtype=string),
    'sentence2': Text(shape=(), dtype=string),
})

Feature documentation:

Feature	Class	Dtype
	FeaturesDict
idx	Tensor	int32
label	ClassLabel	int64
sentence1	Text	string
sentence2	Text	string

Examples (tfds.as_dataframe):

Citation:

@inproceedings{levesque2012winograd,
  title={The winograd schema challenge},
  author={Levesque, Hector and Davis, Ernest and Morgenstern, Leora},
  booktitle={Thirteenth International Conference on the Principles of Knowledge Representation and Reasoning},
  year={2012}
}
@inproceedings{wang2019glue,
  title={ {GLUE}: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding},
  author={Wang, Alex and Singh, Amanpreet and Michael, Julian and Hill, Felix and Levy, Omer and Bowman, Samuel R.},
  note={In the Proceedings of ICLR.},
  year={2019}
}

Note that each GLUE dataset has its own citation. Please see the source to see
the correct citation for each contained dataset.
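
The pair construction described in the wnli config description, replacing the ambiguous pronoun with each possible referent, might look roughly like the sketch below. It is a simplification under stated assumptions: the pronoun is replaced by plain whole-word substitution of its first occurrence, the second sentence is the full rewritten sentence, and the label encoding (1 for the correct referent, 0 otherwise) is illustrative rather than taken from the dataset.

```python
import re


def make_wnli_pairs(sentence: str, pronoun: str, candidates: list,
                    correct: str) -> list:
    """Build one sentence pair per candidate referent."""
    # Whole-word match so that e.g. 'it' does not match inside 'fit'.
    pattern = re.compile(rf"\b{re.escape(pronoun)}\b")
    pairs = []
    for candidate in candidates:
        # Replace only the first standalone occurrence of the pronoun.
        hypothesis = pattern.sub(lambda _match: candidate, sentence, count=1)
        pairs.append({
            "sentence1": sentence,
            "sentence2": hypothesis,
            "label": int(candidate == correct),  # assumed encoding
        })
    return pairs
```

Each resulting pair is scored on its own, which matches the note in the description that there is no systematic correspondence with scores on the original, unconverted task.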

glue/ax

Config description: A manually curated evaluation dataset for fine-grained analysis of system performance on a broad range of linguistic phenomena. This dataset evaluates sentence understanding through Natural Language Inference (NLI) problems. Use a model trained on MultiNLI to produce predictions for this dataset.
Homepage: https://gluebenchmark.com/diagnostics
Download size: 217.05 KiB
Dataset size: 299.16 KiB
Splits:

Split	Examples
`'test'`	1,104

Feature structure:

FeaturesDict({
    'hypothesis': Text(shape=(), dtype=string),
    'idx': int32,
    'label': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'premise': Text(shape=(), dtype=string),
})

Feature documentation:

Feature	Class	Dtype
	FeaturesDict
hypothesis	Text	string
idx	Tensor	int32
label	ClassLabel	int64
premise	Text	string

Examples (tfds.as_dataframe):

Citation:

@inproceedings{wang2019glue,
  title={ {GLUE}: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding},
  author={Wang, Alex and Singh, Amanpreet and Michael, Julian and Hill, Felix and Levy, Omer and Bowman, Samuel R.},
  note={In the Proceedings of ICLR.},
  year={2019}
}

Note that each GLUE dataset has its own citation. Please see the source to see
the correct citation for each contained dataset.
