wiki_dpr

psgs_w100.nq.exact

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:wiki_dpr/psgs_w100.nq.exact')
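All six configurations documented on this page share the prefix `psgs_w100` and differ only in the second component (`nq` or `multiset`) and the third (`exact`, `compressed`, or `no_index`). As a minimal sketch, the full names accepted by `tfds.load` can be generated from those two components. The helper `wiki_dpr_config_names` is hypothetical, and the snippet only builds strings; it downloads nothing.

```python
from itertools import product

# Name components taken from the configurations listed on this page.
VARIANTS = ("nq", "multiset")
INDEXES = ("exact", "compressed", "no_index")


def wiki_dpr_config_names() -> list[str]:
    """Return the tfds.load name of every wiki_dpr configuration (hypothetical helper)."""
    return [
        f"huggingface:wiki_dpr/psgs_w100.{variant}.{index}"
        for variant, index in product(VARIANTS, INDEXES)
    ]


for name in wiki_dpr_config_names():
    print(name)
```

Each of these strings can be passed to `tfds.load` in the same way as the command above.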

Description:

This is the Wikipedia split used to evaluate the Dense Passage Retrieval (DPR) model.
It contains 21M passages from Wikipedia along with their DPR embeddings.
The Wikipedia articles were split into multiple disjoint text blocks of 100 words each, which serve as the passages.
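The description says each article was cut into disjoint blocks of 100 words. A minimal sketch of that idea follows, assuming whitespace tokenization and that a shorter final block is kept; the page does not say how word boundaries or leftover words were actually handled, and `split_into_passages` is a hypothetical name, not the code used to build this dataset.

```python
def split_into_passages(text: str, words_per_passage: int = 100) -> list[str]:
    """Split text into consecutive, non-overlapping blocks of at most `words_per_passage` words.

    Assumes whitespace tokenization; the last block may be shorter than the rest.
    """
    if words_per_passage <= 0:
        raise ValueError("words_per_passage must be positive")
    words = text.split()
    return [
        " ".join(words[start:start + words_per_passage])
        for start in range(0, len(words), words_per_passage)
    ]


article = " ".join(f"w{i}" for i in range(250))
passages = split_into_passages(article)
print(len(passages), [len(p.split()) for p in passages])  # 3 [100, 100, 50]
```

Because the blocks are disjoint, every word of the article lands in exactly one passage.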

License: No known license
Version: 0.0.0
Splits:

Split	Examples
`'train'`	21015300
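The `train` split holds 21,015,300 passages, each with a `float32` embedding. As a rough sketch of what that means for memory, assuming 768-dimensional vectors (the hidden size of the BERT-base encoders used by DPR; the feature schema on this page declares `"length": -1`, i.e. no fixed length) and 4 bytes per `float32` value, the raw embedding matrix alone comes to about 64.6 GB. The constant and function names are illustrative.

```python
NUM_PASSAGES = 21_015_300   # size of the 'train' split on this page
EMBEDDING_DIM = 768         # assumption: BERT-base hidden size used by DPR encoders
BYTES_PER_FLOAT32 = 4


def embedding_matrix_bytes(num_rows: int, dim: int, bytes_per_value: int = BYTES_PER_FLOAT32) -> int:
    """Raw size of a dense num_rows x dim matrix, ignoring any container or index overhead."""
    return num_rows * dim * bytes_per_value


size = embedding_matrix_bytes(NUM_PASSAGES, EMBEDDING_DIM)
print(size)                       # 64559001600
print(f"{size / 1e9:.1f} GB")     # 64.6 GB
print(f"{size / 2**30:.1f} GiB")  # 60.1 GiB
```

This counts only the raw embedding values; the `id`, `text`, and `title` columns and any search index come on top of it.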

Features:

{
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "title": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "embeddings": {
        "feature": {
            "dtype": "float32",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    }
}
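DPR scores a passage by the inner product between a question embedding and the passage embedding. The sketch below runs an exhaustive (brute-force) top-k search over records shaped like the features above, in pure Python for readability. The question vector and the toy 3-dimensional records are made up, `top_k_passages` is a hypothetical helper, and at the scale of 21M passages a vectorized library or a prebuilt index would typically be used instead.

```python
import heapq
from typing import Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))


def top_k_passages(question_emb: Sequence[float], records: list[dict], k: int = 3) -> list[tuple[float, str]]:
    """Return (score, id) pairs for the k records with the highest inner-product score.

    Each record is assumed to be a dict with at least "id" and "embeddings" keys,
    mirroring the feature schema above.
    """
    scored = ((dot(question_emb, r["embeddings"]), r["id"]) for r in records)
    return heapq.nlargest(k, scored)


# Toy records with hypothetical values; real DPR embeddings are much larger.
records = [
    {"id": "1", "title": "A", "text": "...", "embeddings": [0.1, 0.9, 0.0]},
    {"id": "2", "title": "B", "text": "...", "embeddings": [0.8, 0.1, 0.1]},
    {"id": "3", "title": "C", "text": "...", "embeddings": [0.5, 0.5, 0.5]},
]
print(top_k_passages([1.0, 0.0, 0.0], records, k=2))  # [(0.8, '2'), (0.5, '3')]
```

`heapq.nlargest` keeps only k candidates in memory, which matters when streaming over many records.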

psgs_w100.nq.compressed

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:wiki_dpr/psgs_w100.nq.compressed')

Description:

This is the Wikipedia split used to evaluate the Dense Passage Retrieval (DPR) model.
It contains 21M passages from Wikipedia along with their DPR embeddings.
The Wikipedia articles were split into multiple disjoint text blocks of 100 words each, which serve as the passages.

License: No known license
Version: 0.0.0
Splits:

Split	Examples
`'train'`	21015300

Features:

{
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "title": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "embeddings": {
        "feature": {
            "dtype": "float32",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    }
}
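The feature schema above describes each example as three strings (`id`, `text`, `title`) plus a `float32` sequence (`embeddings`), where `"length": -1` means no fixed length is declared. A minimal sketch of a typed container for one such record follows, assuming examples arrive as plain Python dicts with those keys. The `Passage` class and its `from_record` method are hypothetical and are not part of TFDS or Hugging Face `datasets`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Passage:
    """One wiki_dpr example, mirroring the feature schema (hypothetical helper class)."""
    id: str
    text: str
    title: str
    embeddings: tuple[float, ...]

    @classmethod
    def from_record(cls, record: dict) -> "Passage":
        """Build a Passage from a dict, checking the field types declared in the schema."""
        for key in ("id", "text", "title"):
            if not isinstance(record.get(key), str):
                raise TypeError(f"{key!r} must be a string")
        emb = record.get("embeddings", [])
        if not all(isinstance(x, (int, float)) for x in emb):
            raise TypeError("'embeddings' must be a sequence of numbers")
        return cls(record["id"], record["text"], record["title"], tuple(float(x) for x in emb))


p = Passage.from_record(
    {"id": "1", "text": "Some passage text.", "title": "Example", "embeddings": [0.25, -0.5]}
)
print(p.title, len(p.embeddings))  # Example 2
```

Freezing the dataclass and storing embeddings as a tuple keeps each record immutable once validated.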

psgs_w100.nq.no_index

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:wiki_dpr/psgs_w100.nq.no_index')

Description:

This is the Wikipedia split used to evaluate the Dense Passage Retrieval (DPR) model.
It contains 21M passages from Wikipedia along with their DPR embeddings.
The Wikipedia articles were split into multiple disjoint text blocks of 100 words each, which serve as the passages.

License: No known license
Version: 0.0.0
Splits:

Split	Examples
`'train'`	21015300

Features:

{
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "title": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "embeddings": {
        "feature": {
            "dtype": "float32",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    }
}

psgs_w100.multiset.exact

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:wiki_dpr/psgs_w100.multiset.exact')

Description:

This is the Wikipedia split used to evaluate the Dense Passage Retrieval (DPR) model.
It contains 21M passages from Wikipedia along with their DPR embeddings.
The Wikipedia articles were split into multiple disjoint text blocks of 100 words each, which serve as the passages.

License: No known license
Version: 0.0.0
Splits:

Split	Examples
`'train'`	21015300

Features:

{
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "title": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "embeddings": {
        "feature": {
            "dtype": "float32",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    }
}

psgs_w100.multiset.compressed

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:wiki_dpr/psgs_w100.multiset.compressed')

Description:

This is the Wikipedia split used to evaluate the Dense Passage Retrieval (DPR) model.
It contains 21M passages from Wikipedia along with their DPR embeddings.
The Wikipedia articles were split into multiple disjoint text blocks of 100 words each, which serve as the passages.

License: No known license
Version: 0.0.0
Splits:

Split	Examples
`'train'`	21015300

Features:

{
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "title": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "embeddings": {
        "feature": {
            "dtype": "float32",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    }
}

psgs_w100.multiset.no_index

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:wiki_dpr/psgs_w100.multiset.no_index')

Description:

This is the Wikipedia split used to evaluate the Dense Passage Retrieval (DPR) model.
It contains 21M passages from Wikipedia along with their DPR embeddings.
The Wikipedia articles were split into multiple disjoint text blocks of 100 words each, which serve as the passages.

License: No known license
Version: 0.0.0
Splits:

Split	Examples
`'train'`	21015300

Features:

{
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "title": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "embeddings": {
        "feature": {
            "dtype": "float32",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    }
}