TFDS sekarang mendukung format Croissant 🥐 ! Baca dokumentasi untuk mengetahui lebih lanjut.

Halaman ini diterjemahkan oleh Cloud Translation API.

wiki_dpr

Referensi:

psgs_w100.nq.tepat

Gunakan perintah berikut untuk memuat kumpulan data ini di TFDS:

ds = tfds.load('huggingface:wiki_dpr/psgs_w100.nq.exact')

Keterangan :

This is the wikipedia split used to evaluate the Dense Passage Retrieval (DPR) model.
It contains 21M passages from wikipedia along with their DPR embeddings.
The wikipedia articles were split into multiple, disjoint text blocks of 100 words as passages.

Lisensi : Tidak ada lisensi yang diketahui
Versi : 0.0.0
Perpecahan :

Membelah	Contoh
`'train'`	21015300

Fitur :

{
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "title": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "embeddings": {
        "feature": {
            "dtype": "float32",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    }
}

psgs_w100.nq.terkompresi

Gunakan perintah berikut untuk memuat kumpulan data ini di TFDS:

ds = tfds.load('huggingface:wiki_dpr/psgs_w100.nq.compressed')

Keterangan :

This is the wikipedia split used to evaluate the Dense Passage Retrieval (DPR) model.
It contains 21M passages from wikipedia along with their DPR embeddings.
The wikipedia articles were split into multiple, disjoint text blocks of 100 words as passages.

Lisensi : Tidak ada lisensi yang diketahui
Versi : 0.0.0
Perpecahan :

Membelah	Contoh
`'train'`	21015300

Fitur :

{
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "title": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "embeddings": {
        "feature": {
            "dtype": "float32",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    }
}

psgs_w100.nq.no_index

Gunakan perintah berikut untuk memuat kumpulan data ini di TFDS:

ds = tfds.load('huggingface:wiki_dpr/psgs_w100.nq.no_index')

Keterangan :

This is the wikipedia split used to evaluate the Dense Passage Retrieval (DPR) model.
It contains 21M passages from wikipedia along with their DPR embeddings.
The wikipedia articles were split into multiple, disjoint text blocks of 100 words as passages.

Lisensi : Tidak ada lisensi yang diketahui
Versi : 0.0.0
Perpecahan :

Membelah	Contoh
`'train'`	21015300

Fitur :

{
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "title": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "embeddings": {
        "feature": {
            "dtype": "float32",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    }
}

psgs_w100.multiset.tepat

Gunakan perintah berikut untuk memuat kumpulan data ini di TFDS:

ds = tfds.load('huggingface:wiki_dpr/psgs_w100.multiset.exact')

Keterangan :

This is the wikipedia split used to evaluate the Dense Passage Retrieval (DPR) model.
It contains 21M passages from wikipedia along with their DPR embeddings.
The wikipedia articles were split into multiple, disjoint text blocks of 100 words as passages.

Lisensi : Tidak ada lisensi yang diketahui
Versi : 0.0.0
Perpecahan :

Membelah	Contoh
`'train'`	21015300

Fitur :

{
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "title": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "embeddings": {
        "feature": {
            "dtype": "float32",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    }
}

psgs_w100.multiset.terkompresi

Gunakan perintah berikut untuk memuat kumpulan data ini di TFDS:

ds = tfds.load('huggingface:wiki_dpr/psgs_w100.multiset.compressed')

Keterangan :

This is the wikipedia split used to evaluate the Dense Passage Retrieval (DPR) model.
It contains 21M passages from wikipedia along with their DPR embeddings.
The wikipedia articles were split into multiple, disjoint text blocks of 100 words as passages.

Lisensi : Tidak ada lisensi yang diketahui
Versi : 0.0.0
Perpecahan :

Membelah	Contoh
`'train'`	21015300

Fitur :

{
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "title": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "embeddings": {
        "feature": {
            "dtype": "float32",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    }
}

psgs_w100.multiset.no_index

Gunakan perintah berikut untuk memuat kumpulan data ini di TFDS:

ds = tfds.load('huggingface:wiki_dpr/psgs_w100.multiset.no_index')

Keterangan :

This is the wikipedia split used to evaluate the Dense Passage Retrieval (DPR) model.
It contains 21M passages from wikipedia along with their DPR embeddings.
The wikipedia articles were split into multiple, disjoint text blocks of 100 words as passages.

Lisensi : Tidak ada lisensi yang diketahui
Versi : 0.0.0
Perpecahan :

Membelah	Contoh
`'train'`	21015300

Fitur :

{
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "title": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "embeddings": {
        "feature": {
            "dtype": "float32",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    }
}