wiki_dpr

psgs_w100.nq.exact

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:wiki_dpr/psgs_w100.nq.exact')
  • Description:
This is the Wikipedia split used to evaluate the Dense Passage Retrieval (DPR) model.
It contains 21M passages from Wikipedia along with their DPR embeddings.
The Wikipedia articles were split into multiple disjoint text blocks of 100 words, each used as a passage.
  • License: Unknown license
  • Version: 0.0.0
  • Splits:
Split    Examples
'train'  21015300
  • Features:
{
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "title": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "embeddings": {
        "feature": {
            "dtype": "float32",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    }
}
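
The description above says that articles were cut into disjoint 100-word blocks. The following is a minimal sketch of that idea, not the preprocessing actually used to build the dataset: it assumes whitespace tokenization and keeps a shorter final block, both of which are assumptions. The function name `split_into_passages` is hypothetical.

```python
from typing import List


def split_into_passages(text: str, words_per_passage: int = 100) -> List[str]:
    """Split text into consecutive, non-overlapping blocks of words.

    Assumption: words are whitespace-delimited, and the last block may be
    shorter than words_per_passage.
    """
    words = text.split()
    return [
        " ".join(words[start:start + words_per_passage])
        for start in range(0, len(words), words_per_passage)
    ]


# A made-up 250-word "article" yields two full passages and one partial one.
article = " ".join(f"w{i}" for i in range(250))
passages = split_into_passages(article)
print([len(p.split()) for p in passages])  # [100, 100, 50]
```

Because the blocks are disjoint, every word of the input appears in exactly one passage.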

psgs_w100.nq.compressed

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:wiki_dpr/psgs_w100.nq.compressed')
  • Description:
This is the Wikipedia split used to evaluate the Dense Passage Retrieval (DPR) model.
It contains 21M passages from Wikipedia along with their DPR embeddings.
The Wikipedia articles were split into multiple disjoint text blocks of 100 words, each used as a passage.
  • License: Unknown license
  • Version: 0.0.0
  • Splits:
Split    Examples
'train'  21015300
  • Features:
{
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "title": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "embeddings": {
        "feature": {
            "dtype": "float32",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    }
}
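
To make the feature specification above concrete, here is a small sketch that checks a record against it. It assumes that `"length": -1` denotes a variable-length sequence and that `float32` values arrive as Python floats. The helper names and the sample record are hypothetical and are not taken from the dataset.

```python
import json

# Feature spec as listed above, with the null "id" fields omitted for brevity.
FEATURES = json.loads("""
{
    "id": {"dtype": "string", "_type": "Value"},
    "text": {"dtype": "string", "_type": "Value"},
    "title": {"dtype": "string", "_type": "Value"},
    "embeddings": {
        "feature": {"dtype": "float32", "_type": "Value"},
        "length": -1,
        "_type": "Sequence"
    }
}
""")

# Assumed mapping from spec dtypes to Python types.
PYTHON_TYPES = {"string": str, "float32": float}


def conforms(value, spec) -> bool:
    """Return True if value matches a Value or Sequence spec."""
    if spec["_type"] == "Value":
        return isinstance(value, PYTHON_TYPES[spec["dtype"]])
    if spec["_type"] == "Sequence":
        if not isinstance(value, (list, tuple)):
            return False
        # Assumption: -1 means "any length".
        if spec["length"] != -1 and len(value) != spec["length"]:
            return False
        return all(conforms(item, spec["feature"]) for item in value)
    return False


def validate_record(record: dict, features: dict) -> bool:
    """Check that record has exactly the listed fields with matching types."""
    return record.keys() == features.keys() and all(
        conforms(record[name], spec) for name, spec in features.items()
    )


# Made-up record for illustration only.
example = {
    "id": "1",
    "text": "An example passage.",
    "title": "Example",
    "embeddings": [0.1, -0.2, 0.3],
}
print(validate_record(example, FEATURES))  # True
```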

psgs_w100.nq.no_index

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:wiki_dpr/psgs_w100.nq.no_index')
  • Description:
This is the Wikipedia split used to evaluate the Dense Passage Retrieval (DPR) model.
It contains 21M passages from Wikipedia along with their DPR embeddings.
The Wikipedia articles were split into multiple disjoint text blocks of 100 words, each used as a passage.
  • License: Unknown license
  • Version: 0.0.0
  • Splits:
Split    Examples
'train'  21015300
  • Features:
{
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "title": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "embeddings": {
        "feature": {
            "dtype": "float32",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    }
}
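
Each record carries an `embeddings` vector, so passages can be ranked against a query vector. The sketch below assumes inner-product similarity and uses made-up three-dimensional vectors; the real embeddings are longer and come from the dataset itself. The function names `inner_product` and `top_k` are hypothetical, and a brute-force scan like this is only practical for small samples, not for all 21M passages.

```python
from typing import Dict, List, Sequence


def inner_product(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))


def top_k(query: Sequence[float], records: List[Dict], k: int = 2) -> List[Dict]:
    """Return the k records whose embeddings score highest against query."""
    ranked = sorted(
        records,
        key=lambda record: inner_product(query, record["embeddings"]),
        reverse=True,
    )
    return ranked[:k]


# Toy records shaped like the schema above; all values are invented.
records = [
    {"id": "a", "title": "A", "text": "...", "embeddings": [0.9, 0.1, 0.0]},
    {"id": "b", "title": "B", "text": "...", "embeddings": [0.2, 0.8, 0.0]},
    {"id": "c", "title": "C", "text": "...", "embeddings": [0.5, 0.5, 0.0]},
]
query = [1.0, 0.0, 0.0]
print([record["id"] for record in top_k(query, records)])  # ['a', 'c']
```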

psgs_w100.multiset.exact

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:wiki_dpr/psgs_w100.multiset.exact')
  • Description:
This is the Wikipedia split used to evaluate the Dense Passage Retrieval (DPR) model.
It contains 21M passages from Wikipedia along with their DPR embeddings.
The Wikipedia articles were split into multiple disjoint text blocks of 100 words, each used as a passage.
  • License: Unknown license
  • Version: 0.0.0
  • Splits:
Split    Examples
'train'  21015300
  • Features:
{
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "title": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "embeddings": {
        "feature": {
            "dtype": "float32",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    }
}

psgs_w100.multiset.compressed

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:wiki_dpr/psgs_w100.multiset.compressed')
  • Description:
This is the Wikipedia split used to evaluate the Dense Passage Retrieval (DPR) model.
It contains 21M passages from Wikipedia along with their DPR embeddings.
The Wikipedia articles were split into multiple disjoint text blocks of 100 words, each used as a passage.
  • License: Unknown license
  • Version: 0.0.0
  • Splits:
Split    Examples
'train'  21015300
  • Features:
{
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "title": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "embeddings": {
        "feature": {
            "dtype": "float32",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    }
}

psgs_w100.multiset.no_index

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:wiki_dpr/psgs_w100.multiset.no_index')
  • Description:
This is the Wikipedia split used to evaluate the Dense Passage Retrieval (DPR) model.
It contains 21M passages from Wikipedia along with their DPR embeddings.
The Wikipedia articles were split into multiple disjoint text blocks of 100 words, each used as a passage.
  • License: Unknown license
  • Version: 0.0.0
  • Splits:
Split    Examples
'train'  21015300
  • Features:
{
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "title": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "embeddings": {
        "feature": {
            "dtype": "float32",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    }
}
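
The six configuration names on this page follow one pattern, so the strings passed to `tfds.load` can be generated rather than typed by hand. This is a minimal sketch: the labels `VARIANTS` and `INDEX_OPTIONS` and the helper `tfds_name` are hypothetical names chosen for illustration, and the actual load call is left commented out because it would download the data.

```python
from itertools import product
from typing import List

# The two parts that vary across the configuration names listed on this page.
VARIANTS = ("nq", "multiset")
INDEX_OPTIONS = ("exact", "compressed", "no_index")


def tfds_name(variant: str, index_option: str) -> str:
    """Build the dataset name string for one wiki_dpr configuration."""
    if variant not in VARIANTS:
        raise ValueError(f"unknown variant: {variant!r}")
    if index_option not in INDEX_OPTIONS:
        raise ValueError(f"unknown index option: {index_option!r}")
    return f"huggingface:wiki_dpr/psgs_w100.{variant}.{index_option}"


ALL_NAMES: List[str] = [
    tfds_name(variant, index_option)
    for variant, index_option in product(VARIANTS, INDEX_OPTIONS)
]

for name in ALL_NAMES:
    print(name)
    # import tensorflow_datasets as tfds
    # ds = tfds.load(name)  # downloads the data, so it is not run here
```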