medical_dialog

References:

en

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:medical_dialog/en')

Description:

The MedDialog dataset (English) contains conversations (in English) between doctors and patients. It has 0.26 million dialogues. The data is continuously growing and more dialogues will be added. The raw dialogues are from healthcaremagic.com and icliniq.com.
All copyrights of the data belong to healthcaremagic.com and icliniq.com.

License: No known license
Version: 1.0.0
Splits:

Split	Examples
`'train'`	229674

Features:

{
    "file_name": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "dialogue_id": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "dialogue_url": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "dialogue_turns": {
        "feature": {
            "speaker": {
                "num_classes": 2,
                "names": [
                    "Patient",
                    "Doctor"
                ],
                "id": null,
                "_type": "ClassLabel"
            },
            "utterance": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            }
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    }
}
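
The sketch below shows one way a record with this schema might be rendered as readable text. It assumes the `speaker` ClassLabel is stored as an integer index into `names` (0 for "Patient", 1 for "Doctor") and that the `Sequence` of dicts arrives as a dict of parallel lists, which is how such features are commonly laid out in memory. The `format_dialogue` helper and the sample record are hypothetical, invented for illustration.

```python
SPEAKER_NAMES = ["Patient", "Doctor"]  # order taken from the ClassLabel "names" above


def format_dialogue(example):
    """Return one 'Speaker: utterance' line per dialogue turn."""
    turns = example["dialogue_turns"]
    lines = []
    # Assumption: "speaker" and "utterance" are parallel lists of equal length.
    for speaker_id, utterance in zip(turns["speaker"], turns["utterance"]):
        lines.append(f"{SPEAKER_NAMES[speaker_id]}: {utterance}")
    return "\n".join(lines)


# Hypothetical record, invented for illustration only (not taken from the dataset).
sample = {
    "file_name": "example.txt",
    "dialogue_id": 0,
    "dialogue_url": "https://example.com/dialogue/0",
    "dialogue_turns": {
        "speaker": [0, 1],
        "utterance": [
            "I have had a headache for two days.",
            "Have you taken anything for it?",
        ],
    },
}

print(format_dialogue(sample))
```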

zh

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:medical_dialog/zh')

Description:

The MedDialog dataset (English) contains conversations (in English) between doctors and patients. It has 0.26 million dialogues. The data is continuously growing and more dialogues will be added. The raw dialogues are from healthcaremagic.com and icliniq.com.
All copyrights of the data belong to healthcaremagic.com and icliniq.com.

License: No known license
Version: 1.0.0
Splits:

Split	Examples
`'train'`	1921127

Features:

{
    "file_name": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "dialogue_id": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "dialogue_url": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "dialogue_turns": {
        "feature": {
            "speaker": {
                "num_classes": 2,
                "names": [
                    "\u75c5\u4eba",
                    "\u533b\u751f"
                ],
                "id": null,
                "_type": "ClassLabel"
            },
            "utterance": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            }
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    }
}
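
The speaker names in this config are written as JSON Unicode escapes. As a small sketch, decoding them with the standard `json` module shows the actual labels: 病人 ("patient") and 医生 ("doctor"). The variable names are illustrative only.

```python
import json

# The "names" array exactly as it appears in the feature spec above.
# A raw string keeps the backslashes so that json.loads does the decoding.
raw_names = r'["\u75c5\u4eba", "\u533b\u751f"]'

names = json.loads(raw_names)
print(names)  # ['病人', '医生']
```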

processed.en

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:medical_dialog/processed.en')

Description:

The MedDialog dataset (English) contains conversations (in English) between doctors and patients. It has 0.26 million dialogues. The data is continuously growing and more dialogues will be added. The raw dialogues are from healthcaremagic.com and icliniq.com.
All copyrights of the data belong to healthcaremagic.com and icliniq.com.

License: Copyright
Version: 2.0.0
Splits:

Split	Examples
`'test'`	61
`'train'`	482
`'validation'`	60
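
As a quick arithmetic check on the table above, the following sketch computes each split's share of the total. The `split_fractions` helper is hypothetical; the counts are copied from the table. For this config the shares come out to roughly 80/10/10.

```python
def split_fractions(counts):
    """Map each split name to its fraction of all examples."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


# Example counts for the processed.en config, copied from the table above.
processed_en = {"train": 482, "validation": 60, "test": 61}

for name, fraction in split_fractions(processed_en).items():
    print(f"{name}: {fraction:.1%}")
```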

Features:

{
    "description": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "utterances": {
        "feature": {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    }
}

processed.zh

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:medical_dialog/processed.zh')

Description:

The MedDialog dataset (English) contains conversations (in English) between doctors and patients. It has 0.26 million dialogues. The data is continuously growing and more dialogues will be added. The raw dialogues are from healthcaremagic.com and icliniq.com.
All copyrights of the data belong to healthcaremagic.com and icliniq.com.

License: Copyright
Version: 2.0.0
Splits:

Split	Examples
`'test'`	340754
`'train'`	2725989
`'validation'`	340748

Features:

{
    "utterances": {
        "feature": {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    }
}
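
The four load commands on this page differ only in the config name. A minimal sketch of a helper that builds the TFDS name and loads a chosen split might look like the following. `CONFIGS`, `tfds_name`, and `load_config` are hypothetical names; `tfds.load` with a `split` argument is the standard TFDS entry point, and loading `huggingface:` datasets is assumed to require network access and the Hugging Face `datasets` package.

```python
CONFIGS = ("en", "zh", "processed.en", "processed.zh")  # configs listed on this page


def tfds_name(config):
    """Build the TFDS dataset name for one medical_dialog config."""
    if config not in CONFIGS:
        raise ValueError(f"unknown config {config!r}; expected one of {CONFIGS}")
    return f"huggingface:medical_dialog/{config}"


def load_config(config, split="train"):
    """Load one split of one config; every config on this page has a 'train' split."""
    # Imported lazily so tfds_name works without TensorFlow Datasets installed.
    import tensorflow_datasets as tfds

    return tfds.load(tfds_name(config), split=split)


print(tfds_name("processed.en"))  # huggingface:medical_dialog/processed.en
```

For example, `load_config("processed.en", split="validation")` would request the 60-example validation split listed above.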