conceptual_captions

References:

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:conceptual_captions')
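The command above assumes `tfds` has already been imported. A minimal sketch of the full call follows, assuming the `tensorflow_datasets` package is installed with Hugging Face dataset support. The helper names `tfds_name` and `load_conceptual_captions` are hypothetical, introduced only for illustration.

```python
from typing import Optional

BASE_NAME = "huggingface:conceptual_captions"


def tfds_name(config: Optional[str] = None) -> str:
    """Build the TFDS name string, optionally with a config such as 'unlabeled'."""
    return BASE_NAME if config is None else f"{BASE_NAME}/{config}"


def load_conceptual_captions(config: Optional[str] = None, split: str = "train"):
    """Load one split; the import is deferred so the name helper works without TFDS."""
    import tensorflow_datasets as tfds  # assumed to be installed

    return tfds.load(tfds_name(config), split=split)
```

For example, `load_conceptual_captions("labeled")` would request the `'train'` split of the `labeled` config described further down.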

Description:

Image captioning dataset
The resulting dataset (version 1.1) has been split into Training, Validation, and Test splits. The Training split consists of 3,318,333 image-URL/caption pairs, with a total of 51,201 token types in the captions (i.e., the total vocabulary). The average number of tokens per caption is 10.3 (standard deviation of 4.5), while the median is 9.0 tokens per caption. The Validation split consists of 15,840 image-URL/caption pairs, with similar statistics.
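To make the statistics above concrete, the sketch below computes the same kinds of quantities (token types, mean, standard deviation, median) on a toy list of captions. It assumes lowercased, whitespace-separated tokens and a population standard deviation; the tokenizer and estimator actually used for the reported figures are not stated here, so results on the real data may differ.

```python
import statistics
from typing import Dict, List


def caption_stats(captions: List[str]) -> Dict[str, float]:
    """Compute vocabulary size and per-caption token statistics."""
    # Assumption: tokens are lowercased, whitespace-separated words.
    tokenized = [c.lower().split() for c in captions]
    lengths = [len(tokens) for tokens in tokenized]
    vocabulary = {tok for tokens in tokenized for tok in tokens}
    return {
        "token_types": len(vocabulary),
        "mean": statistics.mean(lengths),
        # Assumption: population standard deviation.
        "stdev": statistics.pstdev(lengths),
        "median": statistics.median(lengths),
    }


# Toy captions for illustration; not taken from the dataset.
sample = ["a dog runs on the beach", "a cat sleeps", "the dog sleeps"]
stats = caption_stats(sample)
```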

License: No known license
Version: 1.1.0
Splits:

Split	Examples
`'train'`	3318333
`'validation'`	15840

Features:

{
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "caption": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "url": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

unlabeled

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:conceptual_captions/unlabeled')

Description:

Google's Conceptual Captions dataset has more than 3 million images, paired with natural-language captions.
In contrast with the curated style of the MS-COCO images, Conceptual Captions images and their raw descriptions are harvested from the web,
and therefore represent a wider variety of styles. The raw descriptions are harvested from the Alt-text HTML attribute associated with web images.
The authors developed an automatic pipeline that extracts, filters, and transforms candidate image/caption pairs, with the goal of achieving a balance of cleanliness,
informativeness, fluency, and learnability of the resulting captions.

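License: The dataset may be freely used for any purpose, although acknowledgement of Google LLC ("Google") as the data source would be appreciated. The dataset is provided "AS IS" without any warranty, express or implied. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.
Version: 0.0.0
Splits:

Split	Examples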
`'train'`	3318333
`'validation'`	15840

Features:

{
    "image_url": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "caption": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}
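Because each record stores an image URL rather than pixel data, images have to be fetched separately, and some links may no longer resolve. The sketch below shows one way to screen records before attempting a download. The helper `is_usable_record` and the example record are hypothetical; only the field names `image_url` and `caption` come from the schema above.

```python
from typing import Dict
from urllib.parse import urlparse


def is_usable_record(record: Dict[str, str]) -> bool:
    """Return True if the record has a non-empty caption and an http(s) image URL."""
    caption = record.get("caption", "").strip()
    parsed = urlparse(record.get("image_url", ""))
    return bool(caption) and parsed.scheme in ("http", "https") and bool(parsed.netloc)


# Hypothetical record for illustration; not taken from the dataset.
example = {"image_url": "https://example.com/cat.jpg", "caption": "a cat on a sofa"}
```

This check only inspects the URL string; it does not confirm that the image is still reachable.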

labeled

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:conceptual_captions/labeled')

Description:

Google's Conceptual Captions dataset has more than 3 million images, paired with natural-language captions.
In contrast with the curated style of the MS-COCO images, Conceptual Captions images and their raw descriptions are harvested from the web,
and therefore represent a wider variety of styles. The raw descriptions are harvested from the Alt-text HTML attribute associated with web images.
The authors developed an automatic pipeline that extracts, filters, and transforms candidate image/caption pairs, with the goal of achieving a balance of cleanliness,
informativeness, fluency, and learnability of the resulting captions.

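License: The dataset may be freely used for any purpose, although acknowledgement of Google LLC ("Google") as the data source would be appreciated. The dataset is provided "AS IS" without any warranty, express or implied. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.
Version: 0.0.0
Splits:

Split	Examples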
`'train'`	2007090

Features:

{
    "image_url": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "caption": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "labels": {
        "feature": {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    },
    "MIDs": {
        "feature": {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    },
    "confidence_scores": {
        "feature": {
            "dtype": "float64",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    }
}
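The `labels`, `MIDs`, and `confidence_scores` fields are variable-length sequences (`"length": -1`). A minimal sketch of how they might be combined follows, assuming the three sequences are parallel (same length and order), which the schema itself does not state. The helper `confident_labels`, the threshold, and the example record (including the placeholder MID strings) are hypothetical.

```python
from typing import Any, Dict, List, Tuple


def confident_labels(
    record: Dict[str, Any], threshold: float = 0.5
) -> List[Tuple[str, str, float]]:
    """Return (label, MID, score) triples whose score is at least `threshold`."""
    # Assumption: the three sequences are parallel (same length, same order).
    triples = zip(record["labels"], record["MIDs"], record["confidence_scores"])
    return [(label, mid, score) for label, mid, score in triples if score >= threshold]


# Hypothetical record for illustration; values are not taken from the dataset.
example = {
    "image_url": "https://example.com/dog.jpg",
    "caption": "a dog on the beach",
    "labels": ["dog", "beach", "sky"],
    "MIDs": ["/m/aaa", "/m/bbb", "/m/ccc"],
    "confidence_scores": [0.95, 0.72, 0.31],
}
kept = confident_labels(example, threshold=0.7)
```

With this toy record, `kept` holds the `dog` and `beach` triples, since only their scores reach the 0.7 threshold.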