webvid


  • Description:

WebVid is a large-scale dataset of short videos with textual descriptions sourced from the web. The videos are diverse and rich in their content.

WebVid-10M contains 10.7M video-caption pairs, totaling 52K video hours.

  • Homepage: https://m-bain.github.io/webvid-dataset/

  • Source code: tfds.datasets.webvid.Builder

  • Versions:

    • 1.0.0 (default): Initial release.
  • Download size: Unknown size

  • Dataset size: Unknown size

  • Manual download instructions: This dataset requires you to download the source data manually into download_config.manual_dir (defaults to ~/tensorflow_datasets/downloads/manual/):
    Follow the download instructions at https://m-bain.github.io/webvid-dataset/ to get the data. Place the CSV files and the video directories in manual_dir/ so that the mp4 files match manual_dir/*_*/*.mp4.

  • Auto-cached (documentation): Unknown

  • Splits:

Split Examples
  • Feature structure:
FeaturesDict({
    'caption': Text(shape=(), dtype=string),
    'id': Text(shape=(), dtype=string),
    'url': Text(shape=(), dtype=string),
    'video': Video(Image(shape=(360, 640, 3), dtype=uint8)),
})
  • Feature documentation:
Feature   Class         Shape                Dtype    Description
          FeaturesDict
caption   Text                               string
id        Text                               string
url       Text                               string
video     Video(Image)  (None, 360, 640, 3)  uint8
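
The manual download layout and feature structure above can be tied together in a short sketch. It assumes the files have already been downloaded by hand, and the helper names find_webvid_videos and load_webvid are hypothetical, introduced only for illustration. The first helper checks that mp4 files match manual_dir/*_*/*.mp4; the second passes manual_dir to the builder through tfds.download.DownloadConfig and returns whatever splits the builder defines.

```python
from pathlib import Path


def find_webvid_videos(manual_dir):
    """Return the mp4 paths matching manual_dir/*_*/*.mp4, sorted."""
    root = Path(manual_dir).expanduser()
    return sorted(root.glob("*_*/*.mp4"))


def load_webvid(manual_dir):
    """Prepare WebVid from manual_dir and return all splits as a dict.

    Hypothetical helper: a sketch, not part of the TFDS API.
    """
    # Imported lazily so the layout check above works without TFDS installed.
    import tensorflow_datasets as tfds

    if not find_webvid_videos(manual_dir):
        raise FileNotFoundError(
            f"no mp4 files match {manual_dir}/*_*/*.mp4"
        )

    builder = tfds.builder("webvid")
    config = tfds.download.DownloadConfig(
        manual_dir=str(Path(manual_dir).expanduser())
    )
    builder.download_and_prepare(download_config=config)
    # With no split argument, as_dataset() returns a dict of every split.
    return builder.as_dataset()
```

Each example yielded by the returned datasets is a dict with the keys caption, id, url, and video, where video is a uint8 tensor of shape (num_frames, 360, 640, 3).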
  • Citation:
@misc{bain2021frozen,
      title={Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval},
      author={Max Bain and Arsha Nagrani and Gül Varol and Andrew Zisserman},
      year={2021},
      eprint={2104.00650},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}