webvid

  • Description:

WebVid is a large-scale dataset of short videos with textual descriptions sourced from the web. The videos are diverse and rich in their content.

WebVid-10M contains:

    • 10.7M video-caption pairs.
    • 52K total video hours.
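
As a rough sanity check on these figures, the average clip length follows from dividing the total duration by the number of pairs. The sketch below uses only the rounded values quoted above, so the result is approximate.

```python
# Rounded figures quoted in the description above.
num_pairs = 10.7e6    # video-caption pairs
total_hours = 52e3    # total video hours

# Average clip length in seconds (approximate, since both inputs are rounded).
avg_seconds = total_hours * 3600 / num_pairs
print(f"average clip length: about {avg_seconds:.1f} s")
```

With these inputs the average works out to roughly 17.5 seconds per clip, which is consistent with the description of the videos as short.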

  • Homepage: https://m-bain.github.io/webvid-dataset/

  • Source code: tfds.datasets.webvid.Builder

  • Versions:

    • 1.0.0 (default): Initial release.
  • Download size: Unknown size

  • Dataset size: Unknown size

  • Manual download instructions: This dataset requires you to download the source data manually into download_config.manual_dir (defaults to ~/tensorflow_datasets/downloads/manual/):
    Follow the download instructions at https://m-bain.github.io/webvid-dataset/ to get the data. Place the CSV files and the video directories in manual_dir/webvid, such that the mp4 files match manual_dir/webvid/*/*_*/*.mp4.

The first directory level is typically an arbitrary part directory (used for sharded downloading). The second level is the page directory, named as two numbers joined by an underscore, and it contains one or more mp4 files.
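
Before building the dataset, it can help to confirm that the downloaded files match the expected layout. The following is a minimal sketch, not part of the dataset's own tooling: `find_webvid_videos` and `prepare_webvid` are hypothetical helper names introduced here for illustration. The first uses only the standard library. The second assumes `tensorflow_datasets` is installed and calls `tfds.builder`, `tfds.download.DownloadConfig(manual_dir=...)`, `download_and_prepare`, and `as_dataset`.

```python
from pathlib import Path


def find_webvid_videos(manual_dir):
    """Return sorted mp4 paths matching manual_dir/webvid/*/*_*/*.mp4."""
    root = Path(manual_dir).expanduser() / "webvid"
    # part directory / page directory (contains an underscore) / mp4 file
    return sorted(root.glob("*/*_*/*.mp4"))


def prepare_webvid(manual_dir):
    """Build the dataset from manually downloaded files.

    Hypothetical helper; requires tensorflow_datasets to be installed.
    """
    # Imported lazily so the layout check above works without TFDS.
    import tensorflow_datasets as tfds

    builder = tfds.builder("webvid")
    config = tfds.download.DownloadConfig(
        manual_dir=str(Path(manual_dir).expanduser())
    )
    builder.download_and_prepare(download_config=config)
    # Called without a split argument, as_dataset() returns all available splits.
    return builder.as_dataset()


# Example usage (paths are the defaults named in the instructions above):
#   videos = find_webvid_videos("~/tensorflow_datasets/downloads/manual")
#   print(f"found {len(videos)} mp4 files in the expected layout")
```

If `find_webvid_videos` returns an empty list, the files are most likely one directory level too shallow or too deep relative to `manual_dir/webvid`.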

  • Splits:
Split    Examples
  • Feature structure:
FeaturesDict({
    'caption': Text(shape=(), dtype=string),
    'id': Text(shape=(), dtype=string),
    'url': Text(shape=(), dtype=string),
    'video': Video(Image(shape=(360, 640, 3), dtype=uint8)),
})
  • Feature documentation:
Feature  Class         Shape                Dtype   Description
         FeaturesDict
caption  Text                               string
id       Text                               string
url      Text                               string
video    Video(Image)  (None, 360, 640, 3)  uint8
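
The leading `None` in the video shape is the variable number of frames; every frame is a fixed 360 (height) by 640 (width) by 3 (channels) array of uint8 values. As a rough illustration of what that means for memory, the sketch below estimates the size of a fully decoded clip. The helper name `decoded_video_bytes` and the 100-frame clip are hypothetical and chosen only for the example.

```python
# Frame geometry taken from the feature spec: Image(shape=(360, 640, 3), dtype=uint8).
HEIGHT, WIDTH, CHANNELS = 360, 640, 3
BYTES_PER_VALUE = 1  # uint8 is one byte per value


def decoded_video_bytes(num_frames):
    """Bytes needed to hold num_frames decoded frames in memory."""
    return num_frames * HEIGHT * WIDTH * CHANNELS * BYTES_PER_VALUE


# Hypothetical clip length, used only for illustration.
print(decoded_video_bytes(100))  # 69120000
```

Each frame takes 691,200 bytes, so even short clips reach tens of megabytes once decoded, which is worth keeping in mind when choosing batch sizes.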
  • Citation:
@misc{bain2021frozen,
      title={Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval},
      author={Max Bain and Arsha Nagrani and Gül Varol and Andrew Zisserman},
      year={2021},
      eprint={2104.00650},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}