
webvid

Description:

WebVid is a large-scale dataset of short videos with textual descriptions sourced from the web. The videos are diverse and rich in their content.

WebVid-10M contains:

- 10.7M video-caption pairs.
- 52K total video hours.
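Together, the two figures imply an average clip length of roughly 17.5 seconds. The sketch below shows the arithmetic. It assumes each video-caption pair corresponds to exactly one clip and uses the rounded numbers quoted above, so the result is only an estimate.

```python
# Rough average clip length implied by the rounded WebVid-10M statistics.
# Both inputs are rounded figures, so the result is approximate.
TOTAL_HOURS = 52_000        # "52K total video hours"
NUM_PAIRS = 10_700_000      # "10.7M video-caption pairs"


def average_clip_seconds(total_hours: float, num_pairs: int) -> float:
    """Return the mean clip duration in seconds, assuming one clip per pair."""
    return total_hours * 3600 / num_pairs


if __name__ == "__main__":
    print(f"~{average_clip_seconds(TOTAL_HOURS, NUM_PAIRS):.1f} s per clip")
```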

Homepage: https://m-bain.github.io/webvid-dataset/
Source code: tfds.datasets.webvid.Builder
Versions:
- 1.0.0 (default): Initial release.
Download size: Unknown size
Dataset size: Unknown size
Manual download instructions: This dataset requires you to download the source data manually into download_config.manual_dir (defaults to ~/tensorflow_datasets/downloads/manual/):
Follow the download instructions at https://m-bain.github.io/webvid-dataset/ to obtain the data. Place the csv files and the video directories in manual_dir/webvid so that the mp4 files match the pattern manual_dir/webvid/*/*_*/*.mp4.

The first directory is typically an arbitrary part directory (used for sharded downloading), and the second is the page directory (two numbers separated by an underscore), which contains one or more mp4 files.
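A minimal standard-library sketch for checking the layout before building the dataset. The helper names (`find_webvid_videos`, `find_csv_files`, `misnamed_page_dirs`) are hypothetical. The csv files are assumed to sit directly in manual_dir/webvid, and the stricter `<digits>_<digits>` check on page directories is an assumption based on the description above, not something TFDS is known to enforce.

```python
import re
from pathlib import Path

# Default TFDS manual directory; pass another path if you set a different
# download_config.manual_dir.
DEFAULT_MANUAL_DIR = Path("~/tensorflow_datasets/downloads/manual").expanduser()

# Assumption: page directories look like "<digits>_<digits>", e.g. "12_34".
# The documented glob pattern (*_*) is looser than this.
PAGE_DIR_RE = re.compile(r"^\d+_\d+$")


def find_webvid_videos(manual_dir: Path = DEFAULT_MANUAL_DIR) -> list[Path]:
    """Return mp4 files matching manual_dir/webvid/*/*_*/*.mp4."""
    return sorted((Path(manual_dir) / "webvid").glob("*/*_*/*.mp4"))


def find_csv_files(manual_dir: Path = DEFAULT_MANUAL_DIR) -> list[Path]:
    """Return csv files placed directly in manual_dir/webvid."""
    return sorted((Path(manual_dir) / "webvid").glob("*.csv"))


def misnamed_page_dirs(videos: list[Path]) -> set[str]:
    """Return page-directory names that are not two numbers around an underscore."""
    return {p.parent.name for p in videos if not PAGE_DIR_RE.match(p.parent.name)}


if __name__ == "__main__":
    videos = find_webvid_videos()
    print(f"found {len(videos)} mp4 files and {len(find_csv_files())} csv files")
    bad = misnamed_page_dirs(videos)
    if bad:
        print("unexpected page directory names:", sorted(bad))
```

If you pass a different `download_config.manual_dir`, call the helpers with that path instead of relying on the default.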

Auto-cached: Unknown
Splits: None listed.

Feature structure:

FeaturesDict({
    'caption': Text(shape=(), dtype=string),
    'id': Text(shape=(), dtype=string),
    'url': Text(shape=(), dtype=string),
    'video': Video(Image(shape=(360, 640, 3), dtype=uint8)),
})
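As a hedged sketch, the following checks that a single example matches this structure after conversion to NumPy (for instance with `tfds.as_numpy`). The function name `check_example` is hypothetical, and it assumes Text features arrive as `bytes`. Only the trailing frame dimensions are checked, because the number of frames varies per clip.

```python
import numpy as np

# Expected per-frame shape of the 'video' feature (from the FeaturesDict above).
# The leading frame dimension is variable, so it is not checked.
FRAME_SHAPE = (360, 640, 3)
TEXT_KEYS = ("caption", "id", "url")


def check_example(example: dict) -> list[str]:
    """Return a list of problems found in one NumPy-converted example."""
    problems = []
    for key in TEXT_KEYS:
        # Assumption: Text features arrive as bytes after NumPy conversion;
        # str is accepted too for convenience.
        if not isinstance(example.get(key), (bytes, str)):
            problems.append(f"{key!r} missing or not text")
    video = example.get("video")
    if not isinstance(video, np.ndarray):
        problems.append("'video' missing or not an array")
    else:
        if video.ndim != 4 or video.shape[1:] != FRAME_SHAPE:
            problems.append(f"unexpected video shape {video.shape}")
        if video.dtype != np.uint8:
            problems.append(f"unexpected video dtype {video.dtype}")
    return problems
```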

Feature documentation:

Feature   Class         Shape                Dtype
          FeaturesDict
caption   Text                               string
id        Text                               string
url       Text                               string
video     Video(Image)  (None, 360, 640, 3)  uint8
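Because `video` is decoded into a dense `uint8` tensor of shape (None, 360, 640, 3), memory use grows linearly with the number of frames. Each frame takes 360 × 640 × 3 = 691,200 bytes, so a 16-frame clip needs about 10.5 MiB before any framework overhead. The frame count depends on clip length and frame rate, which this page does not state, so the sketch takes it as a parameter.

```python
# Bytes needed to hold decoded frames of the 'video' feature in memory.
# Each frame is a dense uint8 tensor of shape (360, 640, 3): 1 byte per value.
HEIGHT, WIDTH, CHANNELS = 360, 640, 3


def frame_bytes() -> int:
    """Size of one decoded frame in bytes."""
    return HEIGHT * WIDTH * CHANNELS


def clip_bytes(num_frames: int) -> int:
    """Decoded size of a clip with `num_frames` frames (the None dimension)."""
    return num_frames * frame_bytes()


if __name__ == "__main__":
    for n in (16, 64, 256):
        print(f"{n:4d} frames -> {clip_bytes(n) / 2**20:.1f} MiB")
```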

Supervised keys (see the as_supervised documentation): None
Figure (tfds.show_examples): Not supported.
Examples (tfds.as_dataframe): Missing.
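Since no supervised keys are defined, `as_supervised=True` cannot be used to obtain input/target pairs. A hypothetical helper that builds (video, caption) tuples from feature dicts might look like this. Treating the video as input and the caption as target is an assumption; retrieval setups may want the reverse.

```python
from typing import Any, Iterable, Iterator, Tuple


def to_video_caption_pairs(examples: Iterable[dict]) -> Iterator[Tuple[Any, Any]]:
    """Yield (video, caption) tuples from WebVid feature dicts.

    Hypothetical helper: picks the two fields explicitly because the
    dataset defines no supervised keys.
    """
    for ex in examples:
        yield ex["video"], ex["caption"]
```

On a `tf.data.Dataset` returned by `tfds.load`, the equivalent is `ds.map(lambda ex: (ex["video"], ex["caption"]))`.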
Citation:

@misc{bain2021frozen,
      title={Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval},
      author={Max Bain and Arsha Nagrani and Gül Varol and Andrew Zisserman},
      year={2021},
      eprint={2104.00650},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}