- Description:
WebVid is a large-scale dataset of short videos with textual descriptions sourced from the web. The videos are diverse and rich in their content.
WebVid-10M contains 10.7M video-caption pairs, totaling 52K video hours.
Homepage: https://m-bain.github.io/webvid-dataset/
Source code:
tfds.datasets.webvid.Builder
Versions:
1.0.0 (default): Initial release.
Download size:
Unknown size
Dataset size:
Unknown size
Manual download instructions: This dataset requires you to download the source data manually into download_config.manual_dir (defaults to ~/tensorflow_datasets/downloads/manual/):
Follow the download instructions at https://m-bain.github.io/webvid-dataset/ to get the data. Place the csv files and the video directories in manual_dir/, such that mp4 files are placed in manual_dir/*_*/*.mp4.
Auto-cached (documentation): Unknown
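The layout described in the manual download instructions can be checked before building. The following is a minimal sketch, not part of the dataset's own code: the helper names check_manual_dir and prepare_webvid are hypothetical, and the check assumes the csv files sit directly in manual_dir, as the instructions suggest. The tensorflow_datasets import is deferred so the layout check runs without it.

```python
from pathlib import Path


def check_manual_dir(manual_dir):
    """Return (csv_files, mp4_files) found under manual_dir.

    Hypothetical helper: raises FileNotFoundError if either kind of file
    is missing from the layout described in the instructions.
    """
    root = Path(manual_dir).expanduser()
    csv_files = sorted(root.glob("*.csv"))      # csv files directly in manual_dir/
    mp4_files = sorted(root.glob("*_*/*.mp4"))  # videos in manual_dir/*_*/*.mp4
    if not csv_files:
        raise FileNotFoundError(f"no csv files found directly in {root}")
    if not mp4_files:
        raise FileNotFoundError(f"no mp4 files match {root}/*_*/*.mp4")
    return csv_files, mp4_files


def prepare_webvid(manual_dir):
    """Build the dataset from manually downloaded files.

    Requires tensorflow_datasets; imported lazily so check_manual_dir
    can be used on its own.
    """
    import tensorflow_datasets as tfds

    check_manual_dir(manual_dir)
    builder = tfds.builder("webvid")
    config = tfds.download.DownloadConfig(
        manual_dir=str(Path(manual_dir).expanduser())
    )
    builder.download_and_prepare(download_config=config)
    return builder
```

Since the split names are not listed on this page, inspecting builder.info.splits on the returned builder is one way to find which splits are available before calling builder.as_dataset.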
Splits:
Split | Examples |
---|---|
- Feature structure:
FeaturesDict({
'caption': Text(shape=(), dtype=string),
'id': Text(shape=(), dtype=string),
'url': Text(shape=(), dtype=string),
'video': Video(Image(shape=(360, 640, 3), dtype=uint8)),
})
- Feature documentation:
Feature | Class | Shape | Dtype | Description |
---|---|---|---|---|
 | FeaturesDict | | | |
caption | Text | | string | |
id | Text | | string | |
url | Text | | string | |
video | Video(Image) | (None, 360, 640, 3) | uint8 | |
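The video feature has shape (None, 360, 640, 3) with dtype uint8, so the decoded size of a clip grows linearly with its frame count. As a rough sizing aid, here is a small sketch of that arithmetic; the function name decoded_video_bytes is hypothetical, and the frame counts are illustrative only, since the number of frames varies per clip.

```python
FRAME_SHAPE = (360, 640, 3)  # height, width, channels from the feature spec
BYTES_PER_VALUE = 1          # uint8 stores one byte per value


def decoded_video_bytes(num_frames):
    """Bytes needed to hold num_frames decoded frames in memory."""
    height, width, channels = FRAME_SHAPE
    return num_frames * height * width * channels * BYTES_PER_VALUE


print(decoded_video_bytes(1))    # 691200 bytes for a single frame
print(decoded_video_bytes(100))  # 69120000 bytes for an illustrative 100-frame clip
```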
Supervised keys (See as_supervised doc): None
Figure (tfds.show_examples): Not supported.
Examples (tfds.as_dataframe): Missing.
Citation:
@misc{bain2021frozen,
title={Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval},
author={Max Bain and Arsha Nagrani and Gül Varol and Andrew Zisserman},
year={2021},
eprint={2104.00650},
archivePrefix={arXiv},
primaryClass={cs.CV}
}