- Description:
CREMA-D is an audio-visual dataset for emotion recognition. The dataset consists of facial and vocal emotional expressions in sentences spoken in a range of basic emotional states (happy, sad, anger, fear, disgust, and neutral). 7,442 clips of 91 actors with diverse ethnic backgrounds were collected. This release contains only the audio stream from the original audio-visual recordings. The samples are split between train, validation, and test so that the samples from each speaker belong to exactly one split.
- Source code: tfds.audio.CremaD
- Versions:
  - 1.0.0 (default): No release notes.
- Download size: 579.25 MiB
- Dataset size: 1.65 GiB
- Auto-cached (documentation): No
- Splits:

Split | Examples |
---|---|
'test' | 1,556 |
'train' | 5,144 |
'validation' | 738 |
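As a quick sketch of loading these speaker-disjoint splits with TFDS; the registration name 'crema_d' is an assumption inferred from the builder tfds.audio.CremaD, not stated on this page:

```python
import tensorflow_datasets as tfds

# 'crema_d' is an assumed registration name for tfds.audio.CremaD.
# The splits are speaker-disjoint, so no speaker leaks across them.
train_ds, val_ds, test_ds = tfds.load(
    'crema_d',
    split=['train', 'validation', 'test'],
)
```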
- Feature structure:

```python
FeaturesDict({
    'audio': Audio(shape=(None,), dtype=int64),
    'label': ClassLabel(shape=(), dtype=int64, num_classes=6),
    'speaker_id': string,
})
```
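A minimal sketch of reading one example and its three features, matching the structure above (the 'crema_d' name is again an assumption):

```python
import tensorflow_datasets as tfds

ds = tfds.load('crema_d', split='train')  # dataset name assumed, as above
for example in ds.take(1):
    audio = example['audio']            # int64 waveform, shape (num_samples,)
    label = example['label']            # int64 class id in [0, 6)
    speaker_id = example['speaker_id']  # tf.string scalar
    print(audio.shape, label.numpy(), speaker_id.numpy())
```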
- Feature documentation:

Feature | Class | Shape | Dtype | Description |
---|---|---|---|---|
| FeaturesDict | | | |
audio | Audio | (None,) | int64 | |
label | ClassLabel | () | int64 | |
speaker_id | Tensor | () | string | |
- Supervised keys (See as_supervised doc): ('audio', 'label')
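Because the supervised keys are ('audio', 'label'), passing as_supervised=True yields (audio, label) tuples directly; a sketch under the same naming assumption:

```python
import tensorflow_datasets as tfds

# as_supervised=True maps each example to the ('audio', 'label') tuple.
ds = tfds.load('crema_d', split='train', as_supervised=True)
for audio, label in ds.take(2):
    print(audio.shape, label.numpy())
```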
- Figure (tfds.show_examples): Not supported.
- Citation:
```
@article{cao2014crema,
  title={{CREMA-D}: Crowd-sourced emotional multimodal actors dataset},
  author={Cao, Houwei and Cooper, David G and Keutmann, Michael K and Gur, Ruben C and Nenkova, Ani and Verma, Ragini},
  journal={IEEE Transactions on Affective Computing},
  volume={5},
  number={4},
  pages={377--390},
  year={2014},
  publisher={IEEE}
}
```