- Description:
The dataset consists of 1,000 audio tracks, each 30 seconds long. It contains 10 genres, each represented by 100 tracks. All tracks are 22,050 Hz, mono, 16-bit audio files in .wav format.
The genres are:
- blues
- classical
- country
- disco
- hiphop
- jazz
- metal
- pop
- reggae
- rock
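The figures in the description imply a nominal sample count per track and a rough raw-audio size. The sketch below works through that arithmetic. The constant names are illustrative, and the result is a nominal estimate only: it ignores WAV headers and assumes every track is exactly 30 seconds long, which may not hold for every file.

```python
# Nominal figures taken from the dataset description.
SAMPLE_RATE_HZ = 22050      # samples per second
TRACK_SECONDS = 30          # stated track length
BYTES_PER_SAMPLE = 2        # 16-bit mono audio
NUM_GENRES = 10
TRACKS_PER_GENRE = 100

num_tracks = NUM_GENRES * TRACKS_PER_GENRE
samples_per_track = SAMPLE_RATE_HZ * TRACK_SECONDS
raw_bytes = num_tracks * samples_per_track * BYTES_PER_SAMPLE
raw_gib = raw_bytes / 2**30  # bytes -> GiB

print(f"tracks: {num_tracks}")
print(f"samples per track (nominal): {samples_per_track}")
print(f"raw 16-bit PCM size (nominal): {raw_gib:.2f} GiB")
```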
Additional Documentation: Explore on Papers With Code
Homepage: http://marsyas.info/index.html
Source code: tfds.audio.gtzan.GTZAN
Versions:
1.0.0 (default): No release notes.
Download size: 1.14 GiB
Dataset size: 3.71 GiB
Auto-cached (documentation): No
Splits:
Split | Examples |
---|---|
'train' | 1,000 |
- Feature structure:
FeaturesDict({
'audio': Audio(shape=(None,), dtype=int64),
'audio/filename': Text(shape=(), dtype=string),
'label': ClassLabel(shape=(), dtype=int64, num_classes=10),
})
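The `audio` feature is a variable-length 1-D sequence of integers with dtype int64. Assuming the values are 16-bit PCM samples (as the description suggests) at 22,050 Hz, a minimal sketch of two common preprocessing steps is shown below: computing a clip's duration and scaling samples to floats in [-1.0, 1.0). The helper names `duration_seconds` and `pcm16_to_float` are hypothetical and not part of TFDS.

```python
SAMPLE_RATE_HZ = 22050   # from the dataset description
PCM16_SCALE = 32768.0    # 2**15, magnitude of the most negative 16-bit sample


def duration_seconds(num_samples, sample_rate=SAMPLE_RATE_HZ):
    """Return clip length in seconds for a given number of samples."""
    return num_samples / sample_rate


def pcm16_to_float(samples):
    """Scale 16-bit PCM integers (assumed range -32768..32767) to floats in [-1.0, 1.0)."""
    return [s / PCM16_SCALE for s in samples]


# Tiny hand-made example; real clips hold roughly 661,500 samples.
print(duration_seconds(661500))
print(pcm16_to_float([-32768, 0, 16384]))
```

With array libraries, the same scaling is usually a single division over the whole array rather than a list comprehension.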
- Feature documentation:
Feature | Class | Shape | Dtype | Description |
---|---|---|---|---|
 | FeaturesDict | | | |
audio | Audio | (None,) | int64 | |
audio/filename | Text | | string | |
label | ClassLabel | | int64 | |
Supervised keys (See as_supervised doc): ('audio', 'label')
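As a rough sketch of how the supervised keys are used, the snippet below loads the single `train` split with `as_supervised=True`, so each example is an `(audio, label)` pair. It assumes `tensorflow_datasets` is installed and that the dataset is registered under the name `"gtzan"`. The helpers `load_gtzan` and `genre_name` are hypothetical, and the label-to-name order is assumed to follow the genre list in the description.

```python
# Assumed label order: the genre list from the description, in the order given.
GENRES = [
    "blues", "classical", "country", "disco", "hiphop",
    "jazz", "metal", "pop", "reggae", "rock",
]


def genre_name(label_index):
    """Map an integer class label to a genre name (assumed ordering)."""
    return GENRES[int(label_index)]


def load_gtzan(data_dir=None):
    """Load the 'train' split as (audio, label) pairs.

    The import is done lazily so this module can be imported without
    tensorflow_datasets; the first call triggers the ~1.14 GiB download.
    """
    import tensorflow_datasets as tfds

    return tfds.load("gtzan", split="train", as_supervised=True, data_dir=data_dir)


print(genre_name(0), genre_name(9))
```

A typical use would be `for audio, label in load_gtzan().take(1): ...`, where `audio` is an int64 tensor of shape `(None,)` and `label` is a scalar int64 tensor. Because only a `train` split is provided, any validation or test partition has to be carved out by the user.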
Figure (tfds.show_examples): Not supported.
- Citation:
@misc{tzanetakis_essl_cook_2001,
author = "Tzanetakis, George and Essl, Georg and Cook, Perry",
title = "Automatic Musical Genre Classification Of Audio Signals",
url = "http://ismir2001.ismir.net/pdf/tzanetakis.pdf",
publisher = "The International Society for Music Information Retrieval",
year = "2001"
}