nsynth

Description:

The NSynth Dataset is an audio dataset containing ~300k musical notes, each with a unique pitch, timbre, and envelope. Each note is annotated with three additional pieces of information based on a combination of human evaluation and heuristic algorithms: Source, Family, and Qualities.

Additional documentation: Explore on Papers With Code
Homepage: https://g.co/magenta/nsynth-dataset
Source code: tfds.datasets.nsynth.Builder
Versions:
- 2.3.0: New loudness_db feature in decibels (unnormalized).
- 2.3.1: F0 computed with normalization fix in CREPE.
- 2.3.2: Use Audio feature.
- 2.3.3 (default): F0 computed with fix in CREPE wave normalization ( https://github.com/marl/crepe/issues/49 ).
Auto-cached ( documentation ): No
Supervised keys (see as_supervised doc): None
Figure ( tfds.show_examples ): Not supported.
Citation:

@InProceedings{pmlr-v70-engel17a,
  title =    {Neural Audio Synthesis of Musical Notes with {W}ave{N}et Autoencoders},
  author =   {Jesse Engel and Cinjon Resnick and Adam Roberts and Sander Dieleman and Mohammad Norouzi and Douglas Eck and Karen Simonyan},
  booktitle =    {Proceedings of the 34th International Conference on Machine Learning},
  pages =    {1068--1077},
  year =     {2017},
  editor =   {Doina Precup and Yee Whye Teh},
  volume =   {70},
  series =   {Proceedings of Machine Learning Research},
  address =      {International Convention Centre, Sydney, Australia},
  month =    {06--11 Aug},
  publisher =    {PMLR},
  pdf =      {http://proceedings.mlr.press/v70/engel17a/engel17a.pdf},
  url =      {http://proceedings.mlr.press/v70/engel17a.html},
}
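
A minimal loading sketch, assuming the `tensorflow_datasets` package is installed and that there is enough disk space for the download size listed under each config. The helper names `nsynth_name` and `load_nsynth` are hypothetical. The first only builds the `name/config:version` string that `tfds.load` accepts, and the import is deferred so that the helper works without TFDS present.

```python
def nsynth_name(config="full", version="2.3.3"):
    """Build a TFDS dataset identifier such as 'nsynth/full:2.3.3'."""
    return f"nsynth/{config}:{version}"


def load_nsynth(config="full", split="train"):
    """Return a tf.data.Dataset for one split (large download on first use)."""
    import tensorflow_datasets as tfds  # deferred: only needed when loading
    return tfds.load(nsynth_name(config), split=split)


print(nsynth_name("gansynth_subset.f0_and_loudness"))
```

Calling `load_nsynth("gansynth_subset", split="valid")` would then request the `'valid'` split of that config.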

nsynth/full (default config)

Config description: Full NSynth Dataset is split into train, valid, and test sets, with no instruments overlapping between the train set and the valid/test sets.
Download size: 73.07 GiB
Dataset size: 73.09 GiB
Splits:

Split	Examples
`'test'`	4,096
`'train'`	289,205
`'valid'`	12,678
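
As a quick check on the split table, the counts can be summed and turned into fractions. This is only arithmetic on the numbers listed above; the variable names are illustrative.

```python
# Example counts copied from the split table for nsynth/full.
splits = {"test": 4096, "train": 289205, "valid": 12678}

total = sum(splits.values())
fractions = {name: count / total for name, count in splits.items()}

print(total)
for name, frac in sorted(fractions.items()):
    print(f"{name}: {frac:.1%}")
```

The total of 305,979 examples matches the "~300k" figure in the description, with the train split holding well over 90% of the notes.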

Feature structure:

FeaturesDict({
    'audio': Audio(shape=(64000,), dtype=float32),
    'id': string,
    'instrument': FeaturesDict({
        'family': ClassLabel(shape=(), dtype=int64, num_classes=11),
        'label': ClassLabel(shape=(), dtype=int64, num_classes=1006),
        'source': ClassLabel(shape=(), dtype=int64, num_classes=3),
    }),
    'pitch': ClassLabel(shape=(), dtype=int64, num_classes=128),
    'qualities': FeaturesDict({
        'bright': bool,
        'dark': bool,
        'distortion': bool,
        'fast_decay': bool,
        'long_release': bool,
        'multiphonic': bool,
        'nonlinear_env': bool,
        'percussive': bool,
        'reverb': bool,
        'tempo-synced': bool,
    }),
    'velocity': ClassLabel(shape=(), dtype=int64, num_classes=128),
})
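
The `qualities` entry is a dictionary of ten boolean flags. A small sketch of how one might list the flags set on a single example follows; it assumes the example has already been converted to plain Python or NumPy values, and the sample record is made up for illustration.

```python
QUALITY_KEYS = (
    "bright", "dark", "distortion", "fast_decay", "long_release",
    "multiphonic", "nonlinear_env", "percussive", "reverb", "tempo-synced",
)


def active_qualities(example):
    """Return the names of the quality flags that are set on one example."""
    flags = example["qualities"]
    return [key for key in QUALITY_KEYS if bool(flags[key])]


# Hypothetical example with made-up values, shaped like the FeaturesDict above.
example = {
    "pitch": 60,
    "velocity": 100,
    "qualities": {key: False for key in QUALITY_KEYS},
}
example["qualities"]["bright"] = True
example["qualities"]["reverb"] = True

print(active_qualities(example))  # ['bright', 'reverb']
```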

Feature documentation:

Feature	Class	Shape	Dtype
	FeaturesDict
audio	Audio	(64000,)	float32
id	Tensor		string
instrument	FeaturesDict
instrument/family	ClassLabel		int64
instrument/label	ClassLabel		int64
instrument/source	ClassLabel		int64
pitch	ClassLabel		int64
qualities	FeaturesDict
qualities/bright	Tensor		bool
qualities/dark	Tensor		bool
qualities/distortion	Tensor		bool
qualities/fast_decay	Tensor		bool
qualities/long_release	Tensor		bool
qualities/multiphonic	Tensor		bool
qualities/nonlinear_env	Tensor		bool
qualities/percussive	Tensor		bool
qualities/reverb	Tensor		bool
qualities/tempo-synced	Tensor		bool
velocity	ClassLabel		int64

Examples ( tfds.as_dataframe ):

nsynth/gansynth_subset

Config description: NSynth Dataset limited to acoustic instruments in the MIDI pitch interval [24, 84]. Uses alternate splits that have overlap in instruments (but not exact notes) between the train set and the valid/test sets. This variant was originally introduced in the ICLR 2019 GANSynth paper ( https://arxiv.org/abs/1902.08710 ).
Download size: 73.08 GiB
Dataset size: 20.73 GiB
Splits:

Split	Examples
`'test'`	8,518
`'train'`	60,788
`'valid'`	17,469
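
The pitch restriction in the config description can be illustrated with a short sketch. It covers only the pitch interval, not the acoustic-instrument filter, and the conversion to hertz assumes standard equal temperament with A4 = MIDI note 69 = 440 Hz, which this page does not state. Function names are illustrative.

```python
def midi_to_hz(note):
    """Convert a MIDI note number to frequency, assuming A4 = MIDI 69 = 440 Hz."""
    return 440.0 * 2.0 ** ((note - 69) / 12.0)


def in_pitch_range(pitch, low=24, high=84):
    """True if the MIDI pitch lies in the closed interval [low, high]."""
    return low <= pitch <= high


# Of the 128 possible pitch classes, keep those inside [24, 84].
kept = [p for p in range(128) if in_pitch_range(p)]
print(len(kept))
print(round(midi_to_hz(24), 2), round(midi_to_hz(84), 2))
```

Under that tuning assumption, the interval spans 61 pitches, from roughly 32.7 Hz to roughly 1046.5 Hz.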

Feature structure:

FeaturesDict({
    'audio': Audio(shape=(64000,), dtype=float32),
    'id': string,
    'instrument': FeaturesDict({
        'family': ClassLabel(shape=(), dtype=int64, num_classes=11),
        'label': ClassLabel(shape=(), dtype=int64, num_classes=1006),
        'source': ClassLabel(shape=(), dtype=int64, num_classes=3),
    }),
    'pitch': ClassLabel(shape=(), dtype=int64, num_classes=128),
    'qualities': FeaturesDict({
        'bright': bool,
        'dark': bool,
        'distortion': bool,
        'fast_decay': bool,
        'long_release': bool,
        'multiphonic': bool,
        'nonlinear_env': bool,
        'percussive': bool,
        'reverb': bool,
        'tempo-synced': bool,
    }),
    'velocity': ClassLabel(shape=(), dtype=int64, num_classes=128),
})

Feature documentation:

Feature	Class	Shape	Dtype
	FeaturesDict
audio	Audio	(64000,)	float32
id	Tensor		string
instrument	FeaturesDict
instrument/family	ClassLabel		int64
instrument/label	ClassLabel		int64
instrument/source	ClassLabel		int64
pitch	ClassLabel		int64
qualities	FeaturesDict
qualities/bright	Tensor		bool
qualities/dark	Tensor		bool
qualities/distortion	Tensor		bool
qualities/fast_decay	Tensor		bool
qualities/long_release	Tensor		bool
qualities/multiphonic	Tensor		bool
qualities/nonlinear_env	Tensor		bool
qualities/percussive	Tensor		bool
qualities/reverb	Tensor		bool
qualities/tempo-synced	Tensor		bool
velocity	ClassLabel		int64

Examples ( tfds.as_dataframe ):

nsynth/gansynth_subset.f0_and_loudness

Config description: NSynth Dataset limited to acoustic instruments in the MIDI pitch interval [24, 84]. Uses alternate splits that have overlap in instruments (but not exact notes) between the train set and the valid/test sets. This variant was originally introduced in the ICLR 2019 GANSynth paper ( https://arxiv.org/abs/1902.08710 ). This version additionally contains estimates for F0 using CREPE (Kim et al., 2018) and A-weighted perceptual loudness in decibels. Both signals are provided at a frame rate of 250 Hz.
Download size: 73.08 GiB
Dataset size: 22.03 GiB
Splits:

Split	Examples
`'test'`	8,518
`'train'`	60,788
`'valid'`	17,469
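
The config description gives a 250 Hz frame rate, and the feature structure for this config lists 1000 frames per `f0` and `loudness` tensor alongside 64000 audio samples. Assuming the frames cover the whole clip evenly (an assumption, not something this page states), the clip duration, audio sample rate, and samples per frame follow by arithmetic:

```python
AUDIO_SAMPLES = 64000   # audio shape from the feature structure
NUM_FRAMES = 1000       # f0 and loudness shape from the feature structure
FRAME_RATE_HZ = 250     # frame rate stated in the config description

duration_s = NUM_FRAMES / FRAME_RATE_HZ          # seconds per clip
sample_rate_hz = AUDIO_SAMPLES / duration_s      # implied audio sample rate
samples_per_frame = AUDIO_SAMPLES // NUM_FRAMES  # audio samples per f0 frame

print(duration_s, sample_rate_hz, samples_per_frame)  # 4.0 16000.0 64
```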

Feature structure:

FeaturesDict({
    'audio': Audio(shape=(64000,), dtype=float32),
    'f0': FeaturesDict({
        'confidence': Tensor(shape=(1000,), dtype=float32),
        'hz': Tensor(shape=(1000,), dtype=float32),
        'midi': Tensor(shape=(1000,), dtype=float32),
    }),
    'id': string,
    'instrument': FeaturesDict({
        'family': ClassLabel(shape=(), dtype=int64, num_classes=11),
        'label': ClassLabel(shape=(), dtype=int64, num_classes=1006),
        'source': ClassLabel(shape=(), dtype=int64, num_classes=3),
    }),
    'loudness': FeaturesDict({
        'db': Tensor(shape=(1000,), dtype=float32),
    }),
    'pitch': ClassLabel(shape=(), dtype=int64, num_classes=128),
    'qualities': FeaturesDict({
        'bright': bool,
        'dark': bool,
        'distortion': bool,
        'fast_decay': bool,
        'long_release': bool,
        'multiphonic': bool,
        'nonlinear_env': bool,
        'percussive': bool,
        'reverb': bool,
        'tempo-synced': bool,
    }),
    'velocity': ClassLabel(shape=(), dtype=int64, num_classes=128),
})
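
Because `f0/hz` and `f0/confidence` share the same frame axis, one common use is to discard pitch estimates in frames where the confidence is low. The sketch below works on plain Python lists; the function name, the 0.5 threshold, and the sample values are all hypothetical choices for illustration, not values taken from the dataset or from CREPE.

```python
def voiced_f0(hz, confidence, threshold=0.5):
    """Keep F0 values whose confidence meets the threshold; others become None."""
    if len(hz) != len(confidence):
        raise ValueError("hz and confidence must have the same number of frames")
    return [f if c >= threshold else None for f, c in zip(hz, confidence)]


# Made-up values for four frames; real arrays have 1000 frames each.
hz = [0.0, 261.6, 262.0, 130.8]
confidence = [0.05, 0.92, 0.88, 0.30]

print(voiced_f0(hz, confidence))  # [None, 261.6, 262.0, None]
```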

Feature documentation:

Feature	Class	Shape	Dtype
	FeaturesDict
audio	Audio	(64000,)	float32
f0	FeaturesDict
f0/confidence	Tensor	(1000,)	float32
f0/hz	Tensor	(1000,)	float32
f0/midi	Tensor	(1000,)	float32
id	Tensor		string
instrument	FeaturesDict
instrument/family	ClassLabel		int64
instrument/label	ClassLabel		int64
instrument/source	ClassLabel		int64
loudness	FeaturesDict
loudness/db	Tensor	(1000,)	float32
pitch	ClassLabel		int64
qualities	FeaturesDict
qualities/bright	Tensor		bool
qualities/dark	Tensor		bool
qualities/distortion	Tensor		bool
qualities/fast_decay	Tensor		bool
qualities/long_release	Tensor		bool
qualities/multiphonic	Tensor		bool
qualities/nonlinear_env	Tensor		bool
qualities/percussive	Tensor		bool
qualities/reverb	Tensor		bool
qualities/tempo-synced	Tensor		bool
velocity	ClassLabel		int64

Examples ( tfds.as_dataframe ):