tf.raw_ops.AudioSpectrogram
Produces a visualization of audio data over time.
View aliases
Compat aliases for migration (see the Migration guide for more details):
tf.compat.v1.raw_ops.AudioSpectrogram
tf.raw_ops.AudioSpectrogram(
input, window_size, stride, magnitude_squared=False, name=None
)
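The following minimal sketch, which is not part of the original page, calls the op on one second of synthetic stereo audio. The 16 kHz sample rate, the 440 Hz and 880 Hz test tones, and the window and stride values are arbitrary choices for illustration. NumPy is used only to build the input. Note that `tf.raw_ops` functions accept keyword arguments only.

```python
import numpy as np
import tensorflow as tf

sample_rate = 16000  # assumed sample rate for this illustration
t = np.arange(sample_rate, dtype=np.float32) / sample_rate  # one second of timestamps
left = 0.5 * np.sin(2.0 * np.pi * 440.0 * t)   # 440 Hz tone on the left channel
right = 0.5 * np.sin(2.0 * np.pi * 880.0 * t)  # 880 Hz tone on the right channel

# Input layout: [samples, channels], float32 values in [-1, 1].
audio = tf.constant(np.stack([left, right], axis=1), dtype=tf.float32)

# Raw ops must be called with keyword arguments.
spectrogram = tf.raw_ops.AudioSpectrogram(
    input=audio, window_size=512, stride=256, magnitude_squared=False)

print(spectrogram.shape)  # [channels, time slices, frequency bins]
```

With a 512-sample window at 16 kHz, each frequency bin is 31.25 Hz wide, so the two tones should peak near bins 14 and 28 respectively.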
Spectrograms are a standard way of representing audio information as a series of
slices of frequency information, one slice for each window of time. By joining
these together into a sequence, they form a distinctive fingerprint of the sound
over time.
This op expects audio data as a 2-D float32 tensor of shape [samples, channels],
with values in the range -1 to 1, together with a window width in samples and a
stride specifying how far to move the window between slices. From this it
generates a three-dimensional output. The first dimension is for the channels in
the input, so a stereo input would have two here, for example. The second
dimension is time, with one entry per frequency slice. The third dimension holds
an amplitude value for each frequency during that time slice.
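The sizes of the time and frequency dimensions follow from the window and stride. The helper below is a hypothetical sketch (`audio_spectrogram_shape` and `next_power_of_two` are not TensorFlow APIs) that reflects our reading of the TensorFlow kernel rather than anything stated on this page: the FFT length is `window_size` rounded up to a power of two, giving `fft_length // 2 + 1` frequency bins, and only windows that fit entirely inside the input produce a time slice.

```python
def next_power_of_two(n: int) -> int:
    """Smallest power of two that is >= n, for n >= 1."""
    return 1 << (n - 1).bit_length()


def audio_spectrogram_shape(num_samples: int, num_channels: int,
                            window_size: int, stride: int) -> tuple:
    """Hypothetical helper: expected [channels, time, frequency] output shape."""
    fft_length = next_power_of_two(window_size)  # assumed FFT length rule
    num_bins = fft_length // 2 + 1               # non-negative frequency bins
    if num_samples < window_size:
        num_slices = 0                           # no complete window fits
    else:
        num_slices = 1 + (num_samples - window_size) // stride
    return (num_channels, num_slices, num_bins)


print(audio_spectrogram_shape(16000, 2, 512, 256))
print(audio_spectrogram_shape(16000, 1, 480, 160))
```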
This means that, when the output for one channel is converted and saved as an
image, the layout is rotated 90 degrees clockwise from a typical spectrogram:
time increases down the Y axis, and frequency increases from left to right.
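To get the conventional orientation back (frequency on the Y axis with the highest frequency at the top, time running left to right), transpose one channel and flip it vertically. In the op's output, `slices[t][f]` holds time slice `t` and frequency bin `f`, with `f = 0` the lowest (DC) bin. The helper name below is hypothetical; with a NumPy array the same operation is `np.flipud(spec.T)`.

```python
def to_conventional_orientation(slices):
    """Hypothetical helper: [time][freq] -> rows of frequency (highest first) by time."""
    # zip(*slices) transposes, so row f holds bin f across all time slices.
    transposed = [list(column) for column in zip(*slices)]
    # Reverse the rows so the highest frequency ends up at the top of the image.
    return transposed[::-1]


# Two time slices, three frequency bins each.
print(to_conventional_orientation([[1, 2, 3], [4, 5, 6]]))
```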
Each value in the result is the magnitude of one bin of an FFT computed over the
current window of samples, that is, the square root of the sum of the squares of
the bin's real and imaginary parts (or, with magnitude_squared=True, that sum
itself). In this way, the innermost dimension represents the strength of each
frequency in the current window, and successive windows are stacked along the
middle (time) dimension.
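To make the per-value computation concrete, here is a pure-Python reference for a single slice. It assumes, based on our reading of the TensorFlow kernel rather than this page, that each window is multiplied by a periodic Hann window and zero-padded to the next power of two before the FFT. Treat it as an illustrative sketch, not a bit-exact reimplementation; a naive O(n²) DFT stands in for the FFT, and the function names are hypothetical.

```python
import cmath
import math


def periodic_hann(n):
    """Periodic Hann window of length n (assumed to match the kernel's window)."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]


def spectrogram_slice(window, magnitude_squared=False):
    """Returns fft_length // 2 + 1 magnitudes (or squared magnitudes) for one window."""
    n = len(window)
    fft_length = 1 << (n - 1).bit_length()  # next power of two >= n
    hann = periodic_hann(n)
    # Apply the window, then zero-pad up to the FFT length.
    padded = [x * w for x, w in zip(window, hann)] + [0.0] * (fft_length - n)
    result = []
    for k in range(fft_length // 2 + 1):
        # Naive DFT for bin k (a real implementation would use an FFT).
        z = sum(padded[m] * cmath.exp(-2j * math.pi * k * m / fft_length)
                for m in range(fft_length))
        squared = z.real ** 2 + z.imag ** 2  # re^2 + im^2
        result.append(squared if magnitude_squared else math.sqrt(squared))
    return result


# A 64-sample cosine that completes exactly 8 cycles, so its energy lands in bin 8.
samples = [math.cos(2.0 * math.pi * 8 * m / 64) for m in range(64)]
magnitudes = spectrogram_slice(samples)
```

For this input the largest value is at bin 8 (16.0), and the Hann window spreads half that magnitude (8.0) into each of bins 7 and 9.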
To get a more intuitive, visual sense of what this operation does, you can run
tensorflow/examples/wav_to_spectrogram, which reads in an audio file and saves
the resulting spectrogram as a PNG image.
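A rough Python analogue of that workflow is sketched below. It is not the code of tensorflow/examples/wav_to_spectrogram; the `wav_to_spectrogram_png` function is hypothetical, and the log compression and 0 to 255 scaling are illustrative choices that are not claimed to match that example. A synthetic tone is encoded to WAV in memory so the sketch needs no input file.

```python
import numpy as np
import tensorflow as tf


def wav_to_spectrogram_png(wav_bytes, window_size=256, stride=128):
    """Hypothetical helper: WAV bytes -> PNG bytes of channel 0's spectrogram."""
    # decode_wav returns float32 audio in [-1, 1] with shape [samples, channels].
    audio, _sample_rate = tf.audio.decode_wav(wav_bytes)
    spectrogram = tf.raw_ops.AudioSpectrogram(
        input=audio, window_size=window_size, stride=stride,
        magnitude_squared=False)
    # Take the first channel: shape [time slices, frequency bins].
    image = tf.math.log1p(spectrogram[0])
    # Scale to 0..255; the floor avoids dividing by zero on silent input.
    image = image / tf.maximum(tf.reduce_max(image), 1e-6) * 255.0
    image = tf.cast(image, tf.uint8)[..., tf.newaxis]  # [time, freq, 1]
    return tf.io.encode_png(image)


# Build a one-second, 1 kHz mono test tone in memory instead of reading a file.
t = np.arange(16000, dtype=np.float32) / 16000.0
tone = (0.5 * np.sin(2.0 * np.pi * 1000.0 * t)).reshape(-1, 1)
wav_bytes = tf.audio.encode_wav(tf.constant(tone), sample_rate=16000)
png = wav_to_spectrogram_png(wav_bytes)
```

To save the image, pass the result to `tf.io.write_file("spectrogram.png", png)`. As with the raw op output, time runs down the image and frequency runs across it.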
Args
| Argument | Description |
|---|---|
| `input` | A `Tensor` of type `float32`. Float representation of audio data, shaped `[samples, channels]`, with values in the range -1 to 1. |
| `window_size` | An `int`. How wide the input window is, in samples. For the highest efficiency this should be a power of two, but other values are accepted; the FFT length is then rounded up to the next power of two. |
| `stride` | An `int`. The distance, in samples, between the centers of adjacent windows. |
| `magnitude_squared` | An optional `bool`. Defaults to `False`. Whether to return the squared magnitude or just the magnitude. Returning the squared magnitude skips a square root per value. |
| `name` | A name for the operation (optional). |
Returns
A `Tensor` of type `float32` with shape `[channels, time slices, frequency bins]`.
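The sketch below, which is not from the original page, runs the op twice on the same random signal to show how `magnitude_squared` relates the two outputs. The noise input, the 400-sample window (deliberately not a power of two), and the 160-sample stride are arbitrary illustration values.

```python
import numpy as np
import tensorflow as tf

# Random noise stands in for real audio: 4000 mono samples in [-1, 1).
rng = np.random.default_rng(0)
audio = tf.constant(rng.uniform(-1.0, 1.0, size=(4000, 1)), dtype=tf.float32)

# window_size=400 is accepted even though it is not a power of two.
magnitude = tf.raw_ops.AudioSpectrogram(
    input=audio, window_size=400, stride=160, magnitude_squared=False)
power = tf.raw_ops.AudioSpectrogram(
    input=audio, window_size=400, stride=160, magnitude_squared=True)

# The squared-magnitude output should equal the magnitude output squared,
# up to float32 rounding.
max_abs_diff = float(np.max(np.abs(power.numpy() - np.square(magnitude.numpy()))))
print(magnitude.shape, max_abs_diff)
```

If you only need power values, requesting them directly avoids taking a square root and then squaring the result again.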
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates. Some content is licensed under the numpy license.
Last updated 2024-04-26 UTC.