Reads and (optionally) parses avro files into a dataset.
tfio.experimental.columnar.make_avro_record_dataset(
    file_pattern,
    features,
    batch_size,
    reader_schema,
    reader_buffer_size=None,
    num_epochs=None,
    shuffle=True,
    shuffle_buffer_size=None,
    shuffle_seed=None,
    prefetch_buffer_size=tf.data.experimental.AUTOTUNE,
    num_parallel_reads=None,
    drop_final_batch=False
)
Provides common functionality such as batching, optional parsing, shuffling,
and applying default values.
Args:
file_pattern: List of files or patterns of avro file paths. See
tf.io.gfile.glob for pattern rules.
features: A map of feature names mapped to feature information.
batch_size: An int representing the number of records to combine
in a single batch.
reader_schema: The reader schema.
reader_buffer_size: (Optional.) An int specifying the reader's buffer
size in bytes. If None (the default), the underlying reader's default
buffer size is used.
num_epochs: (Optional.) An int specifying the number of times this
dataset is repeated. If None (the default), cycles through the
dataset forever; in that case the final partial batch is always dropped.
shuffle: (Optional.) A bool that indicates whether the input
should be shuffled. Defaults to True.
shuffle_buffer_size: (Optional.) Buffer size to use for
shuffling. A larger buffer size ensures better shuffling but
increases memory usage and startup time. If not provided,
defaults to 10,000 records. Note that the shuffle buffer
size is measured in records, not bytes.
shuffle_seed: (Optional.) Randomization seed to use for shuffling.
By default uses a pseudo-random seed.
prefetch_buffer_size: (Optional.) An int specifying the number of
feature batches to prefetch for performance improvement.
Defaults to auto-tune. Set to 0 to disable prefetching.
num_parallel_reads: (Optional.) Number of records to
parse in parallel. Defaults to None (no parallelization).
drop_final_batch: (Optional.) Whether the last batch should be
dropped if it contains fewer than batch_size records. The
default behavior is not to drop the smaller batch.
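The shuffle_buffer_size semantics above can be illustrated with a small pure-Python sketch (not the actual implementation): only buffer_size records are held in memory at once, so a larger buffer yields a more uniform shuffle at the cost of memory and startup time.

```python
# Sketch of buffer-based shuffling: keep at most `buffer_size` records in
# memory and emit a randomly chosen buffered record as each new one arrives.
import random

def buffered_shuffle(records, buffer_size, seed=None):
    rng = random.Random(seed)
    buffer, out = [], []
    for record in records:
        buffer.append(record)
        if len(buffer) > buffer_size:
            # Emit a random element once the buffer is full.
            out.append(buffer.pop(rng.randrange(len(buffer))))
    rng.shuffle(buffer)  # Drain the remaining buffered records.
    out.extend(buffer)
    return out

shuffled = buffered_shuffle(range(10), buffer_size=4, seed=42)
print(shuffled)  # a permutation of 0..9
```

With buffer_size equal to the number of records this degenerates to a full uniform shuffle; with buffer_size of 1 the order is left nearly unchanged, which is why the docs recommend a large buffer for better shuffling.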
Returns:
A dataset in which each element is a batch of parsed records: a map of
feature names to tensors, where each tensor has an additional leading
batch dimension of length batch_size.
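A minimal usage sketch is shown below. It assumes tensorflow-io is installed, that files matching "train-*.avro" and a reader schema file "schema.avsc" exist, and that the feature names ("label", "weight") are illustrative; the feature specs use the standard tf.io feature types.

```python
# Hypothetical example: batch and parse Avro records with shuffling.
import tensorflow as tf
import tensorflow_io as tfio

dataset = tfio.experimental.columnar.make_avro_record_dataset(
    file_pattern=["train-*.avro"],             # glob patterns; see tf.io.gfile.glob
    features={
        "label": tf.io.FixedLenFeature([], tf.int64),
        "weight": tf.io.FixedLenFeature([], tf.float32),
    },
    batch_size=32,
    reader_schema=open("schema.avsc").read(),  # Avro reader schema as a JSON string
    num_epochs=1,                              # one pass over the data
    shuffle=True,
    shuffle_buffer_size=10000,                 # measured in records
)

for batch in dataset:
    # Each value is a tensor with a leading batch dimension.
    print(batch["label"].shape)
```

Leaving num_epochs as None would repeat the dataset indefinitely (and drop any final partial batch), which is the usual configuration for long-running training loops.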