
tfr.data.build_ranking_dataset_with_parsing_fn

Builds a ranking tf.data.Dataset using the provided parsing_fn.

file_pattern (str | list(str)) File path(s) or glob pattern(s) matching the files that contain serialized data. See tf.io.gfile.glob for pattern rules.
parsing_fn (function) A function with a single argument, called as parsing_fn(serialized), that parses the serialized records. Users can customize this for their own data formats.
batch_size (int) Number of records to combine in a single batch.
reader A function or class that can be called with a filenames tensor and (optional) reader_args and returns a Dataset. Defaults to tf.data.TFRecordDataset.
reader_args (list) Additional argument list to pass to the reader class.
num_epochs (int) Number of times to read through the dataset. If None, cycles through the dataset forever. Defaults to None.
shuffle (bool) Indicates whether the input should be shuffled. Defaults to True.
shuffle_buffer_size (int) Buffer size of the ShuffleDataset. A larger buffer gives better shuffling but increases memory usage and startup time.
shuffle_seed (int) Randomization seed to use for shuffling.
prefetch_buffer_size (int) Number of feature batches to prefetch in order to improve performance. Recommended value is the number of batches consumed per training step. Defaults to auto-tune.
reader_num_threads (int) Number of threads used to read records. If greater than 1, the results will be interleaved. Defaults to auto-tune.
sloppy_ordering (bool) If True, reading performance is improved at the cost of non-deterministic ordering. If False, the order of elements produced is deterministic prior to shuffling. Elements are still randomized if shuffle=True, but if shuffle_seed is set, the order of elements after shuffling is also deterministic. Defaults to False.
drop_final_batch (bool) If True and the batch size does not evenly divide the input dataset size, the final smaller batch is dropped, which allows the batch_size to be statically inferred. Defaults to False.
num_parser_threads (int) Optional number of threads to be used with dataset.map() when invoking parsing_fn. Defaults to auto-tune.
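
The effect of batch_size and drop_final_batch on a single pass over the data can be illustrated with plain arithmetic. The sketch below is not library code: batch_sizes is a hypothetical helper written only for this illustration, and it assumes num_epochs=1 so that the dataset has a well-defined final batch.

```python
def batch_sizes(num_records, batch_size, drop_final_batch=False):
    """Returns the size of each batch produced by one pass over the data.

    Hypothetical helper for illustration only; it mirrors the documented
    behavior of `batch_size` and `drop_final_batch`, not the library code.
    """
    full_batches, remainder = divmod(num_records, batch_size)
    sizes = [batch_size] * full_batches
    # The trailing partial batch is kept unless drop_final_batch is set.
    if remainder and not drop_final_batch:
        sizes.append(remainder)
    return sizes


print(batch_sizes(10, 4))                         # [4, 4, 2]
print(batch_sizes(10, 4, drop_final_batch=True))  # [4, 4]
```

With drop_final_batch=True every batch has exactly batch_size records, which is why the batch dimension can then be inferred statically.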

Returns a dataset of dict elements. Each dict maps feature keys to Tensor or SparseTensor objects.