
tflite_model_maker.recommendation.DataLoader

Recommendation data loader.

Args
dataset tf.data.Dataset for recommendation.
size int, dataset size.
vocab list of dict, each vocab item has the structure shown under load_vocab: {'id': int, 'title': str, 'genres': list[str], 'count': int}.

Methods

download_and_extract_movielens

Downloads and extracts the movielens dataset, then returns the extracted directory.

from_movielens

Generates a data loader from the movielens dataset.

The method downloads and prepares the dataset, then generates the examples for train/eval.

For movielens data format, see:

Args
data_dir str, path to dataset containing (unzipped) text data.
data_tag str, specify dataset in {'train', 'test'}.
generated_examples_dir str, path to generate preprocessed examples. (default: same as data_dir)
min_timeline_length int, min timeline length to split train/eval set.
max_context_length int, max context length as the input.
train_filename str, generated file name for training data.
test_filename str, generated file name for test data.
vocab_filename str, generated file name for vocab data.
meta_filename str, generated file name for meta data.

Returns
Data Loader.
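
The roles of min_timeline_length and max_context_length can be pictured with a small pure-Python sketch. It is not the library's preprocessing code. It assumes that a user's ordered movie ids form a "timeline", that timelines shorter than min_timeline_length are skipped, and that each label is predicted from at most max_context_length preceding ids. The function name timeline_to_examples and the sample ids are hypothetical.

```python
def timeline_to_examples(timeline, min_timeline_length, max_context_length):
    """Turns one user's ordered movie ids into (context, label) pairs.

    Assumption: timelines shorter than min_timeline_length yield no examples.
    Assumption: the context is the window of ids right before the label,
    truncated to max_context_length.
    """
    if len(timeline) < min_timeline_length:
        return []
    pairs = []
    for label_idx in range(1, len(timeline)):
        start = max(0, label_idx - max_context_length)
        pairs.append((timeline[start:label_idx], timeline[label_idx]))
    return pairs


# Made-up movie ids for a single user, in viewing order.
pairs = timeline_to_examples(
    [10, 20, 30, 40], min_timeline_length=3, max_context_length=2
)
too_short = timeline_to_examples(
    [10, 20], min_timeline_length=3, max_context_length=2
)
```

Under these assumptions a larger max_context_length gives the model more history per example, while a larger min_timeline_length filters out users with too few ratings.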

gen_dataset

Generates the dataset, overriding the default so that drop_remainder = True.
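
The effect of drop_remainder can be illustrated without TensorFlow. The sketch below assumes the flag has the same meaning as in tf.data.Dataset.batch: when it is True, a final batch smaller than the batch size is discarded. The helper batch_sketch is hypothetical and is not part of the library.

```python
def batch_sketch(items, batch_size, drop_remainder=True):
    """Groups items into lists of batch_size, optionally dropping a short tail."""
    batches = [
        items[i:i + batch_size] for i in range(0, len(items), batch_size)
    ]
    # Discard the last batch only if it is incomplete and the flag is set.
    if drop_remainder and batches and len(batches[-1]) < batch_size:
        batches.pop()
    return batches


full_only = batch_sketch(list(range(10)), batch_size=4)
with_partial = batch_sketch(list(range(10)), batch_size=4, drop_remainder=False)
```

With 10 items and a batch size of 4, the first call keeps two full batches and the second also keeps the trailing batch of 2, so every batch has a fixed shape only in the first case.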

load_vocab

Loads vocab from file.

The vocab file should be a JSON file containing a list of 4-element lists, where the 4 elements are ordered as: [id=int, title=str, genres=str joined with '|', count=int]. It is generated when preparing the movielens dataset.

Args
vocab_file str, path to vocab file.

Returns
vocab an OrderedDict that maps each id to its item. Each item represents a movie: {'id': int, 'title': str, 'genres': list[str], 'count': int}
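
The mapping from the file format to the returned structure can be sketched in plain Python. This is an illustration of the documented formats, not the library's implementation. The function name load_vocab_sketch, the file name vocab.json, and the sample rows are all made up for the example.

```python
import json
import os
import tempfile
from collections import OrderedDict


def load_vocab_sketch(vocab_file):
    """Parses a JSON list of [id, title, genres, count] rows into an OrderedDict."""
    with open(vocab_file, "r", encoding="utf-8") as f:
        rows = json.load(f)
    vocab = OrderedDict()
    for movie_id, title, genres, count in rows:
        vocab[movie_id] = {
            "id": movie_id,
            "title": title,
            # Genres are stored as one '|'-joined string; split into a list.
            "genres": genres.split("|"),
            "count": count,
        }
    return vocab


# Write a tiny, made-up vocab file and read it back.
sample_rows = [
    [1, "Movie A", "Animation|Comedy", 12],
    [2, "Movie B", "Drama", 7],
]
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, "vocab.json")
    with open(sample_path, "w", encoding="utf-8") as f:
        json.dump(sample_rows, f)
    sample_vocab = load_vocab_sketch(sample_path)
```

Here sample_vocab[1]["genres"] is a list of two strings, whereas the file stores the same genres as the single string "Animation|Comedy".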

read_as_dataset

Reads file pattern as dataset.

split

Splits dataset into two sub-datasets with the given fraction.

Primarily used for splitting the data set into training and testing sets.

Args
fraction float, the fraction of the original data that goes into the first returned sub-dataset.

Returns
The two sub-datasets produced by the split.
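
A minimal sketch of a fraction-based split on a plain list follows. It assumes the first sub-dataset receives the first int(size * fraction) elements and the second receives the rest; the library's exact rounding and ordering are not stated here, and split_sketch is a hypothetical helper.

```python
def split_sketch(items, fraction):
    """Splits a list in two; the first part holds int(len(items) * fraction) items."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be in [0, 1]")
    first_size = int(len(items) * fraction)
    return items[:first_size], items[first_size:]


examples = list(range(10))
train_part, test_part = split_sketch(examples, 0.8)
```

With 10 examples and fraction 0.8, the first part has 8 elements and the second has 2, which matches the typical training and testing use mentioned above.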

__len__