Recommendation data loader.

```python
tflite_model_maker.recommendation.DataLoader(
    dataset, size, vocab
)
```
| Args | |
| --- | --- |
| `dataset` | `tf.data.Dataset` for recommendation. |
| `size` | int, dataset size. |
| `vocab` | list of dict; each vocab item is described above. |
Methods
download_and_extract_movielens

```python
@classmethod
download_and_extract_movielens(
    download_dir
)
```

Downloads and extracts the movielens dataset, then returns the extracted directory.
from_movielens

```python
@classmethod
from_movielens(
    data_dir,
    data_tag,
    input_spec: tflite_model_maker.recommendation.spec.InputSpec,
    generated_examples_dir=None,
    min_timeline_length=3,
    max_context_length=10,
    max_context_movie_genre_length=10,
    min_rating=None,
    train_data_fraction=0.9,
    build_vocabs=True,
    train_filename='train_movielens_1m.tfrecord',
    test_filename='test_movielens_1m.tfrecord',
    vocab_filename='movie_vocab.json',
    meta_filename='meta.json'
)
```
Generates a data loader from the movielens dataset.

The method downloads and prepares the dataset, then generates examples for train/eval. For the movielens data format, see:

- the function `_generate_fake_data` in `recommendation_testutil.py`
- or the zip file: http://files.grouplens.org/datasets/movielens/ml-1m.zip
| Args | |
| --- | --- |
| `data_dir` | str, path to the dataset containing (unzipped) text data. |
| `data_tag` | str, specifies the dataset in {'train', 'test'}. |
| `input_spec` | InputSpec, specifies the data format for input and embedding. |
| `generated_examples_dir` | str, path for generated preprocessed examples (default: same as `data_dir`). |
| `min_timeline_length` | int, min timeline length to split the train/eval set. |
| `max_context_length` | int, max context length as one input. |
| `max_context_movie_genre_length` | int, max context length of movie genres as one input. |
| `min_rating` | int or None, include only examples with at least this rating. |
| `train_data_fraction` | float, fraction of training data in [0.0, 1.0]. |
| `build_vocabs` | boolean, whether to build vocabs. |
| `train_filename` | str, generated file name for training data. |
| `test_filename` | str, generated file name for test data. |
| `vocab_filename` | str, generated file name for vocab data. |
| `meta_filename` | str, generated file name for metadata. |
| Returns | |
| --- | --- |
| A `DataLoader` instance. |
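The roles of `min_timeline_length` and `max_context_length` can be sketched in plain Python: each item in a user's timeline becomes a label, with the preceding (capped) items as its context, and timelines that are too short are dropped. This is a hypothetical simplification; the real pipeline also handles genres, ratings, and TFRecord output.

```python
# Hypothetical simplification of the example-generation step that
# min_timeline_length and max_context_length control. Each item in a
# user's timeline becomes a label, preceded by a capped context window.

def timeline_to_examples(timeline, min_timeline_length=3, max_context_length=10):
    if len(timeline) < min_timeline_length:
        return []  # too short to split into context/label pairs
    examples = []
    for i in range(1, len(timeline)):
        context = timeline[max(0, i - max_context_length):i]
        examples.append({'context': context, 'label': timeline[i]})
    return examples

timeline_to_examples([1, 2, 3, 4], max_context_length=2)
# -> contexts [1], [1, 2], [2, 3] with labels 2, 3, 4
```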
gen_dataset

```python
gen_dataset(
    batch_size=1,
    is_training=False,
    shuffle=False,
    input_pipeline_context=None,
    preprocess=None,
    drop_remainder=True,
    total_steps=None
)
```
Generates the dataset, overriding the base default so that `drop_remainder=True`.
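What `drop_remainder=True` means for batching can be illustrated with a plain-Python sketch of `tf.data` batching semantics (not the Model Maker internals):

```python
# Illustrates the effect of drop_remainder on batch count: with
# drop_remainder=True the final partial batch is discarded, so every
# batch has the same shape (useful for fixed-shape TFLite models).

def num_batches(size, batch_size, drop_remainder=True):
    """Number of batches produced from `size` examples."""
    full, rem = divmod(size, batch_size)
    return full if drop_remainder or rem == 0 else full + 1

num_batches(10, 3)                        # 3 full batches; 1 leftover example dropped
num_batches(10, 3, drop_remainder=False)  # 4 batches; the last holds 1 example
```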
generate_movielens_dataset

```python
@classmethod
generate_movielens_dataset(
    data_dir,
    generated_examples_dir=None,
    train_filename='train_movielens_1m.tfrecord',
    test_filename='test_movielens_1m.tfrecord',
    vocab_filename='movie_vocab.json',
    meta_filename='meta.json',
    min_timeline_length=3,
    max_context_length=10,
    max_context_movie_genre_length=10,
    min_rating=None,
    train_data_fraction=0.9,
    build_vocabs=True
)
```
Generates the movielens dataset and returns a dict containing metadata.
| Args | |
| --- | --- |
| `data_dir` | str, path to the dataset containing (unzipped) text data. |
| `generated_examples_dir` | str, path for generated preprocessed examples (default: same as `data_dir`). |
| `train_filename` | str, generated file name for training data. |
| `test_filename` | str, generated file name for test data. |
| `vocab_filename` | str, generated file name for vocab data. |
| `meta_filename` | str, generated file name for metadata. |
| `min_timeline_length` | int, min timeline length to split the train/eval set. |
| `max_context_length` | int, max context length as one input. |
| `max_context_movie_genre_length` | int, max context length of movie genres as one input. |
| `min_rating` | int or None, include only examples with at least this rating. |
| `train_data_fraction` | float, fraction of training data in [0.0, 1.0]. |
| `build_vocabs` | boolean, whether to build vocabs. |
| Returns | |
| --- | --- |
| Dict, metadata for the movielens dataset, containing keys: `train_file`, `train_size`, `test_file`, `test_size`, `vocab_file`, `vocab_size`, etc. |
get_num_classes

```python
@classmethod
get_num_classes(
    meta
) -> int
```

Gets the number of classes.

0 is reserved. The number of classes is max id + 1; e.g., if max id = 100, the classes are [0, 100], i.e., 101 classes in total.
| Args | |
| --- | --- |
| `meta` | dict, containing `meta['vocab_max_id']`. |

| Returns | |
| --- | --- |
| Number of classes. |
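The rule above is simple enough to state directly in code; this is a minimal sketch of the documented behavior, not the library implementation:

```python
# Number of classes = max id + 1, since id 0 is reserved and ids are
# contiguous in [0, max_id]. The real classmethod reads the same key.

def get_num_classes(meta):
    return meta['vocab_max_id'] + 1

get_num_classes({'vocab_max_id': 100})  # 101
```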
load_vocab

```python
@classmethod
load_vocab(
    vocab_file
) -> collections.OrderedDict
```

Loads the vocab from file.

The vocab file should be in JSON format: a list of lists of size 4, where the four elements are ordered as [id=int, title=str, genres=str joined with '|', count=int]. It is generated when preparing the movielens dataset.
| Args | |
| --- | --- |
| `vocab_file` | str, path to the vocab file. |

| Returns | |
| --- | --- |
| `vocab` | an OrderedDict mapping id to item. Each item represents a movie: { 'id': int, 'title': str, 'genres': list[str], 'count': int } |
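The documented file format can be parsed with a few lines of plain Python. This sketch reads an inline JSON string rather than a file path, and `parse_vocab` is a hypothetical helper, not the library's classmethod:

```python
import collections
import json

# Sketch of the documented vocab format: a JSON list of
# [id, title, genres-joined-with-'|', count] rows, turned into an
# OrderedDict keyed by id with genres split into a list.

def parse_vocab(json_text):
    vocab = collections.OrderedDict()
    for movie_id, title, genres, count in json.loads(json_text):
        vocab[movie_id] = {
            'id': movie_id,
            'title': title,
            'genres': genres.split('|'),  # 'A|B' -> ['A', 'B']
            'count': count,
        }
    return vocab

raw = '[[1, "Toy Story (1995)", "Animation|Children", 2077]]'
vocab = parse_vocab(raw)
# vocab[1]['genres'] == ['Animation', 'Children']
```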
split

```python
split(
    fraction
)
```

Splits the dataset into two sub-datasets with the given fraction.

Primarily used for splitting the dataset into training and testing sets.
| Args | |
| --- | --- |
| `fraction` | float, the fraction of the original data that goes into the first returned sub-dataset. |

| Returns | |
| --- | --- |
| The two split sub-datasets. |
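A hedged sketch of fraction-based splitting on a plain list (the real method operates on a `tf.data.Dataset` and returns two `DataLoader` objects):

```python
# Fraction-based split: the first fraction of examples goes to the
# first sub-dataset, the remainder to the second.

def split(examples, fraction):
    cut = int(len(examples) * fraction)
    return examples[:cut], examples[cut:]

train, test = split(list(range(10)), 0.8)
# train has 8 examples, test has 2
```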
__len__

```python
__len__()
```