Token-based text embedding initialized randomly.
Module URL: https://tfhub.dev/google/random-nnlm-en-dim128/1
Text embedding initialized with tf.random_normal([vocabulary_size, 128]). It contains no "knowledge", but can conveniently be used as a baseline when comparing against other modules.
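For intuition, the lookup table behind such a module amounts to no more than the following construction (a minimal sketch; vocabulary_size and the token ids are hypothetical placeholders, not the module's actual values):

import tensorflow as tf

vocabulary_size = 1000000  # hypothetical; the real size matches the nnlm-en-dim128 vocabulary
embedding_matrix = tf.Variable(tf.random_normal([vocabulary_size, 128]))
token_ids = tf.constant([12, 345, 6789])  # hypothetical ids of three tokens
token_vectors = tf.nn.embedding_lookup(embedding_matrix, token_ids)  # shape (3, 128)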
Example use:

import tensorflow as tf
import tensorflow_hub as hub

embed = hub.Module("https://tfhub.dev/google/random-nnlm-en-dim128/1")
embeddings = embed(["cat is on the mat", "dog is in the fog"])
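With TF1-style hub.Module, calling embed(...) only adds ops to the graph; to obtain the actual vectors you run them in a session after initializing variables and lookup tables. Continuing the snippet above (a sketch; the (2, 128) shape follows from two input sentences and the 128-dimensional embedding):

with tf.Session() as sess:
    sess.run([tf.global_variables_initializer(), tf.tables_initializer()])
    vectors = sess.run(embeddings)
    print(vectors.shape)  # (2, 128): one 128-dim vector per input sentence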
The vocabulary of the module is based on nnlm-en-dim128.
The module takes a batch of sentences in a 1-D tensor of strings as input.
The module preprocesses its input by splitting on spaces.
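In other words, tokenization is plain splitting on the space character, roughly equivalent to the following (a sketch of the observable behavior, not the module's internal code; assumes import tensorflow as tf):

sentences = tf.constant(["cat is on the mat"])
tokens = tf.string_split(sentences, delimiter=" ")
# tokens is a SparseTensor; tokens.values == [b"cat", b"is", b"on", b"the", b"mat"]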
Out-of-vocabulary tokens
A small fraction of the least frequent tokens from the original vocabulary (~2.5%) is replaced by hash buckets, which are also initialized randomly from the same distribution.
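Mechanically, hash-bucket OOV handling maps any token outside the kept vocabulary to one of a fixed number of extra embedding rows; a minimal sketch of the technique (the bucket count is hypothetical, not the module's actual value; assumes import tensorflow as tf):

num_oov_buckets = 100  # hypothetical; the module's real bucket count is not stated here
unknown = tf.constant(["zyzzyva"])  # token assumed absent from the vocabulary
bucket_ids = tf.string_to_hash_bucket_fast(unknown, num_oov_buckets)
# Each bucket indexes an extra embedding row drawn from the same
# tf.random_normal distribution as the in-vocabulary rows.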