Module google/nnlm-en-dim50/1

Token-based text embedding trained on the English Google News 7B corpus.

Module URL:


Text embedding based on feed-forward Neural-Net Language Models[1] with pre-built OOV. Maps from text to 50-dimensional embedding vectors.

Example use

import tensorflow_hub as hub

embed = hub.Module("")
embeddings = embed(["cat is on the mat", "dog is in the fog"])


Based on NNLM with two hidden layers.


The module takes a batch of sentences in a 1-D tensor of strings as input.


The module preprocesses its input by splitting on spaces.
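The splitting step can be pictured with a minimal pure-Python sketch. The helper name split_on_spaces is hypothetical, and dropping the empty tokens produced by repeated spaces is an assumption; the module's exact behavior in that case is not stated here.

```python
def split_on_spaces(sentences):
    """Tokenize each sentence by splitting on the space character.

    Empty tokens (from consecutive spaces) are dropped; that choice is an
    assumption of this sketch, not documented module behavior.
    """
    return [[tok for tok in s.split(" ") if tok] for s in sentences]


tokens = split_on_spaces(["cat is on the mat", "dog is in the fog"])
```

Note that punctuation stays attached to its word under this scheme, so "mat." and "mat" would be different tokens.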

Out of vocabulary tokens

A small fraction (~2.5%) of the least frequent tokens and embeddings are replaced by hash buckets. Each hash bucket is initialized using the remaining embedding vectors that hash to the same bucket.
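As an illustration of the hash-bucket idea, here is a small sketch in plain Python. Everything specific in it is an assumption made for the example: the bucket count, the use of md5 as the hash, the id layout (bucket ids placed after the vocabulary ids), and averaging as the way a bucket is initialized from the vectors that hash to it. The module's real choices are not given in this text.

```python
import hashlib

NUM_BUCKETS = 4  # hypothetical value; the real bucket count is not stated


def bucket_of(token, num_buckets=NUM_BUCKETS):
    """Map a token to a bucket index with a stable hash (md5 is an assumption)."""
    digest = hashlib.md5(token.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_buckets


def token_id(token, vocab, num_buckets=NUM_BUCKETS):
    """Return the vocabulary id, or an id in the bucket range for unknown tokens."""
    if token in vocab:
        return vocab[token]
    return len(vocab) + bucket_of(token, num_buckets)


def init_buckets(rare_embeddings, dim, num_buckets=NUM_BUCKETS):
    """Initialize each bucket from the rare-token vectors that hash to it.

    Averaging is an assumption of this sketch. Buckets that receive no
    vectors are left at zero.
    """
    sums = [[0.0] * dim for _ in range(num_buckets)]
    counts = [0] * num_buckets
    for token, vec in rare_embeddings.items():
        b = bucket_of(token, num_buckets)
        counts[b] += 1
        for i, x in enumerate(vec):
            sums[b][i] += x
    return [[x / max(c, 1) for x in row] for row, c in zip(sums, counts)]


vocab = {"cat": 0, "mat": 1}
rare = {"zyzzyva": [1.0, 1.0, 1.0], "quokka": [3.0, 3.0, 3.0]}
buckets = init_buckets(rare, dim=3)
```

With this layout, any token outside the vocabulary still receives a valid embedding row, and unseen tokens that share a bucket share a vector.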

Sentence embeddings

Word embeddings are combined into a sentence embedding using the sqrtn combiner (see tf.nn.embedding_lookup_sparse).
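With unit weights, the sqrtn combiner amounts to summing the word vectors and dividing by the square root of the number of words. The sketch below shows that arithmetic in plain Python; the function name sqrtn_combine and the 2-dimensional toy vectors are made up for illustration, and it does not call TensorFlow.

```python
import math


def sqrtn_combine(word_vectors):
    """Sum the word vectors and divide by sqrt(number of words)."""
    n = len(word_vectors)
    dim = len(word_vectors[0])
    scale = math.sqrt(n)
    return [sum(v[i] for v in word_vectors) / scale for i in range(dim)]


# Four toy word vectors: the sums are [4.0, 4.0], and sqrt(4) = 2.
sentence = sqrtn_combine([[1.0, 0.0], [1.0, 2.0], [1.0, 2.0], [1.0, 0.0]])
# sentence == [2.0, 2.0]
```

Compared with a plain mean, this scaling shrinks long sentences less aggressively, so the norm of the sentence embedding grows with the square root of the sentence length when word vectors are similar.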


[1] Yoshua Bengio, Réjean Ducharme, Pascal Vincent, Christian Jauvin. A Neural Probabilistic Language Model. Journal of Machine Learning Research, 3:1137-1155, 2003.