Token based text embedding trained on English Google News 7B corpus.
Module URL: https://tfhub.dev/google/nnlm-en-dim50/1
Text embedding based on feed-forward Neural-Net Language Models (NN-LM) with pre-built OOV. Maps from text to 50-dimensional embedding vectors.
embed = hub.Module("https://tfhub.dev/google/nnlm-en-dim50/1") embeddings = embed(["cat is on the mat", "dog is in the fog"])
Based on NNLM with two hidden layers.
The module takes a batch of sentences in a 1-D tensor of strings as input.
The module preprocesses its input by splitting on spaces.
Out of vocabulary tokens
Small fraction of the least frequent tokens and embeddings (~2.5%) are replaced by hash buckets. Each has bucket is initialized using the remaining embedding vectors that hash to the same bucket.