Token-based text embedding trained on the English Wikipedia corpus.
Text embedding based on the skipgram version of word2vec, with one out-of-vocabulary bucket. Maps from text to 250-dimensional embedding vectors.
Example use

import tensorflow_hub as hub

embed = hub.Module("https://tfhub.dev/google/Wiki-words-250-with-normalization/1")
embeddings = embed(["cat is on the mat", "dog is in the fog"])
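Since hub.Module is the TF1-style API, reading the embedding values requires running the graph in a session. A minimal end-to-end sketch, assuming TensorFlow 1.x and the tensorflow_hub package:

import tensorflow as tf
import tensorflow_hub as hub

with tf.Graph().as_default():
    embed = hub.Module("https://tfhub.dev/google/Wiki-words-250-with-normalization/1")
    embeddings = embed(["cat is on the mat", "dog is in the fog"])
    with tf.Session() as sess:
        # hub.Module creates variables and lookup tables that must be initialized.
        sess.run([tf.global_variables_initializer(), tf.tables_initializer()])
        vectors = sess.run(embeddings)
        print(vectors.shape)  # (2, 250): one 250-dimensional vector per input sentence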
Skipgram model, hierarchical softmax, sub-sampling 1e-5.
The module takes a batch of sentences in a 1-D tensor of strings as input.
The module preprocesses its input by removing punctuation and splitting on spaces.
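As a rough illustration, the documented preprocessing behaves like the sketch below. The exact set of punctuation characters the module strips is internal and assumed here; this is an approximation, not the module's actual implementation:

import re

def preprocess(sentence):
    # Remove punctuation, then split on whitespace to get tokens.
    # The punctuation pattern is an assumption for illustration only.
    return re.sub(r"[^\w\s]", "", sentence).split()

print(preprocess("cat is on the mat!"))  # ['cat', 'is', 'on', 'the', 'mat']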
Out of vocabulary tokens
The module maps all out-of-vocabulary tokens into a single bucket, which is initialized with zeros.
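Consequently, a sentence made up entirely of out-of-vocabulary tokens should embed to a zero vector. A quick check, where "xqzvbl" and "qwptkz" are made-up tokens assumed to be absent from the vocabulary:

import numpy as np
import tensorflow as tf
import tensorflow_hub as hub

with tf.Graph().as_default():
    embed = hub.Module("https://tfhub.dev/google/Wiki-words-250-with-normalization/1")
    # Hypothetical nonsense tokens, assumed out of vocabulary.
    oov_embedding = embed(["xqzvbl qwptkz"])
    with tf.Session() as sess:
        sess.run([tf.global_variables_initializer(), tf.tables_initializer()])
        # Expected True under the documented zero-initialized OOV bucket.
        print(np.allclose(sess.run(oov_embedding), 0.0))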
Sentence embeddings

Word embeddings are combined into a sentence embedding using the sqrtn combiner (see tf.nn.embedding_lookup_sparse).
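With a unit weight per token, the sqrtn combiner reduces to summing the word vectors and dividing by the square root of the token count. A minimal numpy sketch of this combining step; the toy 3-dimensional vectors stand in for the module's 250-dimensional embeddings:

import numpy as np

def sqrtn_combine(word_vectors):
    # sqrtn combiner: sum of vectors divided by the square root of the
    # number of vectors, matching tf.nn.embedding_lookup_sparse with
    # combiner="sqrtn" when every token has weight 1.
    word_vectors = np.asarray(word_vectors)
    return word_vectors.sum(axis=0) / np.sqrt(len(word_vectors))

tokens = [np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])]
print(sqrtn_combine(tokens))  # [0.7071..., 0.7071..., 0.0]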
References

Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient Estimation of Word Representations in Vector Space. In Proceedings of Workshop at ICLR, 2013.