Module google/Wiki-words-250-with-normalization/1

Token-based text embedding trained on the English Wikipedia corpus [1].

Module URL:


Text embedding based on the skip-gram version of word2vec, with one out-of-vocabulary bucket. It maps text to 250-dimensional embedding vectors.

Example use

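import tensorflow_hub as hub

embed = hub.Module("")  # pass the module URL; hub.Module is the TF1 (graph-mode) API
embeddings = embed(["cat is on the mat", "dog is in the fog"])  # shape [2, 250]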


Training

Skip-gram model with hierarchical softmax and a sub-sampling threshold of 1e-5.
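Sub-sampling randomly discards frequent words during training, and the skip-gram objective predicts context words from each center word. The sketch below is minimal and assumes the discard rule from the later word2vec work by Mikolov et al., P(discard) = 1 - sqrt(t / f(w)), with t = 1e-5 and f(w) the word's relative corpus frequency. The window size and the helper names are hypothetical. This page does not state the window used or the exact sub-sampling variant, and hierarchical softmax itself is not shown.

```python
import math


def discard_probability(word_freq: float, threshold: float = 1e-5) -> float:
    """Probability of dropping a token during training (paper-style sub-sampling).

    word_freq is the token's relative frequency in the corpus (0 < word_freq <= 1).
    Words rarer than the threshold are never dropped.
    """
    return max(0.0, 1.0 - math.sqrt(threshold / word_freq))


def skipgram_pairs(tokens: list[str], window: int = 2) -> list[tuple[str, str]]:
    """Return the (center, context) pairs a skip-gram model learns to predict.

    The window size is a hypothetical choice for illustration.
    """
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs


print(discard_probability(0.01))  # a word making up 1% of the corpus
print(skipgram_pairs(["cat", "is", "on"], window=1))
```

With t = 1e-5, a word that makes up 1% of the corpus is dropped about 97% of the time. Words whose frequency is at or below t are always kept.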


Input

The module takes as input a batch of sentences, given as a 1-D tensor of strings.


Preprocessing

The module preprocesses its input by removing punctuation and splitting on spaces.
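The following is a minimal sketch of the preprocessing described above, applied to a batch of sentences. A Python list stands in for the 1-D string tensor. The sketch makes three assumptions that this page does not specify: "punctuation" means the ASCII characters in `string.punctuation`, punctuation is deleted rather than replaced with a space, and no lowercasing is applied. `preprocess` is a hypothetical helper, not part of the module's API.

```python
import string

# Translation table that deletes every ASCII punctuation character.
_STRIP_PUNCT = str.maketrans("", "", string.punctuation)


def preprocess(sentence: str) -> list[str]:
    """Remove punctuation, then split on spaces (approximation of the module's step)."""
    no_punct = sentence.translate(_STRIP_PUNCT)
    # Splitting on " " yields empty strings for repeated spaces; drop them.
    return [tok for tok in no_punct.split(" ") if tok]


batch = ["cat is on the mat", "dog is in the fog!"]
tokens = [preprocess(s) for s in batch]
print(tokens)
```

Because punctuation is deleted before splitting, a contraction such as "don't" becomes the single token "dont" under this sketch.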

Out-of-vocabulary tokens

The module maps all out-of-vocabulary tokens to a single bucket, whose embedding is initialized with zeros.
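This toy example shows a single out-of-vocabulary bucket. Every unknown token maps to the same id, and the embedding row for that id is all zeros. Several choices here are hypothetical and made only for the sketch: the vocabulary, the 4-dimensional vectors, and placing the bucket directly after the known words. The real module uses its Wikipedia vocabulary and 250-dimensional vectors.

```python
# Hypothetical toy vocabulary; the real module's vocabulary comes from Wikipedia.
DIM = 4
vocab = {"cat": 0, "mat": 1}
OOV_ID = len(vocab)  # the single out-of-vocabulary bucket, placed after known words

embedding_table = [
    [0.1, 0.2, 0.3, 0.4],  # "cat"
    [0.5, 0.6, 0.7, 0.8],  # "mat"
    [0.0] * DIM,           # OOV bucket, initialized with zeros
]


def token_ids(tokens: list[str]) -> list[int]:
    """Map each token to its vocabulary id, or to the shared OOV bucket."""
    return [vocab.get(t, OOV_ID) for t in tokens]


def lookup(tokens: list[str]) -> list[list[float]]:
    """Return one embedding row per token."""
    return [embedding_table[i] for i in token_ids(tokens)]


print(token_ids(["cat", "dog", "mat"]))
print(lookup(["dog"]))
```

Since the OOV row is zero, unknown words contribute nothing to a summed sentence vector.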

Sentence embeddings

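Word embeddings are combined into a sentence embedding using the sqrtn combiner (see tf.nn.embedding_lookup_sparse). With unit weights, this sums the token vectors and divides the sum by the square root of the number of tokens.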
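With all token weights equal to 1, the sqrtn combiner reduces to summing the word vectors and dividing by the square root of the number of tokens. The sketch below computes this in plain Python. `sqrtn_combine` is a hypothetical helper, not the module's code. It raises an error on an empty input instead of trying to reproduce how the module treats an empty sentence.

```python
import math


def sqrtn_combine(vectors: list[list[float]]) -> list[float]:
    """Sum equally weighted vectors and divide by sqrt(count), as the sqrtn combiner does."""
    if not vectors:
        raise ValueError("need at least one vector")
    n = len(vectors)
    dim = len(vectors[0])
    scale = math.sqrt(n)
    return [sum(v[d] for v in vectors) / scale for d in range(dim)]


# Two 2-D "word vectors": the sum [2.0, 2.0] is divided by sqrt(2).
print(sqrtn_combine([[1.0, 0.0], [1.0, 2.0]]))
```

This sketch assumes that out-of-vocabulary tokens are looked up like any other id. Under that assumption, an OOV token's zero vector adds nothing to the sum but still counts toward the divisor.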


References

[1] Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient Estimation of Word Representations in Vector Space. In Proceedings of Workshop at ICLR, 2013.