Module google/Wiki-words-250/1

Token-based text embedding trained on the English Wikipedia corpus [1].

Module URL: https://tfhub.dev/google/Wiki-words-250/1

Overview

Text embedding based on the skip-gram version of word2vec, with one out-of-vocabulary bucket. Maps from text to 250-dimensional embedding vectors.

Example use

import tensorflow_hub as hub

embed = hub.Module("https://tfhub.dev/google/Wiki-words-250/1")
embeddings = embed(["cat is on the mat", "dog is in the fog"])
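With the TF1 hub.Module API, `embeddings` is a graph tensor that must be evaluated in a session after initializing variables and the module's vocabulary lookup table. A minimal sketch continuing from the snippet above:

import tensorflow as tf

with tf.Session() as sess:
    # The module creates variables and a string-to-id lookup table;
    # both need explicit initialization in TF1.
    sess.run([tf.global_variables_initializer(), tf.tables_initializer()])
    vectors = sess.run(embeddings)
    print(vectors.shape)  # (2, 250): one 250-dimensional vector per sentence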

Details

Skip-gram model, trained with hierarchical softmax and a sub-sampling rate of 1e-5.
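For reference, sub-sampling in word2vec discards each occurrence of a word w with probability 1 - sqrt(t / f(w)), where f(w) is the word's corpus frequency and t is the threshold (here 1e-5), following Mikolov et al. A small illustration of that heuristic (not code from the module itself):

import math

def discard_probability(word_frequency, t=1e-5):
    # Probability of dropping one occurrence of a word during training,
    # per the word2vec sub-sampling heuristic; rare words are kept.
    return max(0.0, 1.0 - math.sqrt(t / word_frequency))

print(discard_probability(0.01))     # frequent word (1% of corpus): ~0.968
print(discard_probability(0.00001))  # rare word at the threshold: 0.0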

Input

The module takes a batch of sentences in a 1-D tensor of strings as input.

Preprocessing

The module preprocesses its input by splitting on spaces.
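A rough Python equivalent of that tokenization, shown only to make the behavior concrete: the description implies splitting on spaces alone, with no lowercasing or punctuation handling.

sentences = ["cat is on the mat", "dog is in the fog"]
tokens = [s.split(" ") for s in sentences]
# [['cat', 'is', 'on', 'the', 'mat'], ['dog', 'is', 'in', 'the', 'fog']]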

Out of vocabulary tokens

The module maps all out-of-vocabulary tokens into a single bucket whose embedding is initialized with zeros.
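Because the bucket is zero-initialized, an input consisting entirely of unseen tokens should embed to (approximately) the zero vector. A quick way to check this, assuming the made-up token below really is out of vocabulary:

import numpy as np
import tensorflow as tf
import tensorflow_hub as hub

embed = hub.Module("https://tfhub.dev/google/Wiki-words-250/1")
oov = embed(["qxzv qxzv"])  # hypothetical tokens unlikely to be in the vocabulary

with tf.Session() as sess:
    sess.run([tf.global_variables_initializer(), tf.tables_initializer()])
    print(np.allclose(sess.run(oov), 0.0))  # expected True if both tokens are OOV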

References

[1] Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient Estimation of Word Representations in Vector Space. In Proceedings of Workshop at ICLR, 2013.