tf.keras.preprocessing.sequence.skipgrams

Generates skipgram word pairs.

This function transforms a sequence of word indices (a list of integers) into pairs of word indices of the form:

  • (word, word in the same window), with label 1 (positive samples).
  • (word, random word from the vocabulary), with label 0 (negative samples).
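The pairing described above can be sketched in plain Python. This is a minimal, hypothetical re-implementation (`skipgram_pairs` is an illustrative name, not a TensorFlow function). It assumes one negative pair per positive pair, with each negative formed by pairing a target word with an index drawn uniformly from 1 to vocabulary_size - 1. It also does no shuffling, no sampling table and no categorical labels. It illustrates the pairing logic only and does not reproduce TensorFlow's exact output.

```python
import random


def skipgram_pairs(sequence, vocabulary_size, window_size=4, seed=None):
    """Hypothetical sketch of skip-gram pair generation (not TensorFlow's source).

    Returns (couples, labels): positive pairs get label 1, and one negative
    pair per positive pair gets label 0. Positives come first (no shuffling).
    """
    rng = random.Random(seed)
    couples, labels = [], []
    for i, wi in enumerate(sequence):
        if wi == 0:  # index 0 is a non-word and is skipped
            continue
        # Context positions i - window_size .. i + window_size, clipped to the sequence.
        start = max(0, i - window_size)
        end = min(len(sequence), i + window_size + 1)
        for j in range(start, end):
            if j == i:
                continue
            wj = sequence[j]
            if wj == 0:  # never pair a word with the non-word index
                continue
            couples.append([wi, wj])
            labels.append(1)

    # Negative samples (assumption): pair each target word, in shuffled order,
    # with an index drawn uniformly from 1 .. vocabulary_size - 1. Nothing
    # prevents an occasional draw that is actually a true context word.
    targets = [pair[0] for pair in couples]
    rng.shuffle(targets)
    for w in targets:
        couples.append([w, rng.randint(1, vocabulary_size - 1)])
        labels.append(0)
    return couples, labels
```

For example, `skipgram_pairs([1, 2, 3, 0, 4], vocabulary_size=5, window_size=1, seed=0)` yields the positive pairs [1, 2], [2, 1], [2, 3] and [3, 2], followed by four negative pairs whose second element is random. Word 4 produces no positive pair because its only neighbor is the non-word 0.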

Read more about skip-gram in the paper by Mikolov et al.: Efficient Estimation of Word Representations in Vector Space

Args

  • sequence: A word sequence (sentence), encoded as a list of word indices (integers). If a sampling_table is used, word indices are expected to match the rank of the words in a reference dataset (e.g. 10 would encode the 10th most frequently occurring token). Index 0 is expected to be a non-word and is skipped.
  • vocabulary_size: Int, maximum possible word index + 1.
  • window_size: Int, size of the sampling window (technically a half-window). The window of the word w_i covers positions i - window_size through i + window_size, i.e. the half-open range [i - window_size, i + window_size + 1), clipped to the bounds of the sequence.
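The window bounds and the index conventions from the arguments above can be checked with two small helpers. Both are hypothetical illustrations, not TensorFlow functions. `window_positions` lists the context positions for the word at position i. `rank_encode` assigns indices by descending frequency, so index 1 is the most frequent token and index 0 stays reserved as the non-word. Ties are broken alphabetically, an assumption made only so the output is deterministic.

```python
from collections import Counter


def window_positions(i, length, window_size):
    """Context positions for the word at position i (hypothetical helper).

    Covers i - window_size .. i + window_size, i.e. the half-open range
    [i - window_size, i + window_size + 1), clipped to the sequence and
    excluding i itself.
    """
    start = max(0, i - window_size)
    end = min(length, i + window_size + 1)
    return [j for j in range(start, end) if j != i]


def rank_encode(tokens):
    """Encode tokens by frequency rank (hypothetical helper).

    The most frequent token gets index 1, the next one 2, and so on.
    Index 0 is never assigned, because skipgrams treats it as a non-word.
    """
    counts = Counter(tokens)
    # Descending count; ties broken alphabetically for a deterministic order.
    ranked = sorted(counts, key=lambda tok: (-counts[tok], tok))
    index = {tok: rank for rank, tok in enumerate(ranked, start=1)}
    return [index[tok] for tok in tokens], index
```

Here `window_positions(0, 5, 2)` returns [1, 2] because the window is clipped at the start of the sequence, while `window_positions(2, 5, 2)` returns [0, 1, 3, 4]. For `rank_encode("the cat sat on the mat the end".split())`, "the" becomes 1 and the encoded sentence is [1, 2, 6, 5, 1, 4, 1, 3]. The largest index is 6, so vocabulary_size would be 7.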