Skipgram

public final class Skipgram

Parses a text file and creates a batch of examples.

Nested Classes

class Skipgram.Options Optional attributes for Skipgram

Public Methods

static Skipgram create ( Scope scope, String filename, Long batchSize, Options... options)
Factory method to create a class wrapping a new Skipgram operation.
Output <Integer> currentEpoch ()
The current epoch number.
Output <Integer> examples ()
A vector of word ids.
Output <Integer> labels ()
A vector of word ids.
static Skipgram.Options minCount (Long minCount)
static Skipgram.Options subsample (Float subsample)
Output <Long> totalWordsProcessed ()
The total number of words processed so far.
Output <Integer> vocabFreq ()
Frequencies of words.
Output <String> vocabWord ()
A vector of words in the corpus.
static Skipgram.Options windowSize (Long windowSize)
Output <Long> wordsPerEpoch ()
Number of words per epoch in the data file.

Public Methods

public static Skipgram create ( Scope scope, String filename, Long batchSize, Options... options)

Factory method to create a class wrapping a new Skipgram operation.

Parameters
scope current scope
filename The corpus's text file name.
batchSize The size of each produced batch.
options Carries optional attribute values.
Returns
  • a new instance of Skipgram

public Output <Integer> currentEpoch ()

The current epoch number.

public Output <Integer> examples ()

A vector of word ids.

public Output <Integer> labels ()

A vector of word ids.

public static Skipgram.Options minCount (Long minCount)

Parameters
minCount The minimum number of times a word must occur to be included in the vocabulary.
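
To make the effect of minCount concrete, the following plain-Java sketch counts tokens and drops words seen fewer than minCount times. It does not call the TensorFlow API; the class MinCountSketch and its vocabulary method are hypothetical names introduced for illustration, and how the operation itself tokenizes the file or treats discarded words is not specified in this reference.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative only: a hypothetical helper, not part of the TensorFlow API. */
final class MinCountSketch {

  /** Counts tokens and keeps only words that occur at least minCount times. */
  static Map<String, Integer> vocabulary(String[] tokens, int minCount) {
    Map<String, Integer> counts = new LinkedHashMap<>();
    for (String token : tokens) {
      counts.merge(token, 1, Integer::sum); // increment the running count
    }
    // Drop every word whose count is below the threshold.
    counts.values().removeIf(count -> count < minCount);
    return counts;
  }

  public static void main(String[] args) {
    String[] tokens = "the cat sat on the mat the cat".split(" ");
    System.out.println(vocabulary(tokens, 2));
  }
}
```

With a threshold of 2, only "the" and "cat" survive in the sample sentence; a threshold of 1 keeps every word.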

public static Skipgram.Options subsample (Float subsample)

Parameters
subsample Threshold for word occurrence. Words that appear with higher frequency will be randomly down-sampled. Set to 0 to disable.
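
One common way to implement this kind of down-sampling is the heuristic from the original word2vec tool, sketched below in plain Java. Whether this operation uses exactly this formula is an assumption, not something stated in this reference; SubsampleSketch, keepProbability, and wordFraction (a word's occurrences divided by the total number of words) are hypothetical names.

```java
/** Illustrative only: word2vec-style subsampling, assumed rather than confirmed for this op. */
final class SubsampleSketch {

  /**
   * Probability of keeping one occurrence of a word.
   *
   * @param wordFraction occurrences of the word divided by total words in the corpus
   * @param subsample the threshold; 0 disables subsampling
   */
  static double keepProbability(double wordFraction, double subsample) {
    if (subsample <= 0.0) {
      return 1.0; // subsampling disabled: always keep the word
    }
    double p = (Math.sqrt(wordFraction / subsample) + 1.0) * (subsample / wordFraction);
    return Math.min(1.0, p); // clamp so that rare words are always kept
  }

  public static void main(String[] args) {
    System.out.println(keepProbability(0.01, 0.001));   // frequent word
    System.out.println(keepProbability(0.0001, 0.001)); // rare word
    System.out.println(keepProbability(0.01, 0.0));     // disabled
  }
}
```

Under this heuristic, a word is discarded when a uniform random draw in [0, 1) exceeds its keep probability, so words far above the threshold are dropped more often while rare words are left untouched.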

public Output <Long> totalWordsProcessed ()

The total number of words processed so far.

public Output <Integer> vocabFreq ()

Frequencies of words, sorted in non-ascending order.

public Output <String> vocabWord ()

A vector of words in the corpus.
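
The non-ascending ordering of vocabFreq can be pictured with the plain-Java sketch below, which sorts word counts from highest to lowest. It assumes that entry i of vocabWord corresponds to entry i of vocabFreq, which this reference does not state; VocabOrderSketch and byFrequency are hypothetical names and do not touch the TensorFlow API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: orders a vocabulary by count, highest first. */
final class VocabOrderSketch {

  /** Returns entries sorted by count in non-ascending order; ties keep the map's iteration order. */
  static List<Map.Entry<String, Integer>> byFrequency(Map<String, Integer> counts) {
    List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
    // List.sort is stable, so equal counts stay in their original relative order.
    entries.sort((a, b) -> Integer.compare(b.getValue(), a.getValue()));
    return entries;
  }

  public static void main(String[] args) {
    Map<String, Integer> counts = new LinkedHashMap<>();
    counts.put("cat", 2);
    counts.put("the", 3);
    counts.put("mat", 2);
    for (Map.Entry<String, Integer> e : byFrequency(counts)) {
      System.out.println(e.getKey() + " " + e.getValue());
    }
  }
}
```

"Non-ascending" rather than "descending" matters here: words with equal counts may sit next to each other, so consecutive frequencies can be equal but never increase.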

public static Skipgram.Options windowSize (Long windowSize)

Parameters
windowSize The number of words to predict to the left and right of the target.
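
As a rough picture of how a window turns a sequence of word ids into (example, label) pairs, the plain-Java sketch below pairs each target id with every neighbor at most windowSize positions away. This is a simplification under stated assumptions: the operation itself may sample the window or emit pairs in a different order, and SkipgramWindowSketch and pairs are hypothetical names, not part of the TensorFlow API.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: enumerates skip-gram pairs for a fixed window. */
final class SkipgramWindowSketch {

  /** Returns {example, label} id pairs for each target and each neighbor within windowSize. */
  static List<int[]> pairs(int[] wordIds, int windowSize) {
    List<int[]> out = new ArrayList<>();
    for (int target = 0; target < wordIds.length; target++) {
      // Clip the window at the start and end of the sequence.
      int lo = Math.max(0, target - windowSize);
      int hi = Math.min(wordIds.length - 1, target + windowSize);
      for (int ctx = lo; ctx <= hi; ctx++) {
        if (ctx != target) {
          out.add(new int[] {wordIds[target], wordIds[ctx]});
        }
      }
    }
    return out;
  }

  public static void main(String[] args) {
    for (int[] pair : pairs(new int[] {7, 3, 9, 4}, 1)) {
      System.out.println("example=" + pair[0] + " label=" + pair[1]);
    }
  }
}
```

For the four ids above and a window of 1, the sketch yields six pairs: the two edge words contribute one pair each and the two interior words contribute two each.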

public Output <Long> wordsPerEpoch ()

Number of words per epoch in the data file.