FixedUnigramCandidateSampler

public final class FixedUnigramCandidateSampler

Generates labels for candidate sampling with a fixed unigram distribution.

This unigram sampler uses a fixed unigram distribution, read from a file or passed in as an in-memory array, instead of building up the distribution from data on the fly. The distribution can optionally be skewed by applying a distortion power to the weights.

The vocabulary file should be in CSV-like format, with the last field being the weight associated with the word.

For each batch, this op picks a single set of sampled candidate labels.

The advantages of sampling candidates per-batch are simplicity and the possibility of efficient dense matrix multiplication. The disadvantage is that the sampled candidates must be chosen independently of the context and of the true labels.
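To illustrate the dense-multiplication point: because one candidate set is shared by every example in the batch, the scores for all (example, candidate) pairs can be computed as a single matrix product. The sketch below is plain Java, not TensorFlow code; the names `inputs` and `candidateEmbeddings` are hypothetical and stand for a batch of context vectors and the embeddings of the sampled candidates.

```java
/** Illustrative only; not part of the TensorFlow API. */
final class SharedCandidateScores {
    /**
     * Computes scores[b][s] = dot(inputs[b], candidateEmbeddings[s]).
     * inputs is batchSize x dim, candidateEmbeddings is numSampled x dim.
     */
    static double[][] score(double[][] inputs, double[][] candidateEmbeddings) {
        int batchSize = inputs.length;
        int numSampled = candidateEmbeddings.length;
        double[][] scores = new double[batchSize][numSampled];
        for (int b = 0; b < batchSize; b++) {
            for (int s = 0; s < numSampled; s++) {
                double dot = 0.0;
                for (int d = 0; d < inputs[b].length; d++) {
                    dot += inputs[b][d] * candidateEmbeddings[s][d];
                }
                scores[b][s] = dot;
            }
        }
        return scores;
    }
}
```

If each example had its own candidate set, the same computation would need a separate gather and product per row instead of one dense product.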

Nested Classes

class FixedUnigramCandidateSampler.Options Optional attributes for FixedUnigramCandidateSampler

Constants

String OP_NAME The name of this op, as known by the TensorFlow core engine

Public Methods

static FixedUnigramCandidateSampler	create(Scope scope, Operand<TInt64> trueClasses, Long numTrue, Long numSampled, Boolean unique, Long rangeMax, Options... options) Factory method to create a class wrapping a new FixedUnigramCandidateSampler operation.
static FixedUnigramCandidateSampler.Options	distortion(Float distortion)
static FixedUnigramCandidateSampler.Options	numReservedIds(Long numReservedIds)
static FixedUnigramCandidateSampler.Options	numShards(Long numShards)
Output<TInt64>	sampledCandidates() A vector of length num_sampled, in which each element is the ID of a sampled candidate.
Output<TFloat32>	sampledExpectedCount() A vector of length num_sampled, for each sampled candidate representing the number of times the candidate is expected to occur in a batch of sampled candidates.
static FixedUnigramCandidateSampler.Options	seed(Long seed)
static FixedUnigramCandidateSampler.Options	seed2(Long seed2)
static FixedUnigramCandidateSampler.Options	shard(Long shard)
Output<TFloat32>	trueExpectedCount() A batch_size * num_true matrix, representing the number of times each candidate is expected to occur in a batch of sampled candidates.
static FixedUnigramCandidateSampler.Options	unigrams(List<Float> unigrams)
static FixedUnigramCandidateSampler.Options	vocabFile(String vocabFile)

Inherited Methods

From class org.tensorflow.op.RawOp

final boolean	equals(Object obj)
final int	hashCode()
Operation	op() Return this unit of computation as a single `Operation`.
final String	toString()

From class java.lang.Object

boolean	equals(Object arg0)
final Class<?>	getClass()
int	hashCode()
final void	notify()
final void	notifyAll()
String	toString()
final void	wait(long arg0, int arg1)
final void	wait(long arg0)
final void	wait()

From interface org.tensorflow.op.Op

abstract ExecutionEnvironment	env() Return the execution environment this op was created in.
abstract Operation	op() Return this unit of computation as a single `Operation`.

Constants

public static final String OP_NAME

The name of this op, as known by the TensorFlow core engine

Constant Value: "FixedUnigramCandidateSampler"

Public Methods

public static FixedUnigramCandidateSampler create (Scope scope, Operand<TInt64> trueClasses, Long numTrue, Long numSampled, Boolean unique, Long rangeMax, Options... options)

Factory method to create a class wrapping a new FixedUnigramCandidateSampler operation.

Parameters

scope	current scope
trueClasses	A batch_size * num_true matrix, in which each row contains the IDs of the num_true target_classes in the corresponding original label.
numTrue	Number of true labels per context.
numSampled	Number of candidates to randomly sample.
unique	If unique is true, we sample with rejection, so that all sampled candidates in a batch are unique. This requires some approximation to estimate the post-rejection sampling probabilities.
rangeMax	The sampler will sample integers from the interval [0, range_max).
options	carries optional attribute values

Returns

a new instance of FixedUnigramCandidateSampler
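The sampling behaviour described by `numSampled`, `unique`, and `rangeMax` can be sketched in plain Java. This is a conceptual illustration under stated assumptions, not the op's actual implementation: the class and method names are hypothetical, weights are assumed non-negative, and `weights.length` plays the role of `rangeMax`.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

/** Illustrative only; not part of the TensorFlow API. */
final class UnigramSamplingSketch {
    /** Draws one ID in [0, weights.length) with probability proportional to its weight. */
    static int drawOne(double[] weights, double total, Random rng) {
        double r = rng.nextDouble() * total;
        double cumulative = 0.0;
        int lastPositive = -1;
        for (int id = 0; id < weights.length; id++) {
            if (weights[id] <= 0.0) {
                continue;  // zero-weight IDs are never returned
            }
            cumulative += weights[id];
            lastPositive = id;
            if (r < cumulative) {
                return id;
            }
        }
        return lastPositive;  // guards against floating-point round-off
    }

    /** Samples numSampled IDs; with unique=true, duplicate draws are rejected and redrawn. */
    static List<Integer> sample(double[] weights, int numSampled, boolean unique, long seed) {
        double total = 0.0;
        int positive = 0;
        for (double w : weights) {
            total += w;
            if (w > 0.0) {
                positive++;
            }
        }
        if (positive == 0 || (unique && numSampled > positive)) {
            throw new IllegalArgumentException("not enough IDs with positive weight");
        }
        Random rng = new Random(seed);
        List<Integer> sampled = new ArrayList<>();
        Set<Integer> seen = new HashSet<>();
        while (sampled.size() < numSampled) {
            int id = drawOne(weights, total, rng);
            if (unique && !seen.add(id)) {
                continue;  // rejection step: this ID was already sampled
            }
            sampled.add(id);
        }
        return sampled;
    }
}
```

The rejection step is why the post-rejection probabilities are no longer the raw unigram probabilities and have to be estimated.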

public static FixedUnigramCandidateSampler.Options distortion (Float distortion)

Parameters

distortion	The distortion is used to skew the unigram probability distribution. Each weight is first raised to the power of the distortion before being added to the internal unigram distribution. As a result, distortion = 1.0 gives regular unigram sampling (as defined by the vocab file), and distortion = 0.0 gives a uniform distribution.
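The effect of the distortion can be sketched in a few lines of plain Java. The helper below is hypothetical and only mirrors the description above: each weight is raised to the distortion power and the results are normalized into probabilities.

```java
import java.util.Arrays;

/** Illustrative only; not part of the TensorFlow API. */
final class DistortionSketch {
    /** Raises each weight to the given power, then normalizes so the results sum to 1. */
    static double[] distort(double[] weights, double distortion) {
        double[] out = new double[weights.length];
        double total = 0.0;
        for (int i = 0; i < weights.length; i++) {
            out[i] = Math.pow(weights[i], distortion);
            total += out[i];
        }
        for (int i = 0; i < out.length; i++) {
            out[i] /= total;
        }
        return out;
    }

    public static void main(String[] args) {
        double[] counts = {8.0, 1.0, 1.0};
        // distortion = 1.0 keeps the original proportions.
        System.out.println(Arrays.toString(distort(counts, 1.0)));
        // distortion = 0.0 maps every weight to 1, giving a uniform distribution.
        System.out.println(Arrays.toString(distort(counts, 0.0)));
        // Values in between flatten the distribution without making it uniform.
        System.out.println(Arrays.toString(distort(counts, 0.5)));
    }
}
```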

public static FixedUnigramCandidateSampler.Options numReservedIds (Long numReservedIds)

Parameters

numReservedIds	Optionally, users can reserve the IDs in the range [0, num_reserved_ids). One use case is reserving ID 0 for a special unknown-word token. Reserved IDs have a sampling probability of 0.

public static FixedUnigramCandidateSampler.Options numShards (Long numShards)

Parameters

numShards	A sampler can be used to sample from a subset of the original range in order to speed up the whole computation through parallelism. This parameter (together with 'shard') indicates the number of partitions that are being used in the overall computation.

public Output<TInt64> sampledCandidates ()

A vector of length num_sampled, in which each element is the ID of a sampled candidate.

public Output<TFloat32> sampledExpectedCount ()

A vector of length num_sampled, for each sampled candidate representing the number of times the candidate is expected to occur in a batch of sampled candidates. If unique=true, then this is a probability.
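As a rough guide to how these values relate to the per-draw probability p of a candidate, the sketch below shows two formulas. Both are assumptions made for illustration and are not taken from this page: sampling with replacement gives an expected count of p times the number of draws, and for unique=true one common approximation is the probability of at least one hit in `numTries` independent draws, where `numTries` (a hypothetical name) is the number of draws the rejection sampler needed.

```java
/** Illustrative only; not part of the TensorFlow API. */
final class ExpectedCountSketch {
    /** Expected number of occurrences of a candidate in numSampled draws with replacement. */
    static double withReplacement(double p, long numSampled) {
        return p * numSampled;
    }

    /**
     * Assumed approximation for unique=true: the probability that the candidate
     * appears at least once in numTries independent draws, 1 - (1 - p)^numTries.
     * Written with log1p/expm1 to stay accurate when p is very small.
     */
    static double uniqueApprox(double p, long numTries) {
        return -Math.expm1(numTries * Math.log1p(-p));
    }
}
```

The second value never exceeds 1, which is consistent with the statement that the result is a probability when unique=true.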

public static FixedUnigramCandidateSampler.Options seed (Long seed)

Parameters

seed	If either seed or seed2 is set to a non-zero value, the random number generator is seeded by the given seed. Otherwise, it is seeded by a random seed.

public static FixedUnigramCandidateSampler.Options seed2 (Long seed2)

Parameters

seed2	A second seed to avoid seed collision.

public static FixedUnigramCandidateSampler.Options shard (Long shard)

Parameters

shard	A sampler can be used to sample from a subset of the original range in order to speed up the whole computation through parallelism. This parameter (together with 'num_shards') indicates the particular partition number of a sampler op, when partitioning is being used.

public Output<TFloat32> trueExpectedCount ()

A batch_size * num_true matrix, representing the number of times each candidate is expected to occur in a batch of sampled candidates. If unique=true, then this is a probability.

public static FixedUnigramCandidateSampler.Options unigrams (List<Float> unigrams)

Parameters

unigrams	A list of unigram counts or probabilities, one per ID in sequential order. Exactly one of vocab_file and unigrams should be passed to this op.

public static FixedUnigramCandidateSampler.Options vocabFile (String vocabFile)

Parameters

vocabFile	Each valid line in this file (which should have a CSV-like format) corresponds to a valid word ID. IDs are in sequential order, starting from num_reserved_ids. The last entry in each line is expected to be a value corresponding to the count or relative probability. Exactly one of vocab_file and unigrams needs to be passed to this op.
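The file layout described above can be made concrete with a small parser. This is a sketch in plain Java, not the op's own loader: the class name is hypothetical, fields are assumed to be comma-separated, and blank lines are assumed to be skipped. Reserved IDs are given weight 0 so that the first entry of the file lands at ID num_reserved_ids.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only; not part of the TensorFlow API. */
final class VocabWeightsSketch {
    /**
     * Returns weights indexed by word ID. IDs below numReservedIds get weight 0;
     * each non-blank line contributes the numeric value of its last field.
     * Throws NumberFormatException if a last field is not a number.
     */
    static List<Double> parse(List<String> lines, int numReservedIds) {
        List<Double> weights = new ArrayList<>();
        for (int i = 0; i < numReservedIds; i++) {
            weights.add(0.0);  // reserved IDs are never sampled
        }
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue;  // assumption: blank lines are not valid entries
            }
            // Everything after the last comma (or the whole line if there is none).
            String lastField = trimmed.substring(trimmed.lastIndexOf(',') + 1).trim();
            weights.add(Double.parseDouble(lastField));
        }
        return weights;
    }
}
```

With one reserved ID and the lines `the,120` and `of,60`, the word "the" gets ID 1 and "of" gets ID 2.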
