GenerateVocabRemapping

public final class GenerateVocabRemapping

Given paths to a new and an old vocabulary file, returns a remapping Tensor of length `num_new_vocab`, where `remapping[i]` contains the row number in the old vocabulary that corresponds to row `i` in the new vocabulary (starting at line `new_vocab_offset` and reading up to `num_new_vocab` entities), or `-1` if entry `i` in the new vocabulary is not in the old vocabulary. The old vocabulary is constrained to the first `old_vocab_size` entries if `old_vocab_size` is not the default value of `-1`.

`new_vocab_offset` enables use in the partitioned variable case, and should generally be set by examining partitioning info. Each vocabulary file should be a text file, with each line containing a single entity within the vocabulary.

For example, with `new_vocab_file` a text file containing the following elements, one per line: `[f0, f1, f2, f3]`, `old_vocab_file = [f1, f0, f3]`, `num_new_vocab = 3`, and `new_vocab_offset = 1`, the returned remapping would be `[0, -1, 2]`.

The op also returns a count of how many entries in the new vocabulary were present in the old vocabulary, which is used to calculate the number of values to initialize in a weight matrix remapping.
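The remapping and the presence count described above can be sketched in plain Java, without TensorFlow. The class `VocabRemapSketch`, its `Result` type, and the in-memory lists standing in for the vocabulary files are hypothetical names introduced for illustration. The sketch assumes vocabulary entries are unique and is not the op's actual implementation.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class VocabRemapSketch {

    /** Hypothetical holder for the two outputs: remapping and number of entries found. */
    public static final class Result {
        public final long[] remapping;
        public final int numPresent;

        Result(long[] remapping, int numPresent) {
            this.remapping = remapping;
            this.numPresent = numPresent;
        }
    }

    /**
     * Illustrative re-implementation of the documented semantics.
     * Lists stand in for the lines of the vocabulary files (an assumption).
     */
    public static Result remap(List<String> newVocab, List<String> oldVocab,
                               int newVocabOffset, int numNewVocab, int oldVocabSize) {
        // Consider only the first oldVocabSize old entries, unless it is -1.
        int oldLimit = (oldVocabSize == -1) ? oldVocab.size() : oldVocabSize;

        // Map each old entity to its row number (assumes entries are unique).
        Map<String, Long> oldRows = new HashMap<>();
        for (int row = 0; row < oldLimit; row++) {
            oldRows.put(oldVocab.get(row), (long) row);
        }

        long[] remapping = new long[numNewVocab];
        int numPresent = 0;
        for (int i = 0; i < numNewVocab; i++) {
            // Row i of the output corresponds to line newVocabOffset + i of the new vocab.
            Long oldRow = oldRows.get(newVocab.get(newVocabOffset + i));
            if (oldRow == null) {
                remapping[i] = -1L; // not present in the old vocabulary
            } else {
                remapping[i] = oldRow;
                numPresent++;
            }
        }
        return new Result(remapping, numPresent);
    }

    public static void main(String[] args) {
        Result r = remap(
            Arrays.asList("f0", "f1", "f2", "f3"),
            Arrays.asList("f1", "f0", "f3"),
            1, 3, -1);
        System.out.println(Arrays.toString(r.remapping) + " present=" + r.numPresent);
    }
}
```

With the inputs from the example above, `main` prints `[0, -1, 2] present=2`.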

This functionality can be used to remap both row vocabularies (typically, features) and column vocabularies (typically, classes) from TensorFlow checkpoints. Note that the partitioning logic relies on contiguous vocabularies corresponding to div-partitioned variables. Moreover, the underlying remapping uses an IndexTable (as opposed to an inexact CuckooTable), so client code should use the corresponding index_table_from_file() as the FeatureColumn framework does (as opposed to tf.feature_to_id(), which uses a CuckooTable).
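As a rough illustration of how `new_vocab_offset` and `num_new_vocab` might be chosen for each shard of a div-partitioned variable, the following sketch splits a vocabulary into contiguous ranges. It assumes the common div-partitioning convention in which the first `numRows % numShards` shards each receive one extra row. `PartitionedRemapSketch` and `divPartition` are hypothetical names, and real code should take these values from the variable's partitioning info rather than recompute them.

```java
public class PartitionedRemapSketch {

    /**
     * Returns one {offset, size} pair per shard for a contiguous split of
     * numRows rows into numShards shards (assumed div-partitioning convention).
     */
    public static long[][] divPartition(long numRows, int numShards) {
        long base = numRows / numShards;   // rows every shard receives
        long extra = numRows % numShards;  // leading shards that receive one more row
        long[][] shards = new long[numShards][2];
        long offset = 0;
        for (int s = 0; s < numShards; s++) {
            long size = base + (s < extra ? 1 : 0);
            shards[s][0] = offset; // candidate new_vocab_offset for this shard
            shards[s][1] = size;   // candidate num_new_vocab for this shard
            offset += size;
        }
        return shards;
    }

    public static void main(String[] args) {
        // A 4-entry new vocabulary split across 3 shards.
        long[][] shards = divPartition(4, 3);
        for (int s = 0; s < shards.length; s++) {
            System.out.println("shard " + s
                + ": new_vocab_offset=" + shards[s][0]
                + " num_new_vocab=" + shards[s][1]);
        }
    }
}
```

Each shard would then get its own remapping over its contiguous slice of the new vocabulary, which is why the partitioning logic depends on the vocabulary being contiguous.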

Nested Classes

class GenerateVocabRemapping.Options
Optional attributes for GenerateVocabRemapping

Constants

String OP_NAME
The name of this op, as known by TensorFlow core engine

Public Methods

static GenerateVocabRemapping
create(Scope scope, Operand<TString> newVocabFile, Operand<TString> oldVocabFile, Long newVocabOffset, Long numNewVocab, Options... options)
Factory method to create a class wrapping a new GenerateVocabRemapping operation.
Output<TInt32>
numPresent()
Number of new vocab entries found in old vocab.
static GenerateVocabRemapping.Options
oldVocabSize(Long oldVocabSize)
Output<TInt64>
remapping()
A Tensor of length num_new_vocab where the element at index i is equal to the old ID that maps to the new ID i.

Constants

public static final String OP_NAME

The name of this op, as known by TensorFlow core engine

Constant Value: "GenerateVocabRemapping"

Public Methods

public static GenerateVocabRemapping create(Scope scope, Operand<TString> newVocabFile, Operand<TString> oldVocabFile, Long newVocabOffset, Long numNewVocab, Options... options)

Factory method to create a class wrapping a new GenerateVocabRemapping operation.

Parameters
scope current scope
newVocabFile Path to the new vocab file.
oldVocabFile Path to the old vocab file.
newVocabOffset How many entries into the new vocab file to start reading.
numNewVocab Number of entries in the new vocab file to remap.
options carries optional attribute values
Returns
  • a new instance of GenerateVocabRemapping

public Output<TInt32> numPresent()

Number of new vocab entries found in old vocab.

public static GenerateVocabRemapping.Options oldVocabSize (Long oldVocabSize)

Parameters
oldVocabSize Number of entries in the old vocab file to consider. If -1, use the entire old vocabulary.

public Output<TInt64> remapping()

A Tensor of length num_new_vocab where the element at index i is equal to the old ID that maps to the new ID i. This element is -1 for any new ID that is not found in the old vocabulary.