Given a path to new and old vocabulary files, returns a remapping Tensor of length num_new_vocab.
tf.raw_ops.GenerateVocabRemapping( new_vocab_file, old_vocab_file, new_vocab_offset, num_new_vocab, old_vocab_size=-1, name=None )
remapping[i] contains the row number in the old vocabulary that corresponds to row i in the new vocabulary (starting at line new_vocab_offset and up to num_new_vocab entities), or -1 if entry i in the new vocabulary is not in the old vocabulary. The old vocabulary is constrained to the first old_vocab_size entries if old_vocab_size is not the default value of -1.
new_vocab_offset enables use in the partitioned variable case, and should generally be set through examining partitioning info. Each file should be a text file, with each line containing a single entity within the vocabulary.
For example, with new_vocab_file a text file containing each of the following elements on its own line: [f0, f1, f2, f3], old_vocab_file = [f1, f0, f3], num_new_vocab = 3, new_vocab_offset = 1, the returned remapping would be [0, -1, 2].
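The semantics described above can be sketched in plain Python. This is only an illustration of the documented behavior, not the op's actual implementation: the helper name vocab_remapping is hypothetical, the vocabularies are passed as lists rather than files, and entries in the old vocabulary are assumed to be unique.

```python
def vocab_remapping(new_vocab, old_vocab, new_vocab_offset, num_new_vocab,
                    old_vocab_size=-1):
    """Pure-Python sketch of the documented remapping semantics.

    Hypothetical helper for illustration only; assumes unique old entries.
    """
    # Keep only the first old_vocab_size entries unless the default -1 is used.
    if old_vocab_size != -1:
        old_vocab = old_vocab[:old_vocab_size]
    # Map each old entity to its row number in the old vocabulary.
    old_index = {entity: row for row, entity in enumerate(old_vocab)}
    # Select the slice of the new vocabulary that is being remapped.
    window = new_vocab[new_vocab_offset:new_vocab_offset + num_new_vocab]
    # Entities missing from the old vocabulary map to -1.
    return [old_index.get(entity, -1) for entity in window]


new_vocab = ["f0", "f1", "f2", "f3"]
old_vocab = ["f1", "f0", "f3"]
remapping = vocab_remapping(new_vocab, old_vocab,
                            new_vocab_offset=1, num_new_vocab=3)
print(remapping)  # [0, -1, 2]
```

Here f1 sits in row 0 of the old vocabulary, f2 is absent, and f3 sits in row 2, which reproduces the result quoted in the example.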
The op also returns a count of how many entries in the new vocabulary were present in the old vocabulary, which is used to calculate the number of values to initialize in a weight matrix remapping.
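As a rough illustration of how such a count might feed into a weight matrix remapping, the sketch below copies rows from an old matrix according to a remapping and fills the missing rows with a constant. Everything here is an assumption made for illustration: the helper remap_rows is hypothetical, only rows are remapped, the count is derived from the remapping rather than returned by the op, and a constant stands in for a real initializer.

```python
def remap_rows(old_rows, remapping, init_value=0.0):
    """Hypothetical sketch: build new weight rows from old ones via a remapping."""
    num_cols = len(old_rows[0])
    # Count of new entries that were found in the old vocabulary.
    num_present = sum(1 for row in remapping if row != -1)
    # Copy rows that exist in the old matrix; fill the rest with init_value.
    new_rows = [
        list(old_rows[row]) if row != -1 else [init_value] * num_cols
        for row in remapping
    ]
    # Values that had no old counterpart and therefore need initializing.
    num_to_initialize = (len(remapping) - num_present) * num_cols
    return new_rows, num_present, num_to_initialize


# Old rows correspond to the old vocabulary order [f1, f0, f3].
old_rows = [[1.0, 1.5], [2.0, 2.5], [3.0, 3.5]]
new_rows, num_present, num_to_initialize = remap_rows(old_rows, [0, -1, 2])
print(new_rows)           # [[1.0, 1.5], [0.0, 0.0], [3.0, 3.5]]
print(num_present)        # 2
print(num_to_initialize)  # 2
```

With the remapping [0, -1, 2], two of the three new entries are present, so one row of two columns is left to initialize.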
This functionality can be used to remap both row vocabularies (typically, features) and column vocabularies (typically, classes) from TensorFlow checkpoints. Note that the partitioning logic relies on contiguous vocabularies corresponding to div-partitioned variables. Moreover, the underlying remapping uses an IndexTable (as opposed to an inexact CuckooTable), so client code should use the corresponding index_table_from_file() as the FeatureColumn framework does (as opposed to tf.feature_to_id(), which uses a CuckooTable).
new_vocab_file: A Tensor of type string. Path to the new vocab file.
old_vocab_file: A Tensor of type string. Path to the old vocab file.
new_vocab_offset: An int that is >= 0. How many entries into the new vocab file to start reading.
num_new_vocab: An int that is >= 0. Number of entries in the new vocab file to remap.
old_vocab_size: An optional int that is >= -1. Defaults to -1. Number of entries in the old vocab file to consider. If -1, use the entire old vocabulary.
name: A name for the operation (optional).
Returns a tuple of Tensor objects (remapping, num_present).
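A minimal sketch of calling the op on the example vocabularies might look like the following. It assumes TensorFlow 2.x with eager execution; the wrapper run_generate_vocab_remapping and the temporary file names are hypothetical and exist only for this illustration. Ops under tf.raw_ops accept keyword arguments only.

```python
import os
import tempfile


def run_generate_vocab_remapping():
    """Hypothetical wrapper: writes the example vocab files and calls the op."""
    # Imported lazily so this helper can be defined without TensorFlow installed.
    import tensorflow as tf

    with tempfile.TemporaryDirectory() as tmp_dir:
        new_path = os.path.join(tmp_dir, "new_vocab.txt")
        old_path = os.path.join(tmp_dir, "old_vocab.txt")
        # One entity per line, as the op expects.
        with open(new_path, "w") as f:
            f.write("f0\nf1\nf2\nf3\n")
        with open(old_path, "w") as f:
            f.write("f1\nf0\nf3\n")

        remapping, num_present = tf.raw_ops.GenerateVocabRemapping(
            new_vocab_file=new_path,
            old_vocab_file=old_path,
            new_vocab_offset=1,
            num_new_vocab=3,
        )
        # Convert the eager tensors to plain Python values before cleanup.
        return remapping.numpy().tolist(), int(num_present.numpy())


if __name__ == "__main__":
    print(run_generate_vocab_remapping())
```

For these inputs the documentation above gives a remapping of [0, -1, 2]; since f1 and f3 are the only requested entries found in the old vocabulary, the accompanying count would be 2.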