Given a path to new and old vocabulary files, returns a remapping Tensor of
tf.raw_ops.GenerateVocabRemapping(
new_vocab_file, old_vocab_file, new_vocab_offset, num_new_vocab,
old_vocab_size=-1, name=None
)
length num_new_vocab
, where remapping[i]
contains the row number in the old
vocabulary that corresponds to row i
in the new vocabulary (starting at line
new_vocab_offset
and up to num_new_vocab
entities), or -1
if entry i
in the new vocabulary is not in the old vocabulary. The old vocabulary is
constrained to the first old_vocab_size
entries if old_vocab_size
is not the
default value of -1.
num_vocab_offset
enables
use in the partitioned variable case, and should generally be set through
examining partitioning info. The format of the files should be a text file,
with each line containing a single entity within the vocabulary.
For example, with new_vocab_file
a text file containing each of the following
elements on a single line: [f0, f1, f2, f3]
, old_vocab_file = [f1, f0, f3],
num_new_vocab = 3, new_vocab_offset = 1
, the returned remapping would be
[0, -1, 2]
.
The op also returns a count of how many entries in the new vocabulary were present in the old vocabulary, which is used to calculate the number of values to initialize in a weight matrix remapping
This functionality can be used to remap both row vocabularies (typically, features) and column vocabularies (typically, classes) from TensorFlow checkpoints. Note that the partitioning logic relies on contiguous vocabularies corresponding to div-partitioned variables. Moreover, the underlying remapping uses an IndexTable (as opposed to an inexact CuckooTable), so client code should use the corresponding index_table_from_file() as the FeatureColumn framework does (as opposed to tf.feature_to_id(), which uses a CuckooTable).
Args | |
---|---|
new_vocab_file
|
A Tensor of type string . Path to the new vocab file.
|
old_vocab_file
|
A Tensor of type string . Path to the old vocab file.
|
new_vocab_offset
|
An int that is >= 0 .
How many entries into the new vocab file to start reading.
|
num_new_vocab
|
An int that is >= 0 .
Number of entries in the new vocab file to remap.
|
old_vocab_size
|
An optional int that is >= -1 . Defaults to -1 .
Number of entries in the old vocab file to consider. If -1,
use the entire old vocabulary.
|
name
|
A name for the operation (optional). |
Returns | |
---|---|
A tuple of Tensor objects (remapping, num_present).
|
|
remapping
|
A Tensor of type int64 .
|
num_present
|
A Tensor of type int32 .
|