Creates a _SparseColumn with vocabulary file configuration.
tf.contrib.layers.sparse_column_with_vocabulary_file(
    column_name, vocabulary_file, num_oov_buckets=0, vocab_size=None,
    default_value=-1, combiner='sum', dtype=tf.dtypes.string
)
Use this when your sparse features are strings or integers and you have a
vocabulary file that maps each value to an integer ID.
output_id = LookupIdFromVocab(input_feature_string)
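The lookup above can be pictured with a small pure-Python sketch. It is not TensorFlow's implementation. It assumes that IDs are zero-based line numbers in the vocabulary file, that out-of-vocabulary buckets are numbered directly after the vocabulary, and it uses CRC32 as a stand-in for TensorFlow's own hash. The helper names `load_vocab` and `lookup_id` and the file contents are hypothetical.

```python
import os
import tempfile
import zlib


def load_vocab(path, vocab_size):
    """Map each of the first `vocab_size` lines of the file to its line index."""
    with open(path, encoding="utf-8") as f:
        tokens = [line.rstrip("\n") for line in f]
    return {token: idx for idx, token in enumerate(tokens[:vocab_size])}


def lookup_id(value, vocab, vocab_size, num_oov_buckets=0, default_value=-1):
    """Return the vocabulary ID for `value`, or an out-of-vocabulary fallback."""
    if value in vocab:
        return vocab[value]
    if num_oov_buckets > 0:
        # Stand-in hash (CRC32); TensorFlow uses its own fingerprint function.
        bucket = zlib.crc32(value.encode("utf-8")) % num_oov_buckets
        return vocab_size + bucket
    # With no OOV buckets, unknown values fall back to default_value.
    return default_value


# Write a tiny, made-up vocabulary file with one token per line.
with tempfile.TemporaryDirectory() as tmp:
    vocab_path = os.path.join(tmp, "vocab.txt")
    with open(vocab_path, "w", encoding="utf-8") as f:
        f.write("apple\nbanana\ncherry\n")
    vocab = load_vocab(vocab_path, vocab_size=3)

in_vocab_id = lookup_id("banana", vocab, vocab_size=3)
oov_default = lookup_id("durian", vocab, vocab_size=3)
oov_bucketed = lookup_id("durian", vocab, vocab_size=3, num_oov_buckets=2)
```

Here "banana" resolves to its line index, "durian" falls back to `default_value` when there are no OOV buckets, and with two buckets it lands on one of the two IDs just past the vocabulary.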
Args |
column_name
|
A string defining sparse column name.
|
vocabulary_file
|
The vocabulary filename.
|
num_oov_buckets
|
The number of out-of-vocabulary buckets. If zero, all out-of-vocabulary
features are ignored.
|
vocab_size
|
The number of elements in the vocabulary. Must be set; a ValueError is
raised otherwise.
|
default_value
|
The value to use for out-of-vocabulary feature values.
Defaults to -1.
|
combiner
|
A string specifying how to reduce the column's values when the sparse
column is multivalent. "mean", "sqrtn" and "sum" are currently supported,
with "sum" the default. "sqrtn" often achieves good accuracy, in particular
with bag-of-words columns.
- "sum": do not normalize features in the column
- "mean": do L1 normalization on features in the column
- "sqrtn": do L2 normalization on features in the column
For more information, see tf.embedding_lookup_sparse.
|
dtype
|
The type of features. Only string and integer types are supported.
|
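To make the three combiner options concrete, here is a minimal pure-Python sketch of the reduction they describe: a weighted sum, optionally divided by the sum of the weights ("mean") or by the square root of the sum of squared weights ("sqrtn"). The function name `combine` and the sample vectors are illustrative and not part of the TensorFlow API.

```python
import math


def combine(vectors, weights, combiner="sum"):
    """Reduce several embedding vectors for one example into a single vector."""
    dim = len(vectors[0])
    # Weighted sum across all vectors, computed per dimension.
    weighted_sum = [
        sum(w * vec[i] for vec, w in zip(vectors, weights)) for i in range(dim)
    ]
    if combiner == "sum":
        return weighted_sum
    if combiner == "mean":
        norm = sum(weights)  # L1 norm of the (non-negative) weights
    elif combiner == "sqrtn":
        norm = math.sqrt(sum(w * w for w in weights))  # L2 norm of the weights
    else:
        raise ValueError("combiner must be 'sum', 'mean' or 'sqrtn'")
    return [x / norm for x in weighted_sum]


# Four made-up embedding vectors for one multivalent feature, all weight 1.
vectors = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
weights = [1.0, 1.0, 1.0, 1.0]

summed = combine(vectors, weights, "sum")     # [16.0, 20.0]
averaged = combine(vectors, weights, "mean")  # [4.0, 5.0]
scaled = combine(vectors, weights, "sqrtn")   # [8.0, 10.0]
```

With four equally weighted values, "mean" divides the sum by 4 and "sqrtn" divides it by 2, which is why "sqrtn" sits between the other two in magnitude.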
Returns |
A _SparseColumn with vocabulary file configuration.
|
Raises |
ValueError
|
vocab_size is not defined.
|
ValueError
|
dtype is neither string nor integer.
|
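A call might look like the sketch below. It assumes a TensorFlow 1.x installation, since `tf.contrib` is not available in TensorFlow 2.x. The wrapper `build_fruit_column`, the column name "fruit", and the argument values are hypothetical. The import sits inside the function so that the definition itself loads without TensorFlow.

```python
def build_fruit_column(vocabulary_file, vocab_size):
    """Build a sparse column backed by a vocabulary file (TensorFlow 1.x only)."""
    # Imported lazily: tf.contrib exists only in TensorFlow 1.x.
    import tensorflow as tf

    return tf.contrib.layers.sparse_column_with_vocabulary_file(
        column_name="fruit",              # hypothetical feature name
        vocabulary_file=vocabulary_file,  # path to the vocabulary file
        vocab_size=vocab_size,            # required; None raises ValueError
        num_oov_buckets=1,                # one bucket for unseen values
        combiner="sqrtn",
        dtype=tf.string,
    )
```

Passing `vocab_size` explicitly avoids the first ValueError listed above, and keeping `dtype` as a string or integer type avoids the second.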