
Creates a _SparseColumn with vocabulary file configuration.

Use this when your sparse features are in string or integer format and you have a vocabulary file that maps each value to an integer ID:

output_id = LookupIdFromVocab(input_feature_string)
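The lookup can be sketched in plain Python. This is only an illustration of the mapping described here, not the library's implementation: the helper names `build_vocab` and `lookup_id` are hypothetical, the vocabulary is assumed to assign IDs in entry order, and `zlib.crc32` stands in for whatever hash the library uses to place out-of-vocabulary values into buckets.

```python
import zlib


def build_vocab(entries):
    """Map each vocabulary entry to its zero-based position (assumed ID scheme)."""
    return {value: idx for idx, value in enumerate(entries)}


def lookup_id(value, vocab, num_oov_buckets=0, default_value=-1):
    """Return the integer ID for `value` (hypothetical helper)."""
    if value in vocab:
        return vocab[value]
    if num_oov_buckets > 0:
        # Assumption: out-of-vocabulary values are hashed into IDs that
        # follow the vocabulary, in [len(vocab), len(vocab) + num_oov_buckets).
        bucket = zlib.crc32(str(value).encode("utf-8")) % num_oov_buckets
        return len(vocab) + bucket
    # No OOV buckets: fall back to the default value.
    return default_value


vocab = build_vocab(["cat", "dog", "fish"])
ids = [lookup_id(v, vocab) for v in ["dog", "bird"]]
print(ids)  # [1, -1]
```

With `num_oov_buckets=2`, the unknown value "bird" would instead land on ID 3 or 4, depending on its hash.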

column_name A string defining sparse column name.
vocabulary_file The vocabulary filename.
num_oov_buckets The number of out-of-vocabulary buckets. If zero, all out-of-vocabulary features are ignored.
vocab_size The number of elements in the vocabulary.
default_value The value to use for out-of-vocabulary feature values. Defaults to -1.
combiner A string specifying how to reduce if the sparse column is multivalent. Currently "mean", "sqrtn" and "sum" are supported, with "sum" the default. "sqrtn" often achieves good accuracy, in particular with bag-of-words columns.

  • "sum": do not normalize features in the column
  • "mean": do l1 normalization on features in the column
  • "sqrtn": do l2 normalization on features in the column

For more information, see tf.embedding_lookup_sparse.
dtype The type of features. Only string and integer types are supported.
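To make the three combiner options concrete, here is a minimal pure-Python sketch of how a multivalent feature's embedding vectors might be reduced. It follows the normalization documented for tf.embedding_lookup_sparse (weighted sum, divided by the sum of weights for "mean" or by the square root of the sum of squared weights for "sqrtn"); the function name `combine` and the sample data are illustrative assumptions.

```python
import math


def combine(embeddings, weights, combiner="sum"):
    """Reduce several embedding vectors into one (hypothetical helper).

    embeddings: list of equal-length lists of floats, one per feature value.
    weights: list of floats, one per feature value.
    """
    dim = len(embeddings[0])
    # Weighted sum over all values, computed per dimension.
    weighted = [
        sum(w * vec[i] for w, vec in zip(weights, embeddings))
        for i in range(dim)
    ]
    if combiner == "sum":
        return weighted  # no normalization
    if combiner == "mean":
        norm = sum(weights)  # l1 normalization
    elif combiner == "sqrtn":
        norm = math.sqrt(sum(w * w for w in weights))  # l2 normalization
    else:
        raise ValueError("combiner must be 'sum', 'mean' or 'sqrtn'")
    return [x / norm for x in weighted]


embeddings = [[1.0, 2.0], [3.0, 4.0]]
weights = [1.0, 1.0]
print(combine(embeddings, weights, "sum"))   # [4.0, 6.0]
print(combine(embeddings, weights, "mean"))  # [2.0, 3.0]
```

With unit weights, "sqrtn" divides the summed vector by the square root of the number of values, which sits between the unnormalized "sum" and the fully averaged "mean".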

A _SparseColumn with vocabulary file configuration.

ValueError If vocab_size is not defined.
ValueError If dtype is neither string nor integer.