Converting

TensorFlow provides several operations that you can use to convert various data formats into tensors.

tf.decode_csv(records, record_defaults, field_delim=None, name=None)

Convert CSV records to tensors. Each column maps to one tensor.

The CSV records are expected to follow the RFC 4180 format (https://tools.ietf.org/html/rfc4180). Leading and trailing spaces are allowed in int and float fields.

Args:
  • records: A Tensor of type string. Each string is a record (row) in the CSV, and all records should have the same format.
  • record_defaults: A list of Tensor objects with types from: float32, int32, int64, string. One tensor per column of the input record, containing either a scalar default value for that column or nothing (an empty tensor) if the column is required.
  • field_delim: An optional string. Defaults to ",". The delimiter that separates fields in a record.
  • name: A name for the operation (optional).
Returns:

A list of Tensor objects, one per column, each with the same type as the corresponding entry of record_defaults. Each tensor has the same shape as records.
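The column-default rules above can be modeled in plain Python. This is a minimal sketch, not TensorFlow's implementation: the helper name decode_csv_sketch is hypothetical, and each record_defaults entry is assumed to be a (py_type, default) pair, with default=None standing in for an empty (required) default tensor.

```python
import csv
import io


def decode_csv_sketch(records, record_defaults, field_delim=","):
    """Hypothetical model of tf.decode_csv: returns one list per column.

    record_defaults: list of (py_type, default) pairs, one per column.
    default=None marks a required column (assumed encoding of TensorFlow's
    empty default tensor).
    """
    columns = [[] for _ in record_defaults]
    for record in records:
        # csv.reader applies RFC 4180 quoting, so "a,b" in quotes stays one field.
        fields = next(csv.reader(io.StringIO(record), delimiter=field_delim))
        if len(fields) != len(record_defaults):
            raise ValueError(
                f"expected {len(record_defaults)} fields, got {len(fields)}")
        for col, (field, (py_type, default)) in enumerate(
                zip(fields, record_defaults)):
            if field == "":
                # Empty field: fall back to the column default, if there is one.
                if default is None:
                    raise ValueError(f"column {col} is required")
                columns[col].append(default)
            else:
                # int() and float() accept leading/trailing spaces, matching
                # the allowance for int and float fields; str keeps them.
                columns[col].append(py_type(field))
    return columns


cols = decode_csv_sketch(['1, 2.5,"a,b"', ',0.5,c'],
                         [(int, -1), (float, None), (str, "")])
print(cols)  # [[1, -1], [2.5, 0.5], ['a,b', 'c']]
```

Each inner list plays the role of one output tensor; its length equals the number of records, mirroring "each tensor has the same shape as records".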


tf.decode_raw(bytes, out_type, little_endian=None, name=None)

Reinterpret the bytes of a string as a vector of numbers.

Args:
  • bytes: A Tensor of type string. All the elements must have the same length.
  • out_type: A tf.DType from: tf.half, tf.float32, tf.float64, tf.int32, tf.uint8, tf.int16, tf.int8, tf.int64.
  • little_endian: An optional bool. Defaults to True. Whether the input bytes are in little-endian order. Ignored for out_type values that are stored in a single byte like uint8.
  • name: A name for the operation (optional).
Returns:

A Tensor of type out_type with one more dimension than the input bytes. The added dimension has size equal to the length of each element of bytes divided by the number of bytes needed to represent out_type.
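A plain-Python model of the reinterpretation, using the standard struct module, makes the shape rule concrete. It is a sketch, not TensorFlow's implementation: decode_raw_sketch and the dtype-name table are hypothetical, and each input element becomes a list whose length is the size of the added dimension.

```python
import struct

# Hypothetical mapping from TensorFlow dtype names to struct format codes.
_FORMATS = {"uint8": "B", "int8": "b", "int16": "h", "int32": "i",
            "int64": "q", "float16": "e", "float32": "f", "float64": "d"}


def decode_raw_sketch(byte_strings, out_type, little_endian=True):
    """Reinterpret each byte string as a vector of out_type values."""
    code = _FORMATS[out_type]
    order = "<" if little_endian else ">"  # has no effect on 1-byte types
    size = struct.calcsize(order + code)   # bytes per output element
    if len({len(b) for b in byte_strings}) > 1:
        raise ValueError("all elements of bytes must have the same length")
    result = []
    for b in byte_strings:
        if len(b) % size:
            raise ValueError(f"length {len(b)} is not a multiple of {size}")
        count = len(b) // size  # size of the added trailing dimension
        result.append(list(struct.unpack(f"{order}{count}{code}", b)))
    return result


print(decode_raw_sketch([b"\x01\x00\x02\x00", b"\xff\xff\x00\x01"], "int16"))
# [[1, 2], [-1, 256]]
print(decode_raw_sketch([b"\x01\x00\x02\x00"], "int16", little_endian=False))
# [[256, 512]]
```

Two 4-byte inputs decoded as int16 yield a 2 x 2 result: the input's single dimension plus an added dimension of 4 / 2 = 2.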


Example protocol buffer

TensorFlow's recommended format for training examples is serialized Example protocol buffers (defined in tensorflow/core/example/example.proto). They contain Features (defined in tensorflow/core/example/feature.proto).


class tf.VarLenFeature

Configuration for parsing a variable-length input feature.

Fields:
  • dtype: Data type of input.


tf.VarLenFeature.__getnewargs__()

Return self as a plain tuple. Used by copy and pickle.


tf.VarLenFeature.__getstate__()

Exclude the OrderedDict from pickling.


tf.VarLenFeature.__new__(_cls, dtype)

Create new instance of VarLenFeature(dtype,)


tf.VarLenFeature.__repr__()

Return a nicely formatted representation string.


tf.VarLenFeature.dtype

Alias for field number 0


class tf.FixedLenFeature

Configuration for parsing a fixed-length input feature.

To treat sparse input as dense, provide a default_value; otherwise, the parse functions will fail on any examples missing this feature.

Fields:
  • shape: Shape of input data.
  • dtype: Data type of input.
  • default_value: Value to be used if an example is missing this feature. It must be compatible with dtype.


tf.FixedLenFeature.__getnewargs__()

Return self as a plain tuple. Used by copy and pickle.


tf.FixedLenFeature.__getstate__()

Exclude the OrderedDict from pickling.


tf.FixedLenFeature.__new__(_cls, shape, dtype, default_value=None)

Create new instance of FixedLenFeature(shape, dtype, default_value)


tf.FixedLenFeature.__repr__()

Return a nicely formatted representation string.


tf.FixedLenFeature.default_value

Alias for field number 2


tf.FixedLenFeature.dtype

Alias for field number 1


tf.FixedLenFeature.shape

Alias for field number 0


class tf.FixedLenSequenceFeature

Configuration for a dense input feature in a sequence item.

To treat a sparse input as dense, provide allow_missing=True; otherwise, the parse functions will fail on any examples missing this feature.

Fields:
  • shape: Shape of input data.
  • dtype: Data type of input.
  • allow_missing: Whether to allow this feature to be missing from a feature list item.


tf.FixedLenSequenceFeature.__getnewargs__()

Return self as a plain tuple. Used by copy and pickle.


tf.FixedLenSequenceFeature.__getstate__()

Exclude the OrderedDict from pickling.


tf.FixedLenSequenceFeature.__new__(_cls, shape, dtype, allow_missing=False)

Create new instance of FixedLenSequenceFeature(shape, dtype, allow_missing)


tf.FixedLenSequenceFeature.__repr__()

Return a nicely formatted representation string.


tf.FixedLenSequenceFeature.allow_missing

Alias for field number 2


tf.FixedLenSequenceFeature.dtype

Alias for field number 1


tf.FixedLenSequenceFeature.shape

Alias for field number 0
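The generated members listed for these three classes (__getnewargs__, __new__, and the "Alias for field number N" properties) are those of a collections.namedtuple, so the configuration classes can be approximated as plain namedtuples. This is a sketch for illustration, not TensorFlow's definitions: strings stand in for tf.DType objects, and the field defaults follow the __new__ signatures documented above (the defaults= argument needs Python 3.7 or later).

```python
import collections
import copy

# Hypothetical stand-ins for the three configuration classes, not TF's own.
VarLenFeature = collections.namedtuple("VarLenFeature", ["dtype"])
FixedLenFeature = collections.namedtuple(
    "FixedLenFeature", ["shape", "dtype", "default_value"], defaults=[None])
FixedLenSequenceFeature = collections.namedtuple(
    "FixedLenSequenceFeature", ["shape", "dtype", "allow_missing"],
    defaults=[False])

age = FixedLenFeature([], "int64", default_value=-1)
# Each named property is an alias for a tuple position.
print(age.shape is age[0], age.dtype is age[1], age.default_value is age[2])
# True True True
print(repr(VarLenFeature("string")))
# VarLenFeature(dtype='string')
# Copying reconstructs the tuple from __getnewargs__, so fields survive.
print(copy.deepcopy(age) == age)
# True
print(FixedLenSequenceFeature([3], "float32").allow_missing)
# False
```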


tf.parse_example(serialized, features, name=None, example_names=None)

Parses Example protos into a dict of tensors.

Parses a number of serialized Example protos given in serialized.

example_names may contain descriptive names for the corresponding serialized protos. These may be useful for debugging purposes, but they have no effect on the output. If not None, example_names must be the same length as serialized.

This op parses serialized examples into a dictionary mapping keys to Tensor and SparseTensor objects. features is a dict from keys to VarLenFeature and FixedLenFeature objects. Each VarLenFeature is mapped to a SparseTensor, and each FixedLenFeature is mapped to a Tensor.

Each VarLenFeature maps to a SparseTensor of the specified type representing a ragged matrix. Its indices are [batch, index], where batch is the position in serialized of the example the value comes from, and index is the value's position in the list of values associated with that feature and example.
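The [batch, index] layout can be sketched in plain Python. The helper var_len_to_sparse is hypothetical and returns an (indices, values, dense_shape) triple rather than a real SparseTensor; each batch entry is assumed to be the list of values found for the feature in that example (empty if the feature is absent), and the dense shape is taken as (batch size, longest value list), which agrees with the worked outputs later in this section.

```python
def var_len_to_sparse(per_example_values):
    """Hypothetical model of a VarLenFeature result as a sparse triple."""
    indices, values = [], []
    max_len = 0
    for batch, example_values in enumerate(per_example_values):
        for index, value in enumerate(example_values):
            indices.append([batch, index])  # [batch, index] as described above
            values.append(value)
        max_len = max(max_len, len(example_values))
    # Dense shape: (number of examples, longest value list in the batch).
    return indices, values, [len(per_example_values), max_len]


print(var_len_to_sparse([[1.0, 2.0], [], [3.0]]))
# ([[0, 0], [0, 1], [2, 0]], [1.0, 2.0, 3.0], [3, 2])
```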

Each FixedLenFeature df maps to a Tensor of the specified type (or tf.float32 if not specified) and shape (serialized.size(),) + df.shape.

FixedLenFeature entries with a default_value are optional. Without a default value, parsing fails if that feature is missing from any example in serialized.
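The dense rules can be sketched the same way. parse_dense is a hypothetical helper, not TensorFlow code: each example is assumed to be a dict from feature key to a flat list of values, an absent key stands for a missing feature, and a scalar default_value is assumed to be broadcast to the full shape. The result for a batch has shape (len(examples),) + shape, as stated above (math.prod needs Python 3.8 or later).

```python
import math


def _reshape(flat, shape):
    """Nest a flat list into the given shape (row-major order)."""
    if not shape:
        return flat[0]
    inner = math.prod(shape[1:])
    return [_reshape(flat[i * inner:(i + 1) * inner], shape[1:])
            for i in range(shape[0])]


def parse_dense(examples, key, shape, default_value=None):
    """Hypothetical model of a FixedLenFeature result for a batch."""
    expected = math.prod(shape)
    batch = []
    for i, example in enumerate(examples):
        values = example.get(key)
        if values is None:  # feature missing from this example
            if default_value is None:
                raise ValueError(f"feature {key!r} missing from example {i}")
            # Assumption: a scalar default is repeated to fill the shape.
            values = [default_value] * expected
        if len(values) != expected:
            raise ValueError(f"example {i}: expected {expected} values for {key!r}")
        batch.append(_reshape(values, list(shape)))
    return batch


print(parse_dense([{"age": [0]}, {}], "age", [], default_value=-1))  # [0, -1]
```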

Examples:

For example, if one expects a tf.float32 sparse feature ft and three serialized Examples are provided:

serialized = [
  features
    { feature { key: "ft" value { float_list { value: [1.0, 2.0] } } } },
  features
    { feature [] },
  features
    { feature { key: "ft" value { float_list { value: [3.0] } } } }
]

then the output will look like:

{"ft": SparseTensor(indices=[[0, 0], [0, 1], [2, 0]],
                    values=[1.0, 2.0, 3.0],
                    shape=(3, 2)) }

Given two Example input protos in serialized:

[
  features {
    feature { key: "kw" value { bytes_list { value: [ "knit", "big" ] } } }
    feature { key: "gps" value { float_list { value: [] } } }
  },
  features {
    feature { key: "kw" value { bytes_list { value: [ "emmy" ] } } }
    feature { key: "dank" value { int64_list { value: [ 42 ] } } }
    feature { key: "gps" value { } }
  }
]

And arguments:

example_names: ["input0", "input1"],
features: {
    "kw": VarLenFeature(tf.string),
    "dank": VarLenFeature(tf.int64),
    "gps": VarLenFeature(tf.float32),
}

Then the output is a dictionary:

{
  "kw": SparseTensor(
      indices=[[0, 0], [0, 1], [1, 0]],
      values=["knit", "big", "emmy"],
      shape=[2, 2]),
  "dank": SparseTensor(
      indices=[[1, 0]],
      values=[42],
      shape=[2, 1]),
  "gps": SparseTensor(
      indices=[],
      values=[],
      shape=[2, 0]),
}

For dense results in two serialized Examples:

[
  features {
    feature { key: "age" value { int64_list { value: [ 0 ] } } }
    feature { key: "gender" value { bytes_list { value: [ "f" ] } } }
  },
  features {
    feature { key: "age" value { int64_list { value: [] } } }
    feature { key: "gender" value { bytes_list { value: [ "f" ] } } }
  }
]

We can use arguments:

example_names: ["input0", "input1"],
features: {
    "age": FixedLenFeature([], dtype=tf.int64, default_value=-1),
    "gender": FixedLenFeature([], dtype=tf.string),
}

And the expected output is:

{
  "age": [0, -1],
  "gender": ["f", "f"],
}
Args:
  • serialized: A vector (1-D Tensor) of strings, a batch of binary serialized Example protos.
  • features: A dict mapping feature keys to FixedLenFeature or VarLenFeature values.
  • name: A name for this operation (optional).
  • example_names: A vector (1-D Tensor) of strings (optional), the names of the serialized protos in the batch.
Returns:

A dict mapping feature keys to Tensor and SparseTensor values.

Raises:
  • ValueError: if any feature is invalid.

tf.parse_single_example(serialized, features, name=None, example_names=None)

Parses a single Example proto.

Similar to parse_example, except:

  • For dense tensors, the returned Tensor is identical to the output of parse_example, except that there is no batch dimension: the output shape is the same as the shape given in the corresponding FixedLenFeature.
  • For SparseTensors, the first (batch) column of the indices matrix is removed (the indices matrix becomes a column vector), the values vector is unchanged, and the first (batch_size) entry of the shape vector is removed (it becomes a single-element vector).
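The SparseTensor difference can be illustrated on plain lists. drop_batch_dim is a hypothetical helper, not part of TensorFlow: it takes the (indices, values, dense_shape) triple that parse_example would produce for a batch containing a single example and removes the batch column and the batch dimension.

```python
def drop_batch_dim(indices, values, dense_shape):
    """Hypothetical model: convert a batch-of-one sparse triple to single form."""
    if dense_shape[0] != 1:
        raise ValueError("expected a batch of exactly one example")
    # Drop the batch column; the indices matrix becomes a column vector.
    single_indices = [[index] for _batch, index in indices]
    # Values are unchanged; the shape loses its leading batch_size entry.
    return single_indices, values, dense_shape[1:]


print(drop_batch_dim([[0, 0], [0, 1]], ["knit", "big"], [1, 2]))
# ([[0], [1]], ['knit', 'big'], [2])
```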

Args:
  • serialized: A scalar string Tensor, a single serialized Example. See _parse_single_example_raw documentation for more details.
  • features: A dict mapping feature keys to FixedLenFeature or VarLenFeature values.
  • name: A name for this operation (optional).
  • example_names: (Optional) A scalar string Tensor, the associated name. See _parse_single_example_raw documentation for more details.
Returns:

A dict mapping feature keys to Tensor and SparseTensor values.

Raises:
  • ValueError: if any feature is invalid.

tf.parse_tensor(serialized, out_type, name=None)

Transforms a serialized tensorflow.TensorProto proto into a Tensor.

Args:
  • serialized: A Tensor of type string. A scalar string containing a serialized TensorProto proto.
  • out_type: A tf.DType. The type of the serialized tensor. The provided type must match the type of the serialized tensor and no implicit conversion will take place.
  • name: A name for the operation (optional).
Returns:

A Tensor of type out_type.


tf.decode_json_example(json_examples, name=None)

Convert JSON-encoded Example records to binary protocol buffer strings.

This op translates a tensor containing Example records, encoded using the standard JSON mapping, into a tensor containing the same records encoded as binary protocol buffers. The resulting tensor can then be fed to any of the other Example-parsing ops.

Args:
  • json_examples: A Tensor of type string. Each string is a JSON object serialized according to the JSON mapping of the Example proto.
  • name: A name for the operation (optional).
Returns:

A Tensor of type string. Each string is a binary Example protocol buffer corresponding to the respective element of json_examples.
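As an illustration of the expected input, the following sketch builds one json_examples string by hand with the standard json and base64 modules. It assumes the standard proto3 JSON mapping for Example: lowerCamelCase field names (int64List, bytesList), int64 values written as JSON strings, and bytes values written as base64. The helper example_to_json is hypothetical.

```python
import base64
import json


def example_to_json(int64_features, bytes_features):
    """Hypothetical helper: encode one Example using the proto3 JSON mapping."""
    feature = {}
    for key, values in int64_features.items():
        # proto3 JSON writes 64-bit integers as strings.
        feature[key] = {"int64List": {"value": [str(v) for v in values]}}
    for key, values in bytes_features.items():
        # proto3 JSON writes bytes fields as base64 strings.
        encoded = [base64.b64encode(v).decode("ascii") for v in values]
        feature[key] = {"bytesList": {"value": encoded}}
    return json.dumps({"features": {"feature": feature}})


print(example_to_json({"age": [0]}, {"gender": [b"f"]}))
```

A tensor of such strings is what json_examples is expected to hold; the op's output can then be passed to parse_example or parse_single_example.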