Better performance with tf.function

In TensorFlow 2, eager execution is turned on by default. The user interface is intuitive and flexible (running one-off operations is much easier and faster), but this can come at the expense of performance and deployability.

You can use tf.function to make graphs out of your programs. It is a transformation tool that creates Python-independent dataflow graphs out of your Python code. This will help you create performant and portable models, and it is required to use SavedModel.

This guide will help you conceptualize how tf.function works under the hood so you can use it effectively.

The main takeaways and recommendations are:

  • Debug in eager mode, then decorate with @tf.function.
  • Don't rely on Python side effects like object mutation or list appends.
  • tf.function works best with TensorFlow ops; NumPy and Python calls are converted to constants.


Setup

import tensorflow as tf

Define a helper function to demonstrate the kinds of errors you might encounter:

import traceback
import contextlib

# Some helper code to demonstrate the kinds of errors you might encounter.
@contextlib.contextmanager
def assert_raises(error_class):
  try:
    yield
  except error_class as e:
    print('Caught expected exception \n  {}:'.format(error_class))
    traceback.print_exc(limit=2)
  except Exception as e:
    raise e
  else:
    raise Exception('Expected {} to be raised but no error was raised!'.format(
        error_class))



Basics

A Function you define (for example, by applying the @tf.function decorator) is just like a core TensorFlow operation: you can execute it eagerly, you can compute gradients, and so on.

@tf.function  # The decorator converts `add` into a `Function`.
def add(a, b):
  return a + b

add(tf.ones([2, 2]), tf.ones([2, 2]))  #  [[2., 2.], [2., 2.]]
<tf.Tensor: shape=(2, 2), dtype=float32, numpy=
array([[2., 2.],
       [2., 2.]], dtype=float32)>
v = tf.Variable(1.0)
with tf.GradientTape() as tape:
  result = add(v, 1.0)
tape.gradient(result, v)
<tf.Tensor: shape=(), dtype=float32, numpy=1.0>

You can use Functions inside other Functions.

@tf.function
def dense_layer(x, w, b):
  return add(tf.matmul(x, w), b)

dense_layer(tf.ones([3, 2]), tf.ones([2, 2]), tf.ones([2]))
<tf.Tensor: shape=(3, 2), dtype=float32, numpy=
array([[3., 3.],
       [3., 3.],
       [3., 3.]], dtype=float32)>

Functions can be faster than eager code, especially for graphs with many small ops. But for graphs with a few expensive ops (like convolutions), you may not see much speedup.

import timeit
conv_layer = tf.keras.layers.Conv2D(100, 3)

@tf.function
def conv_fn(image):
  return conv_layer(image)

image = tf.zeros([1, 200, 200, 100])
# warm up
conv_layer(image); conv_fn(image)
print("Eager conv:", timeit.timeit(lambda: conv_layer(image), number=10))
print("Function conv:", timeit.timeit(lambda: conv_fn(image), number=10))
print("Note how there's not much difference in performance for convolutions")

Eager conv: 0.0023194860004878137
Function conv: 0.0036776439992536325
Note how there's not much difference in performance for convolutions
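By contrast, a function built from many small ops is the case where graph execution tends to help. The following is a minimal sketch rather than a benchmark: `many_small_ops`, the loop count, and the tensor size are illustrative choices made here, and the timings vary by machine, so no particular speedup is claimed.

```python
import timeit
import tensorflow as tf

def many_small_ops(x):
  # 100 tiny element-wise updates; in eager mode each op is dispatched from Python.
  for _ in range(100):
    x = x * 0.5 + 1.0
  return x

# Wrapping with tf.function traces the Python loop once into a single graph.
many_small_ops_fn = tf.function(many_small_ops)

x = tf.ones([10])
# Warm up so that tracing time is not included in the measurement.
eager_result = many_small_ops(x)
graph_result = many_small_ops_fn(x)

print("Eager small ops:", timeit.timeit(lambda: many_small_ops(x), number=100))
print("Function small ops:", timeit.timeit(lambda: many_small_ops_fn(x), number=100))
```

Both versions compute the same values; only the way the ops are dispatched differs.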


Tracing

Python's dynamic typing means that you can call functions with a variety of argument types, and Python can do something different in each scenario.

Yet, to create a TensorFlow Graph, static dtypes and shape dimensions are required. tf.function bridges this gap by wrapping a Python function to create a Function object. The Function selects the appropriate graph for the given inputs, retracing the Python function as necessary. Once you understand why and when tracing happens, it's much easier to use tf.function effectively!

You can call a Function with arguments of different types to see this polymorphic behavior in action.

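@tf.function
def double(a):
  print("Tracing with", a)
  return a + a

print(double(tf.constant(1)))
print(double(tf.constant(1.1)))
print(double(tf.constant("a")))
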
Tracing with Tensor("a:0", shape=(), dtype=int32)
tf.Tensor(2, shape=(), dtype=int32)

Tracing with Tensor("a:0", shape=(), dtype=float32)
tf.Tensor(2.2, shape=(), dtype=float32)

Tracing with Tensor("a:0", shape=(), dtype=string)
tf.Tensor(b'aa', shape=(), dtype=string)

Note that if you repeatedly call a Function with the same argument type, TensorFlow will reuse a previously traced graph, as the generated graph would be identical.

# This doesn't print 'Tracing with ...'
print(double(tf.constant("b")))
tf.Tensor(b'bb', shape=(), dtype=string)

(The following change is available in TensorFlow nightly, and will be available in TensorFlow 2.3.)

You can use pretty_printed_concrete_signatures() to see all of the available traces:

print(double.pretty_printed_concrete_signatures())

    a: string Tensor, shape=()
    string Tensor, shape=()

    a: int32 Tensor, shape=()
    int32 Tensor, shape=()

    a: float32 Tensor, shape=()
    float32 Tensor, shape=()

So far, you've seen that tf.function creates a cached, dynamic dispatch layer over TensorFlow's graph tracing logic. To be more specific about the terminology:

  • A tf.Graph is the raw, language-agnostic, portable representation of your computation.
  • A ConcreteFunction is an eagerly-executing wrapper around a tf.Graph.
  • A Function manages a cache of ConcreteFunctions and picks the right one for your inputs.
  • tf.function wraps a Python function, returning a Function object.

Obtaining concrete functions

Every time a function is traced, a new concrete function is created. You can directly obtain a concrete function, by using get_concrete_function.

print("Obtaining concrete trace")
double_strings = double.get_concrete_function(tf.constant("a"))
print("Executing traced function")
print(double_strings(tf.constant("a")))
print(double_strings(a=tf.constant("b")))

Obtaining concrete trace
Executing traced function
tf.Tensor(b'aa', shape=(), dtype=string)
tf.Tensor(b'bb', shape=(), dtype=string)

# You can also call get_concrete_function with a tf.TensorSpec
double_strings_from_inputspec = double.get_concrete_function(tf.TensorSpec(shape=[], dtype=tf.string))
print(double_strings_from_inputspec(tf.constant("c")))
Tracing with Tensor("a:0", shape=(), dtype=string)
tf.Tensor(b'cc', shape=(), dtype=string)

(The following change is available in TensorFlow nightly, and will be available in TensorFlow 2.3.)

Printing a ConcreteFunction displays a summary of its input arguments (with types) and its output type.

print(double_strings)

ConcreteFunction double(a)
    a: string Tensor, shape=()
    string Tensor, shape=()

You can also directly retrieve a concrete function's signature.

print(double_strings.structured_input_signature)
print(double_strings.structured_outputs)

((TensorSpec(shape=(), dtype=tf.string, name='a'),), {})
Tensor("Identity:0", shape=(), dtype=string)

Using a concrete trace with incompatible types will throw an error.

with assert_raises(tf.errors.InvalidArgumentError):
  double_strings(tf.constant(1))
Caught expected exception 
  <class 'tensorflow.python.framework.errors_impl.InvalidArgumentError'>:

Traceback (most recent call last):
  File "<ipython-input-3-73d0ca52e838>", line 8, in assert_raises
  File "<ipython-input-15-e4e2860a4364>", line 2, in <module>
tensorflow.python.framework.errors_impl.InvalidArgumentError: cannot compute __inference_double_168 as input #0(zero-based) was expected to be a string tensor but is a int32 tensor [Op:__inference_double_168]

You may notice that Python arguments are given special treatment in a concrete function's input signature. Prior to TensorFlow 2.3, Python arguments were simply removed from the concrete function's signature. Starting with TensorFlow 2.3, Python arguments remain in the signature, but are constrained to take the value set during tracing.

@tf.function
def pow(a, b):
  return a ** b

square = pow.get_concrete_function(a=tf.TensorSpec(None, tf.float32), b=2)
print(square)
ConcreteFunction pow(a, b=2)
    a: float32 Tensor, shape=<unknown>
    float32 Tensor, shape=<unknown>

assert square(tf.constant(10.0)) == 100

with assert_raises(TypeError):
  square(tf.constant(10.0), b=3)
Caught expected exception 
  <class 'TypeError'>:

Traceback (most recent call last):
  File "/tmpfs/src/tf_docs_env/lib/python3.6/site-packages/tensorflow/python/eager/", line 1669, in _call_impl
  File "/tmpfs/src/tf_docs_env/lib/python3.6/site-packages/tensorflow/python/eager/", line 1714, in _call_with_flat_signature
    self._flat_signature_summary(), ", ".join(sorted(kwargs))))
TypeError: pow(a) got unexpected keyword arguments: b.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<ipython-input-3-73d0ca52e838>", line 8, in assert_raises
  File "<ipython-input-17-d163f3d206cb>", line 4, in <module>
    square(tf.constant(10.0), b=3)
TypeError: ConcreteFunction pow(a, b) was constructed with int value 2 in b, but was called with int value 3

Obtaining graphs

Each concrete function is a callable wrapper around a tf.Graph. Although retrieving the actual tf.Graph object is not something you'll normally need to do, you can obtain it easily from any concrete function.

graph = double_strings.graph
for node in graph.as_graph_def().node:
  print(f'{node.input} -> {node.name}')

[] -> a
['a', 'a'] -> add
['add'] -> Identity


Debugging

In general, debugging code is easier in eager mode than inside tf.function. You should ensure that your code executes error-free in eager mode before decorating with tf.function. To assist in the debugging process, you can call tf.config.run_functions_eagerly(True) to globally disable tf.function, and tf.config.run_functions_eagerly(False) to re-enable it.

When tracking down issues that only appear within tf.function, here are some tips:

  • Plain old Python print calls only execute during tracing, helping you track down when your function gets (re)traced.
  • tf.print calls will execute every time, and can help you track down intermediate values during execution.
  • tf.debugging.enable_check_numerics is an easy way to track down where NaNs and Inf are created.
  • pdb can help you understand what's going on during tracing. (Caveat: PDB will drop you into AutoGraph-transformed source code.)
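
The first two tips and the eager switch can be seen together in a small sketch. It assumes TensorFlow 2.3 or later (where tf.config.run_functions_eagerly is available); `scale` and `python_runs` are names invented here for illustration.

```python
import tensorflow as tf

python_runs = []  # records every time the Python body actually executes

@tf.function
def scale(x):
  python_runs.append(1)          # Python side effect: only when Python runs the body
  tf.print("Executing with", x)  # TensorFlow op: runs on every call
  return x * 2

scale(tf.constant(1))
scale(tf.constant(2))  # same dtype and shape, so the existing trace is reused
runs_in_graph_mode = len(python_runs)

tf.config.run_functions_eagerly(True)   # temporarily disable graph execution
scale(tf.constant(3))
scale(tf.constant(4))
tf.config.run_functions_eagerly(False)  # restore the default behavior
runs_after_eager_calls = len(python_runs)

print(runs_in_graph_mode, runs_after_eager_calls)
```

While functions run eagerly, the Python body executes on every call, which is what makes ordinary Python debugging tools usable again.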

Tracing semantics

Cache key rules

A Function determines whether to reuse a traced concrete function by computing a cache key from an input's args and kwargs.

  • The key generated for a tf.Tensor argument is its shape and dtype.
  • Starting in TensorFlow 2.3, the key generated for a tf.Variable argument is its id().
  • The key generated for a Python primitive is its value. The key generated for nested dicts, lists, tuples, namedtuples, and attrs is the flattened tuple. (As a result of this flattening, calling a concrete function with a different nesting structure than the one used during tracing will result in a TypeError).
  • For all other Python types, the keys are based on the object id() so that methods are traced independently for each instance of a class.
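
As a rough illustration of the Tensor and Python-primitive rules above, the sketch below counts traces with a Python list, relying on the fact that Python side effects only run during tracing. `identity` and `trace_log` are hypothetical names used only for this example.

```python
import tensorflow as tf

trace_log = []  # appended to only while tracing, because it is a Python side effect

@tf.function
def identity(x):
  trace_log.append(x)
  return x

identity(tf.constant([1, 2]))      # first trace: int32, shape (2,)
identity(tf.constant([3, 4]))      # same dtype and shape: reuses the trace
traces_after_same_spec = len(trace_log)

identity(tf.constant([1.0, 2.0]))  # new dtype: retrace
traces_after_new_dtype = len(trace_log)

identity(1)                        # Python primitive: keyed on its value
identity(1)                        # same value: reuses the trace
identity(2)                        # different value: retrace
traces_after_python_values = len(trace_log)

print(traces_after_same_spec, traces_after_new_dtype, traces_after_python_values)
```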

Controlling retracing

Retracing helps ensure that TensorFlow generates correct graphs for each set of inputs. However, tracing is an expensive operation! If your Function retraces a new graph for every call, you'll find that your code executes more slowly than if you didn't use tf.function.

To control the tracing behavior, you can use the following techniques:

  • Specify input_signature in tf.function to limit tracing.
@tf.function(input_signature=(tf.TensorSpec(shape=[None], dtype=tf.int32),))
def next_collatz(x):
  print("Tracing with", x)
  return tf.where(x % 2 == 0, x // 2, 3 * x + 1)

print(next_collatz(tf.constant([1, 2])))
# We specified a 1-D tensor in the input signature, so this should fail.
with assert_raises(ValueError):
  next_collatz(tf.constant([[1, 2], [3, 4]]))

# We specified an int32 dtype in the input signature, so this should fail.
with assert_raises(ValueError):
  next_collatz(tf.constant([1.0, 2.0]))

Tracing with Tensor("x:0", shape=(None,), dtype=int32)
tf.Tensor([4 1], shape=(2,), dtype=int32)
Caught expected exception 
  <class 'ValueError'>:
Caught expected exception 
  <class 'ValueError'>:

Traceback (most recent call last):
  File "<ipython-input-3-73d0ca52e838>", line 8, in assert_raises
  File "<ipython-input-19-20f544b8adbf>", line 9, in <module>
    next_collatz(tf.constant([[1, 2], [3, 4]]))
ValueError: Python inputs incompatible with input_signature:
  inputs: (
[[1 2]
 [3 4]], shape=(2, 2), dtype=int32))
  input_signature: (
    TensorSpec(shape=(None,), dtype=tf.int32, name=None))
Traceback (most recent call last):
  File "<ipython-input-3-73d0ca52e838>", line 8, in assert_raises
  File "<ipython-input-19-20f544b8adbf>", line 13, in <module>
    next_collatz(tf.constant([1.0, 2.0]))
ValueError: Python inputs incompatible with input_signature:
  inputs: (
    tf.Tensor([1. 2.], shape=(2,), dtype=float32))
  input_signature: (
    TensorSpec(shape=(None,), dtype=tf.int32, name=None))

  • Specify a [None] dimension in tf.TensorSpec to allow for flexibility in trace reuse.

    Since TensorFlow matches tensors based on their shape, using a None dimension as a wildcard will allow Functions to reuse traces for variably-sized input. Variably-sized input can occur if you have sequences of different lengths, or images of different sizes for each batch (see the Transformer and Deep Dream tutorials for examples).

@tf.function(input_signature=(tf.TensorSpec(shape=[None], dtype=tf.int32),))
def g(x):
  print('Tracing with', x)
  return x

# No retrace!
print(g(tf.constant([1, 2, 3])))
print(g(tf.constant([1, 2, 3, 4, 5])))

Tracing with Tensor("x:0", shape=(None,), dtype=int32)
tf.Tensor([1 2 3], shape=(3,), dtype=int32)
tf.Tensor([1 2 3 4 5], shape=(5,), dtype=int32)

  • Cast Python arguments to Tensors to reduce retracing.

    Often, Python arguments are used to control hyperparameters and graph construction, for example num_layers=10, training=True, or nonlinearity='relu'. If such a Python argument changes, it makes sense that you'd have to retrace the graph.

    However, it's possible that a Python argument is not being used to control graph construction. In these cases, a change in the Python value can trigger needless retracing. Take, for example, this training loop, which AutoGraph will dynamically unroll. Despite the multiple traces, the generated graph is actually identical, so retracing is unnecessary.

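def train_one_step():
  pass

@tf.function
def train(num_steps):
  print("Tracing with num_steps = ", num_steps)
  tf.print("Executing with num_steps = ", num_steps)
  for _ in tf.range(num_steps):
    train_one_step()

print("Retracing occurs for different Python arguments.")
train(num_steps=10)
train(num_steps=20)

print()
print("Traces are reused for Tensor arguments.")
train(num_steps=tf.constant(10))
train(num_steps=tf.constant(20))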
Retracing occurs for different Python arguments.
Tracing with num_steps =  10
Executing with num_steps =  10
Tracing with num_steps =  20
Executing with num_steps =  20

Traces are reused for Tensor arguments.
Tracing with num_steps =  Tensor("num_steps:0", shape=(), dtype=int32)
Executing with num_steps =  10
Executing with num_steps =  20

If you need to force retracing, create a new Function. Separate Function objects are guaranteed not to share traces.

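def f():
  print('Tracing!')
  tf.print('Executing')

# Each call to tf.function(f) creates a separate Function, so each one traces f again.
tf.function(f)()
tf.function(f)()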

Python side effects

Python side effects like printing, appending to lists, and mutating globals only happen the first time you call a Function with a set of inputs. Afterwards, the traced tf.Graph is reexecuted, without executing the Python code.

The general rule of thumb is to only use Python side effects to debug your traces. Otherwise, TensorFlow ops like tf.Variable.assign, tf.print, and tf.summary are the best way to ensure your code will be traced and executed by the TensorFlow runtime with each call.

@tf.function
def f(x):
  print("Traced with", x)
  tf.print("Executed with", x)

f(1)
f(1)
f(2)

Traced with 1
Executed with 1
Executed with 1
Traced with 2
Executed with 2

Many Python features, such as generators and iterators, rely on the Python runtime to keep track of state. In general, while these constructs work as expected in eager mode, many unexpected things can happen inside a Function.

To give one example, advancing iterator state is a Python side effect and therefore only happens during tracing.

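external_var = tf.Variable(0)
@tf.function
def buggy_consume_next(iterator):
  external_var.assign_add(next(iterator))
  tf.print("Value of external_var:", external_var)

iterator = iter([0, 1, 2, 3])
buggy_consume_next(iterator)
# This reuses the first value from the iterator, rather than consuming the next value.
buggy_consume_next(iterator)
buggy_consume_next(iterator)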

Value of external_var: 0
Value of external_var: 0
Value of external_var: 0

Some iteration constructs are supported through AutoGraph. See the section on AutoGraph Transformations for an overview.

If you would like to execute Python code during each invocation of a Function, tf.py_function is an escape hatch. The drawback of tf.py_function is that it's not portable or particularly performant, nor does it work well in distributed (multi-GPU, TPU) setups. Also, since tf.py_function has to be wired into the graph, it casts all inputs/outputs to tensors.

APIs like tf.gather, tf.stack, and tf.TensorArray can help you implement common looping patterns in native TensorFlow.

external_list = []

def side_effect(x):
  print('Python side effect')
  external_list.append(x)

@tf.function
def f(x):
  tf.py_function(side_effect, inp=[x], Tout=[])

f(1)
f(1)
f(1)
# The list append happens all three times!
assert len(external_list) == 3
# The list contains tf.constant(1), not 1, because py_function casts everything to tensors.
assert external_list[0].numpy() == 1

Python side effect
Python side effect
Python side effect


Variables

You may encounter an error when creating a new tf.Variable in a function. This error guards against behavior divergence on repeated calls: In eager mode, a function creates a new variable with each call, but in a Function, a new variable may not be created due to trace reuse.

@tf.function
def f(x):
  v = tf.Variable(1.0)
  return v

with assert_raises(ValueError):
  f(1.0)
Caught expected exception 
  <class 'ValueError'>:

Traceback (most recent call last):
  File "<ipython-input-3-73d0ca52e838>", line 8, in assert_raises
  File "<ipython-input-26-73e410646579>", line 8, in <module>
ValueError: in user code:

    <ipython-input-26-73e410646579>:3 f  *
        v = tf.Variable(1.0)
    /tmpfs/src/tf_docs_env/lib/python3.6/site-packages/tensorflow/python/ops/ __call__  **
        return cls._variable_v2_call(*args, **kwargs)
    /tmpfs/src/tf_docs_env/lib/python3.6/site-packages/tensorflow/python/ops/ _variable_v2_call
    /tmpfs/src/tf_docs_env/lib/python3.6/site-packages/tensorflow/python/ops/ getter
        return captured_getter(captured_previous, **kwargs)
    /tmpfs/src/tf_docs_env/lib/python3.6/site-packages/tensorflow/python/eager/ invalid_creator_scope
        "tf.function-decorated function tried to create "

    ValueError: tf.function-decorated function tried to create variables on non-first call.

You can create variables inside a Function as long as those variables are only created the first time the function is executed.

class Count(tf.Module):
  def __init__(self):
    self.count = None

  @tf.function
  def __call__(self):
    if self.count is None:
      self.count = tf.Variable(0)
    return self.count.assign_add(1)

c = Count()
print(c())
print(c())
tf.Tensor(1, shape=(), dtype=int32)
tf.Tensor(2, shape=(), dtype=int32)

Another error you may encounter is a garbage-collected variable. Unlike normal Python functions, concrete functions only retain WeakRefs to the variables they close over, so you must retain a reference to any variables.

external_var = tf.Variable(3)
@tf.function
def f(x):
  return x * external_var

traced_f = f.get_concrete_function(4)
print("Calling concrete function...")
print(traced_f(4))

del external_var
print()
print("Calling concrete function after garbage collecting its closed Variable...")
with assert_raises(tf.errors.FailedPreconditionError):
  traced_f(4)
Calling concrete function...
tf.Tensor(12, shape=(), dtype=int32)

Calling concrete function after garbage collecting its closed Variable...
Caught expected exception 
  <class 'tensorflow.python.framework.errors_impl.FailedPreconditionError'>:

Traceback (most recent call last):
  File "<ipython-input-3-73d0ca52e838>", line 8, in assert_raises
  File "<ipython-input-28-304a18524b57>", line 14, in <module>
tensorflow.python.framework.errors_impl.FailedPreconditionError: 2 root error(s) found.
  (0) Failed precondition:  Error while reading resource variable _AnonymousVar4 from Container: localhost. This could mean that the variable was uninitialized. Not found: Resource localhost/_AnonymousVar4/N10tensorflow3VarE does not exist.
     [[node ReadVariableOp (defined at <ipython-input-28-304a18524b57>:4) ]]
  (1) Failed precondition:  Error while reading resource variable _AnonymousVar4 from Container: localhost. This could mean that the variable was uninitialized. Not found: Resource localhost/_AnonymousVar4/N10tensorflow3VarE does not exist.
     [[node ReadVariableOp (defined at <ipython-input-28-304a18524b57>:4) ]]
0 successful operations.
0 derived errors ignored. [Op:__inference_f_514]

Function call stack:
f -> f

AutoGraph Transformations

AutoGraph is a library that is on by default in tf.function, and transforms a subset of Python eager code into graph-compatible TensorFlow ops. This includes control flow like if, for, while.

TensorFlow ops like tf.cond and tf.while_loop continue to work, but control flow is often easier to write and understand when written in Python.

# Simple loop

@tf.function
def f(x):
  while tf.reduce_sum(x) > 1:
    tf.print(x)
    x = tf.tanh(x)
  return x

f(tf.random.uniform([5]))

[0.448926926 0.896036148 0.703306437 0.446930766 0.20440042]
[0.421016544 0.714362323 0.6064623 0.419372857 0.201600626]
[0.397786468 0.613405049 0.541632056 0.396401972 0.198913112]
[0.378053397 0.546519518 0.494222373 0.376866162 0.196330562]
[0.361015767 0.497907132 0.457561225 0.359982818 0.1938463]
[0.346108437 0.460469633 0.428094476 0.3451989 0.191454232]
[0.332919776 0.43046692 0.403727621 0.332110822 0.189148799]
[0.321141869 0.405711472 0.383133948 0.320416152 0.18692489]
[0.310539037 0.384825289 0.365426034 0.309883147 0.184777796]
[0.300927401 0.366890609 0.349984437 0.300330788 0.182703182]
[0.292161077 0.351268977 0.336361736 0.291615278 0.180697069]
[0.284122646 0.337500453 0.324225426 0.283620834 0.178755745]
[0.276716352 0.325244069 0.313322544 0.276252925 0.176875815]
[0.269863278 0.314240903 0.303456694 0.269433528 0.175054088]
[0.263497591 0.304290265 0.294472754 0.263097644 0.17328763]
[0.257564 0.295233846 0.2862463 0.257190555 0.171573699]
[0.25201565 0.286944896 0.278676242 0.25166589 0.169909731]
[0.246812463 0.279320478 0.271679461 0.246483982 0.168293342]
[0.24192 0.272276044 0.265186876 0.241610721 0.166722313]
[0.237308443 0.265741408 0.259140551 0.237016559 0.165194541]
[0.23295185 0.25965777 0.253491491 0.232675791 0.163708091]
[0.228827521 0.253975391 0.248197898 0.228565902 0.162261128]
[0.224915475 0.248651937 0.243223906 0.224667087 0.160851941]
[0.221198082 0.243651047 0.238538548 0.220961839 0.159478888]
[0.217659682 0.238941342 0.23411487 0.217434615 0.158140466]
[0.214286327 0.23449555 0.229929343 0.214071587 0.156835243]
[0.211065561 0.230289876 0.225961298 0.210860386 0.155561864]
[0.207986191 0.226303399 0.222192511 0.207789883 0.154319063]
[0.20503816 0.222517684 0.2186068 0.204850093 0.153105617]

<tf.Tensor: shape=(5,), dtype=float32, numpy=
array([0.20221236, 0.2189164 , 0.21518978, 0.20203198, 0.15192041],
      dtype=float32)>

If you're curious, you can inspect the code AutoGraph generates.

print(tf.autograph.to_code(f.python_function))

def tf__f(x):
    with ag__.FunctionScope('f', 'fscope', ag__.ConversionOptions(recursive=True, user_requested=True, optional_features=(), internal_convert_user_code=True)) as fscope:
        do_return = False
        retval_ = ag__.UndefinedReturnValue()

        def get_state():
            return (x,)

        def set_state(vars_):
            nonlocal x
            (x,) = vars_

        def loop_body():
            nonlocal x
            ag__.converted_call(ag__.ld(tf).print, (ag__.ld(x),), None, fscope)
            x = ag__.converted_call(ag__.ld(tf).tanh, (ag__.ld(x),), None, fscope)

        def loop_test():
            return (ag__.converted_call(ag__.ld(tf).reduce_sum, (ag__.ld(x),), None, fscope) > 1)
        ag__.while_stmt(loop_test, loop_body, get_state, set_state, ('x',), {})
        try:
            do_return = True
            retval_ = ag__.ld(x)
        except:
            do_return = False
            raise
        return fscope.ret(retval_, do_return)


Conditionals

AutoGraph will convert some if <condition> statements into the equivalent tf.cond calls. This substitution is made if <condition> is a Tensor. Otherwise, the if statement is executed as a Python conditional.

A Python conditional executes during tracing, so exactly one branch of the conditional will be added to the graph. Without AutoGraph, this traced graph would be unable to take the alternate branch if there is data-dependent control flow.

tf.cond traces and adds both branches of the conditional to the graph, dynamically selecting a branch at execution time. Tracing can have unintended side effects; see AutoGraph tracing effects for more.

@tf.function
def fizzbuzz(n):
  for i in tf.range(1, n + 1):
    print('Tracing for loop')
    if i % 15 == 0:
      print('Tracing fizzbuzz branch')
      tf.print('fizzbuzz')
    elif i % 3 == 0:
      print('Tracing fizz branch')
      tf.print('fizz')
    elif i % 5 == 0:
      print('Tracing buzz branch')
      tf.print('buzz')
    else:
      print('Tracing default branch')
      tf.print(i)

fizzbuzz(tf.constant(5))

Tracing for loop
Tracing fizzbuzz branch
Tracing fizz branch
Tracing buzz branch
Tracing default branch

See the reference documentation for additional restrictions on AutoGraph-converted if statements.
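
To make the difference between a Python conditional and a converted one concrete, here is a small sketch. It records which branches get traced using a Python list; `shift` and `traced_branches` are hypothetical names chosen for this example.

```python
import tensorflow as tf

traced_branches = []  # filled in only while tracing (Python side effect)

@tf.function
def shift(flag, x):
  if flag:
    traced_branches.append("true")
    result = x + 1
  else:
    traced_branches.append("false")
    result = x - 1
  return result

# Python bool: a plain Python conditional, so only the taken branch is traced.
python_result = shift(True, tf.constant(1))
branches_for_python_bool = set(traced_branches)

traced_branches.clear()
# Tensor condition: converted to tf.cond, so both branches are traced and the
# branch to run is selected at execution time.
tensor_true = shift(tf.constant(True), tf.constant(1))
tensor_false = shift(tf.constant(False), tf.constant(1))  # reuses the trace
branches_for_tensor = set(traced_branches)

print(branches_for_python_bool, branches_for_tensor)
```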


Loops

AutoGraph will convert some for and while statements into the equivalent TensorFlow looping ops, like tf.while_loop. If not converted, the for or while loop is executed as a Python loop.

This substitution is made in the following situations:

  • for x in y: if y is a Tensor, convert to tf.while_loop. In the special case where y is a tf.data.Dataset, a combination of tf.data.Dataset ops are generated.
  • while <condition>: if <condition> is a Tensor, convert to tf.while_loop.

A Python loop executes during tracing, adding additional ops to the tf.Graph for every iteration of the loop.

A TensorFlow loop traces the body of the loop, and dynamically selects how many iterations to run at execution time. The loop body only appears once in the generated tf.Graph.

See the reference documentation for additional restrictions on AutoGraph-converted for and while statements.
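
The effect on graph size can be checked directly. The following sketch compares a loop over a Python range, which is unrolled during tracing, with a loop over tf.range, which becomes a single TensorFlow loop. `python_loop`, `tensor_loop`, and `node_count` are illustrative helpers written for this example.

```python
import tensorflow as tf

@tf.function
def python_loop(x, n):
  # n is a Python int: the loop runs during tracing and is unrolled into the graph.
  for _ in range(n):
    x = x + 1.0
  return x

@tf.function
def tensor_loop(x, n):
  # n is a Tensor: AutoGraph converts the loop into a TensorFlow loop.
  for _ in tf.range(n):
    x = x + 1.0
  return x

def node_count(fn, *args):
  # Number of nodes in the graph traced for these arguments.
  return len(fn.get_concrete_function(*args).graph.as_graph_def().node)

x = tf.constant(0.0)
small_unrolled = node_count(python_loop, x, 3)
large_unrolled = node_count(python_loop, x, 30)
small_dynamic = node_count(tensor_loop, x, tf.constant(3))
large_dynamic = node_count(tensor_loop, x, tf.constant(30))

print("Python loop:", small_unrolled, "vs", large_unrolled, "nodes")
print("TensorFlow loop:", small_dynamic, "vs", large_dynamic, "nodes")
```

The unrolled graph grows with the iteration count, while the TensorFlow loop reuses one trace whose size does not depend on the value of n.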

Looping over Python data

A common pitfall is to loop over Python/NumPy data within a tf.function. This loop will execute during the tracing process, adding a copy of your model to the tf.Graph for each iteration of the loop.

If you want to wrap the entire training loop in tf.function, the safest way to do this is to wrap your data as a tf.data.Dataset so that AutoGraph will dynamically unroll the training loop.

def measure_graph_size(f, *args):
  g = f.get_concrete_function(*args).graph
  print("{}({}) contains {} nodes in its graph".format(
      f.__name__, ', '.join(map(str, args)), len(g.as_graph_def().node)))

@tf.function
def train(dataset):
  loss = tf.constant(0)
  for x, y in dataset:
    loss += tf.abs(y - x) # Some dummy computation.
  return loss

small_data = [(1, 1)] * 3
big_data = [(1, 1)] * 10
measure_graph_size(train, small_data)
measure_graph_size(train, big_data)

measure_graph_size(train, tf.data.Dataset.from_generator(
    lambda: small_data, (tf.int32, tf.int32)))
measure_graph_size(train, tf.data.Dataset.from_generator(
    lambda: big_data, (tf.int32, tf.int32)))
train([(1, 1), (1, 1), (1, 1)]) contains 11 nodes in its graph
train([(1, 1), (1, 1), (1, 1), (1, 1), (1, 1), (1, 1), (1, 1), (1, 1), (1, 1), (1, 1)]) contains 32 nodes in its graph
train(<FlatMapDataset shapes: (<unknown>, <unknown>), types: (tf.int32, tf.int32)>) contains 8 nodes in its graph
train(<FlatMapDataset shapes: (<unknown>, <unknown>), types: (tf.int32, tf.int32)>) contains 8 nodes in its graph

When wrapping Python/NumPy data in a Dataset, be mindful of tf.data.Dataset.from_generator versus tf.data.Dataset.from_tensors. The former will keep the data in Python and fetch it via tf.py_function, which can have performance implications, whereas the latter will bundle a copy of the data as one large tf.constant() node in the graph, which can have memory implications.

Reading data from files via TFRecordDataset/CsvDataset/etc. is the most effective way to consume data, as then TensorFlow itself can manage the asynchronous loading and prefetching of data, without having to involve Python. To learn more, see the tf.data guide.

Accumulating values in a loop

A common pattern is to accumulate intermediate values from a loop. Normally, this is accomplished by appending to a Python list or adding entries to a Python dictionary. However, as these are Python side effects, they will not work as expected in a dynamically unrolled loop. Use tf.TensorArray to accumulate results from a dynamically unrolled loop.

batch_size = 2
seq_len = 3
feature_size = 4

def rnn_step(inp, state):
  return inp + state

@tf.function
def dynamic_rnn(rnn_step, input_data, initial_state):
  # [batch, time, features] -> [time, batch, features]
  input_data = tf.transpose(input_data, [1, 0, 2])
  max_seq_len = input_data.shape[0]

  states = tf.TensorArray(tf.float32, size=max_seq_len)
  state = initial_state
  for i in tf.range(max_seq_len):
    state = rnn_step(input_data[i], state)
    states = states.write(i, state)
  return tf.transpose(states.stack(), [1, 0, 2])

dynamic_rnn(rnn_step,
            tf.random.uniform([batch_size, seq_len, feature_size]),
            tf.zeros([batch_size, feature_size]))
<tf.Tensor: shape=(2, 3, 4), dtype=float32, numpy=
array([[[0.2486304 , 0.0612042 , 0.69624186, 0.28587592],
        [1.2193475 , 0.2389338 , 1.5216837 , 0.38649392],
        [1.7640524 , 1.1970762 , 2.3265643 , 0.81419575]],

       [[0.36599267, 0.41830885, 0.73540664, 0.63987565],
        [0.48354673, 1.1808103 , 1.7210082 , 0.8333106 ],
        [0.7138835 , 1.2030114 , 1.8544207 , 1.1647347 ]]], dtype=float32)>

Further reading

To learn about how to export and load a Function, see the SavedModel guide. To learn more about graph optimizations that are performed after tracing, see the Grappler guide. To learn how to optimize your data pipeline and profile your model, see the Profiler guide.