Performance is often a significant issue when training a machine learning model. This section explains various ways to optimize performance. Start your investigation with the following guide:
- Performance, which contains a collection of best practices for optimizing your TensorFlow code.
XLA (Accelerated Linear Algebra) is an experimental compiler for linear algebra that optimizes TensorFlow computations. The following guides explore XLA:
- XLA Overview, which introduces XLA.
- Broadcasting Semantics, which describes XLA's broadcasting semantics.
- Developing a new back end for XLA, which explains how to re-target TensorFlow in order to optimize the performance of the computational graph for particular hardware.
- Using JIT Compilation, which describes the XLA JIT compiler that compiles and runs parts of TensorFlow graphs via XLA in order to optimize performance.
- Operation Semantics, which is a reference manual describing the semantics of operations in the
- Shapes and Layout, which details the
- Using AOT compilation, which explains tfcompile, a standalone tool that compiles TensorFlow graphs into executable code in order to optimize performance.
And finally, we offer the following guide:
- How to Quantize Neural Networks with TensorFlow, which explains how to use quantization to reduce model size, both in storage and at runtime. Quantization can improve performance, especially on mobile hardware.
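
To make the idea of quantization concrete, here is a minimal pure-Python sketch of linear 8-bit quantization. It is not TensorFlow's implementation: the function names `quantize` and `dequantize` and the sample weights are hypothetical, chosen only for illustration. It shows how floating-point values can be stored as integers in the range 0 to 255 plus a scale and an offset, at the cost of a small rounding error.

```python
from typing import List, Tuple


def quantize(weights: List[float]) -> Tuple[List[int], float, float]:
    """Map floats onto integer codes in [0, 255] using a linear scale.

    Hypothetical helper for illustration; not a TensorFlow API.
    """
    lo, hi = min(weights), max(weights)
    # Guard against constant input, where the value range would be zero.
    scale = (hi - lo) / 255.0 if hi > lo else 1.0
    codes = [round((w - lo) / scale) for w in weights]
    return codes, scale, lo


def dequantize(codes: List[int], scale: float, lo: float) -> List[float]:
    """Recover approximate floats from the integer codes."""
    return [c * scale + lo for c in codes]


# Sample weights are made up for this example.
weights = [-1.0, -0.25, 0.0, 0.5, 1.0]
codes, scale, lo = quantize(weights)
restored = dequantize(codes, scale, lo)

# The round trip loses at most about half a quantization step per value.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(codes, max_error)
```

Each code fits in a single byte, whereas a 32-bit float takes four, which is where the storage saving comes from. The guide listed above covers how TensorFlow itself applies quantization.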