TF SIG Build Dockerfiles

Standard Dockerfiles for TensorFlow builds, used internally at Google.

Maintainer: @angerson (TensorFlow OSS DevInfra; SIG Build)


These Docker containers are for building and testing TensorFlow in CI environments (and for users replicating those CI builds). They are openly developed in TF SIG Build, verified by Google developers, and published to tensorflow/build on Docker Hub. The TensorFlow OSS DevInfra team uses these containers for most of our Linux-based CI, including tf-nightly tests, Pip packages, and TF release packages for TensorFlow 2.9 onwards.

Tags

These Dockerfiles are built and deployed to Docker Hub via GitHub Actions.

The tags are defined as follows:

  • The latest tags are kept up-to-date to build TensorFlow's master branch.
  • The version number tags target the corresponding TensorFlow version. We continuously build the tag for the next (unreleased) TensorFlow version, so when a new TensorFlow branch is cut, that Dockerfile is frozen to support that branch.
  • We support the same Python versions that TensorFlow does.
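
For example, to pull the image that tracks master with Python 3.9, or a frozen image for a released branch (the version tag is illustrative; see tensorflow/build on Docker Hub for the published list):

docker pull tensorflow/build:latest-python3.9
docker pull tensorflow/build:2.9-python3.9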

Updating the Containers

For simple changes, you can adjust the source files and then make a PR. Send it to @angerson for review. We have presubmits that will make sure your change still builds a container. After approval and submission, our GitHub Actions workflow deploys the containers to Docker Hub.

  • To update Python packages, look at devel.requirements.txt
  • To update system packages, look at devel.packages.txt
  • To update the way bazel build works, look at devel.usertools/*.bazelrc.
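
For example, a minimal sketch of pinning one extra Python package before rebuilding (the package name is purely illustrative):

echo "some-package==1.2.3" >> devel.requirements.txt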

To rebuild the containers locally after making changes, use this command from this directory:

DOCKER_BUILDKIT=1 docker build \
  --build-arg PYTHON_VERSION=python3.9 --target=devel -t my-tf-devel .

It will take a long time to build devtoolset and install CUDA packages. After it's done, you can use the commands below to test your changes. Just replace tensorflow/build:latest-python3.9 with my-tf-devel to use your image instead.
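
Once the build finishes, a quick smoke test of your image might look like this (assuming, as in the published images, that python3 and bazel are on the PATH):

docker run --rm my-tf-devel bash -c "python3 --version && bazel version"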

Automatic GCR.io Builds for Presubmits

TensorFlow team members (i.e. Google employees) can apply a "Build and deploy to gcr.io for staging" tag to PRs that change the Dockerfiles, as long as the PR is developed on a branch of this repository, not a fork. Unfortunately, this is not available to non-Googler contributors for security reasons.

Run the TensorFlow Team's Nightly Test Suites with Docker

The TensorFlow DevInfra team runs a daily test suite that builds tf-nightly and runs a bazel test suite on both the Pip package (the "pip" tests) and the source code itself (the "nonpip" tests). These test scripts are often referred to as "The Nightly Tests" and can be a common reason for a TF PR to be reverted. The build scripts aren't visible to external users, but they use the configuration files that are included in these containers. Our test suites, which include the build of tf-nightly, are easy to replicate with these containers; here is how.

Presubmits are not using these containers... yet.

Here are some important notes to keep in mind:

  • The Ubuntu CI jobs that build the tf-nightly package build from the GitHub nightly tag. You can see the specific commit of a tf-nightly package on pypi.org in tf.version.GIT_VERSION, which will look something like v1.12.1-67282-g251085598b7. The final segment, g251085598b7, is a short git hash (see the example after this list).

  • If you interrupt a docker exec command with Ctrl+C, you will get your shell back, but the command will continue to run. You cannot reattach to it, but you can kill it with docker kill tf (or docker kill the-container-name). This will destroy your container, but will not harm your work, since your work lives in the mounted directories. If you have any suggestions for handling this better, let us know.
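
For example, with a tf-nightly package installed, you can print the GIT_VERSION mentioned above yourself:

python3 -c "import tensorflow as tf; print(tf.version.GIT_VERSION)"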

Now let's build tf-nightly.

  1. Set up your directories (see the consolidated sketch after these steps):

    • A directory with the TensorFlow source code, e.g. /tmp/tensorflow
    • A directory for TensorFlow packages built in the container, e.g. /tmp/packages
    • A directory for your local bazel cache (can be empty), e.g. /tmp/bazelcache
  2. Choose the Docker container to use from Docker Hub. The options for the master branch are:

    • tensorflow/build:latest-python3.11
    • tensorflow/build:latest-python3.10
    • tensorflow/build:latest-python3.9

    For this example we'll use tensorflow/build:latest-python3.9.

  3. Pull the container you decided to use.

    docker pull tensorflow/build:latest-python3.9
  4. Start a backgrounded Docker container with the three folders mounted.

    • Mount the TensorFlow source code to /tf/tensorflow.
    • Mount the directory for built packages to /tf/pkg.
    • Mount the bazel cache to /tf/cache. You don't need /tf/cache if you're going to use the remote cache.

    Here are the arguments we're using:

    • --name tf: Names the container tf so we can refer to it later.
    • -w /tf/tensorflow: All commands run in the /tf/tensorflow directory, where the TF source code is.
    • -it: Makes the container interactive for running commands.
    • -d: Makes the container start in the background, so we can send commands to it instead of running commands from inside.

    And -v is for mounting directories into the container.

    docker run --name tf -w /tf/tensorflow -it -d \
      --env TF_PYTHON_VERSION=3.9 \
      -v "/tmp/packages:/tf/pkg" \
      -v "/tmp/tensorflow:/tf/tensorflow" \
      -v "/tmp/bazelcache:/tf/cache" \
      tensorflow/build:latest-python3.9 \
      bash

    Note: if you wish to use your own Google Cloud Platform credentials, e.g. for RBE, you may also wish to add -v $HOME/.config/gcloud:/root/.config/gcloud to make your credentials available to bazel. You don't need to do this unless you know what you're doing.
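
Putting the steps together, here is a minimal sketch of the host-side preparation from step 1, plus a quick check that the container from step 4 is responding (paths match the examples above):

mkdir -p /tmp/packages /tmp/bazelcache
git clone https://github.com/tensorflow/tensorflow.git /tmp/tensorflow
# ...run the docker pull and docker run commands from steps 3 and 4, then:
docker exec tf python3 --version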

Now you can continue on to any of:

  • Build tf-nightly and then (optionally) run a test suite on the pip package (the "pip" suite)
  • Run a test suite on the TF code directly (the "nonpip" suite)
  • Build the libtensorflow packages (the "libtensorflow" suite)
  • Run a code-correctness check (the "code_check" suite)

Build tf-nightly and run Pip tests

  1. Run the update_version.py script, which changes the TensorFlow version to X.Y.Z.devYYYYMMDD. This is the version scheme used for tf-nightly on PyPI and is technically optional.

    docker exec tf python3 tensorflow/tools/ci_build/update_version.py --nightly
  2. Build TensorFlow by following the instructions under one of the variants below. You can build both CPU and GPU packages without a GPU. TF DevInfra's remote cache is better if you're only building TF once; if you build over and over, a local cache will probably serve you better in the long run. We're not sure which is best for most users, so let us know on Gitter.

    This step will take a long time, since you're building TensorFlow. GPU packages take much longer to build. Choose one of the variants below:

    TF Nightly CPU - Remote Cache

    Build the sources with Bazel:

    docker exec tf bazel --bazelrc=/usertools/cpu.bazelrc \
      build --config=sigbuild_remote_cache \
      tensorflow/tools/pip_package:build_pip_package
    

    And then construct the pip package:

    docker exec tf \
      ./bazel-bin/tensorflow/tools/pip_package/build_pip_package \
      /tf/pkg \
      --cpu \
      --nightly_flag
    
    TF Nightly GPU - Remote Cache

    Build the sources with Bazel:

    docker exec tf bazel --bazelrc=/usertools/gpu.bazelrc \
      build --config=sigbuild_remote_cache \
      tensorflow/tools/pip_package:build_pip_package
    

    And then construct the pip package:

    docker exec tf \
      ./bazel-bin/tensorflow/tools/pip_package/build_pip_package \
      /tf/pkg \
      --nightly_flag
    
    TF Nightly CPU - Local Cache

    Make sure you have a directory mounted to the container in /tf/cache!

    Build the sources with Bazel:

    docker exec tf bazel --bazelrc=/usertools/cpu.bazelrc \
      build --config=sigbuild_local_cache \
      tensorflow/tools/pip_package:build_pip_package
    

    And then construct the pip package:

    docker exec tf \
      ./bazel-bin/tensorflow/tools/pip_package/build_pip_package \
      /tf/pkg \
      --cpu \
      --nightly_flag
    
    TF Nightly GPU - Local Cache

    Make sure you have a directory mounted to the container in /tf/cache!

    Build the sources with Bazel:

    docker exec tf \
      bazel --bazelrc=/usertools/gpu.bazelrc \
      build --config=sigbuild_local_cache \
      tensorflow/tools/pip_package:build_pip_package
    

    And then construct the pip package:

    docker exec tf \
      ./bazel-bin/tensorflow/tools/pip_package/build_pip_package \
      /tf/pkg \
      --nightly_flag
    
  3. Run the helper script that checks for manylinux compliance, renames the wheels, and then checks the size of the packages.

    docker exec tf /usertools/rename_and_verify_wheels.sh
    
  4. Take a look at the new wheel packages you built! They may be owned by root because of how Docker volume permissions work.

    ls -al /tmp/packages
    
  5. To continue on to running the Pip tests, create a venv and install the testing packages:

    docker exec tf /usertools/setup_venv_test.sh bazel_pip "/tf/pkg/tf_nightly*.whl"
    
  6. And now run the tests depending on your target platform. --config=pip includes the same test suite that is run by the DevInfra team every night. If you want to run a specific test instead of the whole suite, pass --config=pip_venv instead, and then set the target on the command line as usual (see the example after these variants).

    TF Nightly CPU - Remote Cache

    Build the sources with Bazel:

    docker exec tf bazel --bazelrc=/usertools/cpu.bazelrc \
      test --config=sigbuild_remote_cache \
      --config=pip
    
    TF Nightly GPU - Remote Cache

    Build the sources with Bazel:

    docker exec tf bazel --bazelrc=/usertools/gpu.bazelrc \
      test --config=sigbuild_remote_cache \
      --config=pip
    
    TF Nightly CPU - Local Cache

    Make sure you have a directory mounted to the container in /tf/cache!

    Build the sources with Bazel:

    docker exec tf bazel --bazelrc=/usertools/cpu.bazelrc \
      test --config=sigbuild_local_cache \
      --config=pip
    
    TF Nightly GPU - Local Cache

    Make sure you have a directory mounted to the container in /tf/cache!

    Build the sources with Bazel:

    docker exec tf \
      bazel --bazelrc=/usertools/gpu.bazelrc \
      test --config=sigbuild_local_cache \
      --config=pip
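
    For example, to run just one test with --config=pip_venv instead of the whole suite (the test target here is illustrative):

    docker exec tf bazel --bazelrc=/usertools/cpu.bazelrc \
      test --config=sigbuild_local_cache \
      --config=pip_venv \
      //tensorflow/python/framework:constant_op_test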
    

Run Nonpip Tests

  1. Run the tests depending on your target platform. --config=nonpip includes the same test suite that is run by the DevInfra team every night. If you want to run a specific test instead of the whole suite, you do not need --config=nonpip at all; just set the target on the command line as usual (see the example after these variants).

    TF Nightly CPU - Remote Cache

    Build the sources with Bazel:

    docker exec tf bazel --bazelrc=/usertools/cpu.bazelrc \
      test --config=sigbuild_remote_cache \
      --config=nonpip
    
    TF Nightly GPU - Remote Cache

    Build the sources with Bazel:

    docker exec tf bazel --bazelrc=/usertools/gpu.bazelrc \
      test --config=sigbuild_remote_cache \
      --config=nonpip
    
    TF Nightly CPU - Local Cache

    Make sure you have a directory mounted to the container in /tf/cache!

    Build the sources with Bazel:

    docker exec tf bazel --bazelrc=/usertools/cpu.bazelrc \
      test --config=sigbuild_local_cache \
      --config=nonpip
    
    TF Nightly GPU - Local Cache

    Make sure you have a directory mounted to the container in /tf/cache!

    Build the sources with Bazel:

    docker exec tf \
      bazel --bazelrc=/usertools/gpu.bazelrc \
      test --config=sigbuild_local_cache \
      --config=nonpip
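
    For example, to run a single test directly, drop --config=nonpip and name the target (the test target here is illustrative):

    docker exec tf bazel --bazelrc=/usertools/cpu.bazelrc \
      test --config=sigbuild_local_cache \
      //tensorflow/python/framework:constant_op_test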
    

Test, build and package libtensorflow

  1. Run the tests depending on your target platform. --config=libtensorflow_test includes the same test suite that is run by the DevInfra team every night. If you want to run a specific test instead of the whole suite, just set the target on the command line as usual.

    TF Nightly CPU - Remote Cache
    docker exec tf bazel --bazelrc=/usertools/cpu.bazelrc \
      test --config=sigbuild_remote_cache \
      --config=libtensorflow_test
    
    TF Nightly GPU - Remote Cache
    docker exec tf bazel --bazelrc=/usertools/gpu.bazelrc \
      test --config=sigbuild_remote_cache \
      --config=libtensorflow_test
    
    TF Nightly CPU - Local Cache

    Make sure you have a directory mounted to the container in /tf/cache!

    docker exec tf bazel --bazelrc=/usertools/cpu.bazelrc \
      test --config=sigbuild_local_cache \
      --config=libtensorflow_test
    
    TF Nightly GPU - Local Cache

    Make sure you have a directory mounted to the container in /tf/cache!

    docker exec tf \
      bazel --bazelrc=/usertools/gpu.bazelrc \
      test --config=sigbuild_local_cache \
      --config=libtensorflow_test
    
  2. Build the libtensorflow packages.

    TF Nightly CPU - Remote Cache
    docker exec tf bazel --bazelrc=/usertools/cpu.bazelrc \
      build --config=sigbuild_remote_cache \
      --config=libtensorflow_build
    
    TF Nightly GPU - Remote Cache
    docker exec tf bazel --bazelrc=/usertools/gpu.bazelrc \
      build --config=sigbuild_remote_cache \
      --config=libtensorflow_build
    
    TF Nightly CPU - Local Cache

    Make sure you have a directory mounted to the container in /tf/cache!

    docker exec tf bazel --bazelrc=/usertools/cpu.bazelrc \
      build --config=sigbuild_local_cache \
      --config=libtensorflow_build
    
    TF Nightly GPU - Local Cache

    Make sure you have a directory mounted to the container in /tf/cache!

    docker exec tf \
      bazel --bazelrc=/usertools/gpu.bazelrc \
      build --config=sigbuild_local_cache \
      --config=libtensorflow_build
    
  3. Run the repack_libtensorflow.sh utility to repack and rename the archives.

    CPU
    docker exec tf /usertools/repack_libtensorflow.sh /tf/pkg "-cpu-linux-x86_64"
    
    GPU
    docker exec tf /usertools/repack_libtensorflow.sh /tf/pkg "-gpu-linux-x86_64"
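
    The repacked archives land in the directory you mounted at /tf/pkg, so you can inspect them from the host:

    ls -al /tmp/packages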
    

Run a code check

  1. Every night the TensorFlow team runs code_check_full, which contains a suite of checks that were gradually introduced over TensorFlow's lifetime to prevent certain unstable code states. This check has supplanted the old "sanity" or "ci_sanity" checks.

    docker exec tf bats /usertools/code_check_full.bats --timing --formatter junit
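
    The junit formatter writes to stdout, so you can capture the results into a report file on the host if you want to keep them (the output path is just an example):

    docker exec tf bats /usertools/code_check_full.bats --timing --formatter junit > /tmp/code_check.xml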
    

Clean Up

  1. Shut down and remove the container when you are finished.

    docker stop tf
    docker rm tf