Azure blob storage with TensorFlow

View on TensorFlow.org Run in Google Colab View source on GitHub Download notebook

Overview

This tutorial shows how to use read and write files on Azure Blob Storage with TensorFlow, through TensorFlow IO's Azure file system integration.

An Azure storage account is needed to read and write files on Azure Blob Storage. The Azure Storage Key should be provided through environmental variable:

os.environ['TF_AZURE_STORAGE_KEY'] = '<key>'

The storage account name and container name are part of the filename uri:

azfs://<storage-account-name>/<container-name>/<path>

In this tutorial, for demo purposes you can optionally setup Azurite which is a Azure Storage emulator. With Azurite emulator it is possible to read and write files through Azure blob storage interface with TensorFlow.

Setup and usage

Install required packages, and restart runtime

try:
  %tensorflow_version 2.x 
except Exception:
  pass

!pip install tensorflow-io
TensorFlow 2.x selected.
Collecting tensorflow-io
[?25l  Downloading https://files.pythonhosted.org/packages/c0/d0/c5d7adce72c6a6d7c9a59c062150f60b5404c706578a0922f7dc2835713c/tensorflow_io-0.12.0-cp36-cp36m-manylinux2010_x86_64.whl (20.1MB)
[K     |████████████████████████████████| 20.1MB 42.7MB/s 
[?25hRequirement already satisfied: tensorflow<2.2.0,>=2.1.0 in /tensorflow-2.1.0/python3.6 (from tensorflow-io) (2.1.0)
Requirement already satisfied: opt-einsum>=2.3.2 in /tensorflow-2.1.0/python3.6 (from tensorflow<2.2.0,>=2.1.0->tensorflow-io) (3.1.0)
Requirement already satisfied: tensorflow-estimator<2.2.0,>=2.1.0rc0 in /tensorflow-2.1.0/python3.6 (from tensorflow<2.2.0,>=2.1.0->tensorflow-io) (2.1.0)
Requirement already satisfied: grpcio>=1.8.6 in /tensorflow-2.1.0/python3.6 (from tensorflow<2.2.0,>=2.1.0->tensorflow-io) (1.27.2)
Requirement already satisfied: wheel>=0.26; python_version >= "3" in /tensorflow-2.1.0/python3.6 (from tensorflow<2.2.0,>=2.1.0->tensorflow-io) (0.34.2)
Requirement already satisfied: google-pasta>=0.1.6 in /tensorflow-2.1.0/python3.6 (from tensorflow<2.2.0,>=2.1.0->tensorflow-io) (0.1.8)
Requirement already satisfied: tensorboard<2.2.0,>=2.1.0 in /tensorflow-2.1.0/python3.6 (from tensorflow<2.2.0,>=2.1.0->tensorflow-io) (2.1.0)
Requirement already satisfied: wrapt>=1.11.1 in /tensorflow-2.1.0/python3.6 (from tensorflow<2.2.0,>=2.1.0->tensorflow-io) (1.12.0)
Requirement already satisfied: scipy==1.4.1; python_version >= "3" in /tensorflow-2.1.0/python3.6 (from tensorflow<2.2.0,>=2.1.0->tensorflow-io) (1.4.1)
Requirement already satisfied: protobuf>=3.8.0 in /tensorflow-2.1.0/python3.6 (from tensorflow<2.2.0,>=2.1.0->tensorflow-io) (3.11.3)
Requirement already satisfied: termcolor>=1.1.0 in /tensorflow-2.1.0/python3.6 (from tensorflow<2.2.0,>=2.1.0->tensorflow-io) (1.1.0)
Requirement already satisfied: absl-py>=0.7.0 in /tensorflow-2.1.0/python3.6 (from tensorflow<2.2.0,>=2.1.0->tensorflow-io) (0.9.0)
Requirement already satisfied: keras-applications>=1.0.8 in /tensorflow-2.1.0/python3.6 (from tensorflow<2.2.0,>=2.1.0->tensorflow-io) (1.0.8)
Requirement already satisfied: astor>=0.6.0 in /tensorflow-2.1.0/python3.6 (from tensorflow<2.2.0,>=2.1.0->tensorflow-io) (0.8.1)
Requirement already satisfied: gast==0.2.2 in /tensorflow-2.1.0/python3.6 (from tensorflow<2.2.0,>=2.1.0->tensorflow-io) (0.2.2)
Requirement already satisfied: keras-preprocessing>=1.1.0 in /tensorflow-2.1.0/python3.6 (from tensorflow<2.2.0,>=2.1.0->tensorflow-io) (1.1.0)
Requirement already satisfied: numpy<2.0,>=1.16.0 in /tensorflow-2.1.0/python3.6 (from tensorflow<2.2.0,>=2.1.0->tensorflow-io) (1.18.1)
Requirement already satisfied: six>=1.12.0 in /tensorflow-2.1.0/python3.6 (from tensorflow<2.2.0,>=2.1.0->tensorflow-io) (1.14.0)
Requirement already satisfied: setuptools>=41.0.0 in /tensorflow-2.1.0/python3.6 (from tensorboard<2.2.0,>=2.1.0->tensorflow<2.2.0,>=2.1.0->tensorflow-io) (45.2.0)
Requirement already satisfied: markdown>=2.6.8 in /tensorflow-2.1.0/python3.6 (from tensorboard<2.2.0,>=2.1.0->tensorflow<2.2.0,>=2.1.0->tensorflow-io) (3.2.1)
Requirement already satisfied: google-auth<2,>=1.6.3 in /tensorflow-2.1.0/python3.6 (from tensorboard<2.2.0,>=2.1.0->tensorflow<2.2.0,>=2.1.0->tensorflow-io) (1.11.2)
Requirement already satisfied: requests<3,>=2.21.0 in /tensorflow-2.1.0/python3.6 (from tensorboard<2.2.0,>=2.1.0->tensorflow<2.2.0,>=2.1.0->tensorflow-io) (2.22.0)
Requirement already satisfied: werkzeug>=0.11.15 in /tensorflow-2.1.0/python3.6 (from tensorboard<2.2.0,>=2.1.0->tensorflow<2.2.0,>=2.1.0->tensorflow-io) (1.0.0)
Requirement already satisfied: google-auth-oauthlib<0.5,>=0.4.1 in /tensorflow-2.1.0/python3.6 (from tensorboard<2.2.0,>=2.1.0->tensorflow<2.2.0,>=2.1.0->tensorflow-io) (0.4.1)
Requirement already satisfied: h5py in /tensorflow-2.1.0/python3.6 (from keras-applications>=1.0.8->tensorflow<2.2.0,>=2.1.0->tensorflow-io) (2.10.0)
Requirement already satisfied: cachetools<5.0,>=2.0.0 in /tensorflow-2.1.0/python3.6 (from google-auth<2,>=1.6.3->tensorboard<2.2.0,>=2.1.0->tensorflow<2.2.0,>=2.1.0->tensorflow-io) (4.0.0)
Requirement already satisfied: rsa<4.1,>=3.1.4 in /tensorflow-2.1.0/python3.6 (from google-auth<2,>=1.6.3->tensorboard<2.2.0,>=2.1.0->tensorflow<2.2.0,>=2.1.0->tensorflow-io) (4.0)
Requirement already satisfied: pyasn1-modules>=0.2.1 in /tensorflow-2.1.0/python3.6 (from google-auth<2,>=1.6.3->tensorboard<2.2.0,>=2.1.0->tensorflow<2.2.0,>=2.1.0->tensorflow-io) (0.2.8)
Requirement already satisfied: idna<2.9,>=2.5 in /tensorflow-2.1.0/python3.6 (from requests<3,>=2.21.0->tensorboard<2.2.0,>=2.1.0->tensorflow<2.2.0,>=2.1.0->tensorflow-io) (2.8)
Requirement already satisfied: chardet<3.1.0,>=3.0.2 in /tensorflow-2.1.0/python3.6 (from requests<3,>=2.21.0->tensorboard<2.2.0,>=2.1.0->tensorflow<2.2.0,>=2.1.0->tensorflow-io) (3.0.4)
Requirement already satisfied: certifi>=2017.4.17 in /tensorflow-2.1.0/python3.6 (from requests<3,>=2.21.0->tensorboard<2.2.0,>=2.1.0->tensorflow<2.2.0,>=2.1.0->tensorflow-io) (2019.11.28)
Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /tensorflow-2.1.0/python3.6 (from requests<3,>=2.21.0->tensorboard<2.2.0,>=2.1.0->tensorflow<2.2.0,>=2.1.0->tensorflow-io) (1.25.8)
Requirement already satisfied: requests-oauthlib>=0.7.0 in /tensorflow-2.1.0/python3.6 (from google-auth-oauthlib<0.5,>=0.4.1->tensorboard<2.2.0,>=2.1.0->tensorflow<2.2.0,>=2.1.0->tensorflow-io) (1.3.0)
Requirement already satisfied: pyasn1>=0.1.3 in /tensorflow-2.1.0/python3.6 (from rsa<4.1,>=3.1.4->google-auth<2,>=1.6.3->tensorboard<2.2.0,>=2.1.0->tensorflow<2.2.0,>=2.1.0->tensorflow-io) (0.4.8)
Requirement already satisfied: oauthlib>=3.0.0 in /tensorflow-2.1.0/python3.6 (from requests-oauthlib>=0.7.0->google-auth-oauthlib<0.5,>=0.4.1->tensorboard<2.2.0,>=2.1.0->tensorflow<2.2.0,>=2.1.0->tensorflow-io) (3.1.0)
Installing collected packages: tensorflow-io
Successfully installed tensorflow-io-0.12.0

Install and setup Azurite (optional)

In case an Azure Storage Account is not available, the following is needed to install and setup Azurite that emulates the Azure Storage interface:

npm install azurite@2.7.0
[K[?25hnpm WARN deprecated request@2.87.0: request has been deprecated, see https://github.com/request/request/issues/3142
[K[?25hnpm WARN saveError ENOENT: no such file or directory, open '/content/package.json'
npm notice created a lockfile as package-lock.json. You should commit this file.
npm WARN enoent ENOENT: no such file or directory, open '/content/package.json'
npm WARN content No description
npm WARN content No repository field.
npm WARN content No README data
npm WARN content No license field.

+ azurite@2.7.0
added 116 packages from 141 contributors in 6.591s

# The path for npm might not be exposed in PATH env,
# you can find it out through 'npm bin' command
npm_bin_path = get_ipython().getoutput('npm bin')[0]
print('npm bin path: ', npm_bin_path)

# Run `azurite-blob -s` as a background process. 
# IPython doesn't recognize `&` in inline bash cells.
get_ipython().system_raw(npm_bin_path + '/' + 'azurite-blob -s &')
npm bin path:  /content/node_modules/.bin

Read and write files to Azure Storage with TensorFlow

The following is an example of reading and writing files to Azure Storage with TensorFlow's API.

It behaves the same way as other file systems (e.g., POSIX or GCS) in TensorFlow once tensorflow-io package is imported, as tensorflow-io will automatically register azfs scheme for use.

The Azure Storage Key should be provided through TF_AZURE_STORAGE_KEY environmental variable. Otherwise TF_AZURE_USE_DEV_STORAGE could be set to True to use Azurite emulator instead:

import os
import tensorflow as tf
import tensorflow_io as tfio

# Switch to False to use Azure Storage instead:
use_emulator = True

if use_emulator:
  os.environ['TF_AZURE_USE_DEV_STORAGE'] = '1'
  account_name = 'devstoreaccount1'
else:
  # Replace <key> with Azure Storage Key, and <account> with Azure Storage Account
  os.environ['TF_AZURE_STORAGE_KEY'] = '<key>'
  account_name = '<account>'
pathname = 'az://{}/aztest'.format(account_name)
tf.io.gfile.mkdir(pathname)

filename = pathname + '/hello.txt'
with tf.io.gfile.GFile(filename, mode='w') as w:
  w.write("Hello, world!")

with tf.io.gfile.GFile(filename, mode='r') as r:
  print(r.read())
Hello, world!

Configurations

Configurations of Azure Blob Storage in TensorFlow are always done through environmental variables. Below is a complete list of available configurations:

  • TF_AZURE_USE_DEV_STORAGE: Set to 1 to use local development storage emulator for connections like 'az://devstoreaccount1/container/file.txt'. This will take precendence over all other settings so unset to use any other connection
  • TF_AZURE_STORAGE_KEY: Account key for the storage account in use
  • TF_AZURE_STORAGE_USE_HTTP: Set to any value if you don't want to use https transfer. unset to use default of https
  • TF_AZURE_STORAGE_BLOB_ENDPOINT: Set to the endpoint of blob storage - default is .core.windows.net.