REINFORCE agent


Introduction

This example shows how to train a REINFORCE agent on the Cartpole environment using the TF-Agents library, similar to the DQN tutorial.

[Figure: Cartpole environment]

We will walk you through all the components in a Reinforcement Learning (RL) pipeline for training, evaluation and data collection.

Setup

If you haven't installed the following dependencies, run:

sudo apt-get install -y xvfb ffmpeg
pip install -q 'gym==0.10.11'
pip install -q 'imageio==2.4.0'
pip install -q PILLOW
pip install -q 'pyglet==1.3.2'
pip install -q pyvirtualdisplay
pip install -q tf-agents




from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import base64
import imageio
import IPython
import matplotlib
import matplotlib.pyplot as plt
import numpy as np
import PIL.Image
import pyvirtualdisplay

import tensorflow as tf

from tf_agents.agents.reinforce import reinforce_agent
from tf_agents.drivers import dynamic_step_driver
from tf_agents.environments import suite_gym
from tf_agents.environments import tf_py_environment
from tf_agents.eval import metric_utils
from tf_agents.metrics import tf_metrics
from tf_agents.networks import actor_distribution_network
from tf_agents.replay_buffers import tf_uniform_replay_buffer
from tf_agents.trajectories import trajectory
from tf_agents.utils import common

tf.compat.v1.enable_v2_behavior()


# Set up a virtual display for rendering OpenAI gym environments.
display = pyvirtualdisplay.Display(visible=0, size=(1400, 900)).start()

Hyperparameters

env_name = "CartPole-v0" # @param {type:"string"}
num_iterations = 250 # @param {type:"integer"}
collect_episodes_per_iteration = 2 # @param {type:"integer"}
replay_buffer_capacity = 2000 # @param {type:"integer"}

fc_layer_params = (100,)

learning_rate = 1e-3 # @param {type:"number"}
log_interval = 25 # @param {type:"integer"}
num_eval_episodes = 10 # @param {type:"integer"}
eval_interval = 50 # @param {type:"integer"}

Environment

Environments in RL represent the task or problem that we are trying to solve. Standard environments can be easily created in TF-Agents using suites. We have different suites for loading environments from sources such as the OpenAI Gym, Atari, DM Control, etc., given a string environment name.

Now let us load the CartPole environment from the OpenAI Gym suite.

env = suite_gym.load(env_name)

We can render this environment to see how it looks. A free-swinging pole is attached to a cart. The goal is to move the cart right or left in order to keep the pole pointing up.


env.reset()
PIL.Image.fromarray(env.render())


The statement time_step = environment.step(action) applies an action to the environment. The returned TimeStep tuple contains the environment's next observation and the reward for that action. The time_step_spec() and action_spec() methods of the environment return the specifications (types, shapes, bounds) of the time_step and the action, respectively.

print('Observation Spec:')
print(env.time_step_spec().observation)
print('Action Spec:')
print(env.action_spec())
Observation Spec:
BoundedArraySpec(shape=(4,), dtype=dtype('float32'), name='observation', minimum=[-4.8000002e+00 -3.4028235e+38 -4.1887903e-01 -3.4028235e+38], maximum=[4.8000002e+00 3.4028235e+38 4.1887903e-01 3.4028235e+38])
Action Spec:
BoundedArraySpec(shape=(), dtype=dtype('int64'), name='action', minimum=0, maximum=1)

So, we see that the observation is an array of 4 floats: the position and velocity of the cart, and the angular position and velocity of the pole. Since only two actions are possible (move left or move right), the action is a scalar integer where 0 means "move left" and 1 means "move right."
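As a rough illustration of the layout described above, the following sketch labels the four observation components and maps an action index to a readable name. The CartPoleObservation type, the describe helper and the sample values are hypothetical and are not part of TF-Agents or Gym.

```python
import math
from collections import namedtuple

# Hypothetical helper type: the field names are illustrative labels that
# follow the order described in the text (cart first, then pole).
CartPoleObservation = namedtuple(
    "CartPoleObservation",
    ["cart_position", "cart_velocity", "pole_angle", "pole_angular_velocity"])

ACTION_NAMES = {0: "move left", 1: "move right"}


def describe(observation, action):
    """Returns the labelled observation and a readable action name."""
    return CartPoleObservation(*observation), ACTION_NAMES[int(action)]


# Illustrative values only, not output from a real environment.
obs, action_name = describe([0.0067, -0.0219, 0.0308, -0.0260], 1)
print(obs.pole_angle, action_name)

# The pole-angle bound printed in the observation spec is in radians;
# converting it shows that it corresponds to roughly 24 degrees.
pole_angle_limit_degrees = math.degrees(0.41887903)
print(round(pole_angle_limit_degrees, 1))
```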

time_step = env.reset()
print('Time step:')
print(time_step)

action = np.array(1, dtype=np.int32)

next_time_step = env.step(action)
print('Next time step:')
print(next_time_step)
Time step:
TimeStep(step_type=array(0, dtype=int32), reward=array(0., dtype=float32), discount=array(1., dtype=float32), observation=array([ 0.00674595, -0.02194783,  0.03083151, -0.02604969], dtype=float32))
Next time step:
TimeStep(step_type=array(1, dtype=int32), reward=array(1., dtype=float32), discount=array(1., dtype=float32), observation=array([ 0.00630699,  0.17271872,  0.03031052, -0.3088477 ], dtype=float32))

Usually we create two environments: one for training and one for evaluation. Most environments are written in pure Python, but they can be easily converted to TensorFlow using the TFPyEnvironment wrapper. The original environment's API uses NumPy arrays; the TFPyEnvironment wrapper converts these to and from Tensors so that you can more easily interact with TensorFlow policies and agents.

train_py_env = suite_gym.load(env_name)
eval_py_env = suite_gym.load(env_name)

train_env = tf_py_environment.TFPyEnvironment(train_py_env)
eval_env = tf_py_environment.TFPyEnvironment(eval_py_env)

Agent

The algorithm that we use to solve an RL problem is represented as an Agent. In addition to the REINFORCE agent, TF-Agents provides standard implementations of a variety of Agents such as DQN, DDPG, TD3, PPO and SAC.

To create a REINFORCE Agent, we first need an Actor Network that can learn to predict the action given an observation from the environment.

We can easily create an Actor Network using the specs of the observations and actions. The fc_layer_params argument specifies the hidden layers of the network: it is a tuple of ints giving the size of each hidden layer (see the Hyperparameters section above).

actor_net = actor_distribution_network.ActorDistributionNetwork(
    train_env.observation_spec(),
    train_env.action_spec(),
    fc_layer_params=fc_layer_params)
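To make the role of fc_layer_params more concrete, here is a minimal pure-Python sketch of a network with the same shape: 4 observation inputs, one hidden layer of 100 units and 2 action outputs turned into probabilities. The ReLU activation, the uniform initialization and all helper names are assumptions made for illustration; they are not taken from the ActorDistributionNetwork implementation.

```python
import math
import random


def dense(inputs, weights, biases):
    """One fully connected layer: out[j] = sum_i inputs[i] * weights[i][j] + biases[j]."""
    return [sum(x * row[j] for x, row in zip(inputs, weights)) + biases[j]
            for j in range(len(biases))]


def relu(values):
    return [max(0.0, v) for v in values]


def softmax(logits):
    """Turns logits into probabilities (shifted by the max for stability)."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def init_layer(rng, n_in, n_out):
    """Assumed initialization: small uniform weights and zero biases."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_out)]
               for _ in range(n_in)]
    return weights, [0.0] * n_out


def action_probabilities(observation, layers):
    """Runs the hidden layers with ReLU, then a softmax over action logits."""
    hidden = observation
    for weights, biases in layers[:-1]:
        hidden = relu(dense(hidden, weights, biases))
    weights, biases = layers[-1]
    return softmax(dense(hidden, weights, biases))


rng = random.Random(0)
hidden_sizes = (100,)                # same value as fc_layer_params above
sizes = [4, *hidden_sizes, 2]        # 4 observation floats in, 2 actions out
layers = [init_layer(rng, n_in, n_out)
          for n_in, n_out in zip(sizes[:-1], sizes[1:])]

probs = action_probabilities([0.0067, -0.0219, 0.0308, -0.0260], layers)
print(probs)  # two probabilities that sum to 1
```

Adding another entry to the tuple, for example (100, 50), would add a second hidden layer in this sketch.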

We also need an optimizer to train the network we just created, and a train_step_counter variable to keep track of how many times the network was updated.

optimizer = tf.compat.v1.train.AdamOptimizer(learning_rate=learning_rate)

train_step_counter = tf.compat.v2.Variable(0)

tf_agent = reinforce_agent.ReinforceAgent(
    train_env.time_step_spec(),
    train_env.action_spec(),
    actor_network=actor_net,
    optimizer=optimizer,
    normalize_returns=True,
    train_step_counter=train_step_counter)
tf_agent.initialize()
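For intuition about what the agent optimizes, the following sketch computes a textbook REINFORCE loss for one episode: each action's log-probability is weighted by the return that followed it, optionally after normalizing the returns to zero mean and unit variance. This is a simplified illustration under stated assumptions (single episode, discount of 1.0 by default, hypothetical function names), not the ReinforceAgent implementation.

```python
import math


def discounted_returns(rewards, gamma=1.0):
    """Return-to-go at each step: G_t = r_t + gamma * G_{t+1}."""
    returns = []
    running = 0.0
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def normalize(values, eps=1e-8):
    """Shifts to zero mean and scales to (approximately) unit variance."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / (math.sqrt(variance) + eps) for v in values]


def reinforce_loss(log_probs, rewards, gamma=1.0, normalize_returns=True):
    """Negative sum of log-probabilities weighted by the (normalized) returns."""
    returns = discounted_returns(rewards, gamma)
    if normalize_returns:
        returns = normalize(returns)
    return -sum(lp * g for lp, g in zip(log_probs, returns))


rewards = [1.0, 1.0, 1.0]
print(discounted_returns(rewards))  # [3.0, 2.0, 1.0]

# Illustrative log-probabilities of the actions that were taken.
log_probs = [math.log(0.6), math.log(0.5), math.log(0.4)]
print(reinforce_loss(log_probs, rewards))
```

Because normalized returns have zero mean, a loss of this form can take either sign, so a negative value does not by itself indicate a problem.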

Policies

In TF-Agents, policies represent the standard notion of policies in RL: given a time_step, produce an action or a distribution over actions. The main method is policy_step = policy.action(time_step), where policy_step is a named tuple PolicyStep(action, state, info). The policy_step.action is the action to be applied to the environment, state represents the state for stateful (RNN) policies, and info may contain auxiliary information such as log probabilities of the actions.

Agents contain two policies: the main policy that is used for evaluation/deployment (agent.policy) and another policy that is used for data collection (agent.collect_policy).

eval_policy = tf_agent.policy
collect_policy = tf_agent.collect_policy
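One common way the two policies differ is that the evaluation policy picks the most probable action while the collection policy samples from the action distribution, which keeps exploring. The sketch below illustrates that distinction with plain Python; it assumes this greedy-versus-sampling split for illustration, and the function names are hypothetical.

```python
import random


def greedy_action(probabilities):
    """Picks the index of the most probable action (evaluation-style)."""
    return max(range(len(probabilities)), key=lambda a: probabilities[a])


def sampled_action(probabilities, rng):
    """Samples an action index according to its probability (collection-style)."""
    return rng.choices(range(len(probabilities)), weights=probabilities, k=1)[0]


action_probs = [0.3, 0.7]  # illustrative distribution over (left, right)
rng = random.Random(0)

print(greedy_action(action_probs))  # 1

samples = [sampled_action(action_probs, rng) for _ in range(1000)]
fraction_right = sum(samples) / len(samples)
print(fraction_right)  # close to 0.7, but both actions are tried
```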

Metrics and Evaluation

The most common metric used to evaluate a policy is the average return. The return is the sum of rewards obtained while running a policy in an environment for an episode, and we usually average this over a few episodes. We can compute the average return metric as follows.


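def compute_avg_return(environment, policy, num_episodes=10):
  """Returns the average undiscounted return of `policy` over `num_episodes`."""

  total_return = 0.0
  for _ in range(num_episodes):

    time_step = environment.reset()
    episode_return = 0.0

    # Step until the environment signals the end of the episode.
    while not time_step.is_last():
      action_step = policy.action(time_step)
      time_step = environment.step(action_step.action)
      episode_return += time_step.reward
    total_return += episode_return

  avg_return = total_return / num_episodes
  # The TF environment is batched (batch size 1), so take the single element.
  return avg_return.numpy()[0]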


# Please also see the metrics module for standard implementations of different
# metrics.
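As a small worked example of the metric itself, independent of TF-Agents, the sketch below computes the average return from recorded per-step rewards. The average_return helper and the episode lengths are made up for illustration.

```python
def average_return(episode_rewards):
    """Averages the per-episode sums of rewards."""
    returns = [sum(rewards) for rewards in episode_rewards]
    return sum(returns) / len(returns)


# Three hypothetical CartPole episodes lasting 20, 30 and 200 steps,
# each step giving a reward of 1.
episodes = [[1.0] * 20, [1.0] * 30, [1.0] * 200]
print(average_return(episodes))  # (20 + 30 + 200) / 3
```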

Replay Buffer

In order to keep track of the data collected from the environment, we will use the TFUniformReplayBuffer. This replay buffer is constructed using specs describing the tensors that are to be stored, which can be obtained from the agent using tf_agent.collect_data_spec.

replay_buffer = tf_uniform_replay_buffer.TFUniformReplayBuffer(
    data_spec=tf_agent.collect_data_spec,
    batch_size=train_env.batch_size,
    max_length=replay_buffer_capacity)

For most agents, the collect_data_spec is a Trajectory named tuple containing the observation, action, reward etc.
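The way the buffer is used in this tutorial (add items, gather everything, clear) can be sketched with a bounded queue. This is only a conceptual stand-in: the SimpleBuffer class and the Transition tuple are hypothetical, and dropping the oldest item when full is an assumption of the sketch, not a statement about TFUniformReplayBuffer internals.

```python
from collections import deque, namedtuple

# Hypothetical, simplified stand-in for a Trajectory.
Transition = namedtuple("Transition", ["observation", "action", "reward"])


class SimpleBuffer:
    """Bounded buffer that drops the oldest item once max_length is reached."""

    def __init__(self, max_length):
        self._items = deque(maxlen=max_length)

    def add(self, item):
        self._items.append(item)

    def gather_all(self):
        return list(self._items)

    def clear(self):
        self._items.clear()

    def __len__(self):
        return len(self._items)


buffer = SimpleBuffer(max_length=3)
for step in range(5):
    buffer.add(Transition(observation=step, action=step % 2, reward=1.0))

stored = [t.observation for t in buffer.gather_all()]
print(stored)  # [2, 3, 4]: the two oldest items were dropped

buffer.clear()
print(len(buffer))  # 0
```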

Data Collection

As REINFORCE learns from whole episodes, we define a function to collect an episode using the given data collection policy and save the data (observations, actions, rewards etc.) as trajectories in the replay buffer.



def collect_episode(environment, policy, num_episodes):

  episode_counter = 0
  environment.reset()

  while episode_counter < num_episodes:
    time_step = environment.current_time_step()
    action_step = policy.action(time_step)
    next_time_step = environment.step(action_step.action)
    traj = trajectory.from_transition(time_step, action_step, next_time_step)

    # Add trajectory to the replay buffer
    replay_buffer.add_batch(traj)

    if traj.is_boundary():
      episode_counter += 1


# This loop is so common in RL that we provide standard implementations of it.
# For more details see the drivers module.
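The structure of the collection loop can also be sketched without TF-Agents: step the environment, record each transition, and count an episode whenever a terminal step is seen. ToyEnvironment and collect_episodes below are hypothetical stand-ins with a Gym-style interface and an explicit reset, which differs from the TF-Agents environment used above.

```python
import random


class ToyEnvironment:
    """Hypothetical environment: reward 1 per step, ends with probability 0.2."""

    def __init__(self, seed=0):
        self._rng = random.Random(seed)

    def reset(self):
        return 0  # dummy initial observation

    def step(self, action):
        done = self._rng.random() < 0.2
        return 0, 1.0, done  # (next observation, reward, episode finished)


def collect_episodes(environment, policy, num_episodes, storage):
    """Appends (observation, action, reward, done) tuples for whole episodes."""
    episodes_done = 0
    observation = environment.reset()
    while episodes_done < num_episodes:
        action = policy(observation)
        next_observation, reward, done = environment.step(action)
        storage.append((observation, action, reward, done))
        if done:
            # Count the finished episode and start a new one.
            episodes_done += 1
            observation = environment.reset()
        else:
            observation = next_observation


collected = []
collect_episodes(ToyEnvironment(), lambda obs: 1, num_episodes=2,
                 storage=collected)
print(len(collected), "transitions from 2 episodes")
```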

Training the agent

The training loop involves both collecting data from the environment and optimizing the agent's networks. Along the way, we will occasionally evaluate the agent's policy to see how we are doing.

The following will take ~3 minutes to run.


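# Note: %%time is a notebook cell magic, not Python. To time this cell in
# Jupyter or Colab, put %%time on the first line of the cell.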

# (Optional) Optimize by wrapping some of the code in a graph using TF function.
tf_agent.train = common.function(tf_agent.train)

# Reset the train step
tf_agent.train_step_counter.assign(0)

# Evaluate the agent's policy once before training.
avg_return = compute_avg_return(eval_env, tf_agent.policy, num_eval_episodes)
returns = [avg_return]

for _ in range(num_iterations):

  # Collect a few episodes using collect_policy and save to the replay buffer.
  collect_episode(
      train_env, tf_agent.collect_policy, collect_episodes_per_iteration)

  # Use data from the buffer and update the agent's network.
  experience = replay_buffer.gather_all()
  train_loss = tf_agent.train(experience)
  replay_buffer.clear()

  step = tf_agent.train_step_counter.numpy()

  if step % log_interval == 0:
    print('step = {0}: loss = {1}'.format(step, train_loss.loss))

  if step % eval_interval == 0:
    avg_return = compute_avg_return(eval_env, tf_agent.policy, num_eval_episodes)
    print('step = {0}: Average Return = {1}'.format(step, avg_return))
    returns.append(avg_return)
step = 25: loss = 0.039565324783325195
step = 50: loss = 0.13622140884399414
step = 50: Average Return = 50.099998474121094
step = 75: loss = -0.1901378631591797
step = 100: loss = -0.2121896743774414
step = 100: Average Return = 81.0
step = 125: loss = -0.08798933029174805
step = 150: loss = -3.0918540954589844
step = 150: Average Return = 95.4000015258789
step = 175: loss = -1.3781604766845703
step = 200: loss = -0.7357068061828613
step = 200: Average Return = 160.5
step = 225: loss = 2.73903751373291
step = 250: loss = 1.1629419326782227
step = 250: Average Return = 200.0
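The logging and evaluation cadence of the loop above can be checked with a few lines of plain Python. The sketch assumes that the train step counter increases by exactly one per call to tf_agent.train, which is consistent with the log shown but is not verified here.

```python
num_iterations = 250
log_interval = 25
eval_interval = 50

# Steps at which the loop prints the loss and evaluates the policy.
log_steps = [s for s in range(1, num_iterations + 1) if s % log_interval == 0]
eval_steps = [s for s in range(1, num_iterations + 1) if s % eval_interval == 0]

# One evaluation happens before training, so the list of returns has one
# more entry than there are in-loop evaluations.
num_return_points = 1 + len(eval_steps)

# The x-axis used for plotting must have the same number of points.
plot_steps = list(range(0, num_iterations + 1, eval_interval))

print(log_steps)
print(eval_steps)
print(num_return_points, len(plot_steps))
```

If num_iterations is not a multiple of eval_interval, the two lengths still match in this sketch, but the final training iterations would not be followed by an evaluation.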

Visualization

Plots

We can plot the average return against the training step to see the performance of our agent. In CartPole-v0, the environment gives a reward of +1 for every time step the pole stays up, and since the maximum number of steps per episode is 200, the maximum possible return is also 200.



steps = range(0, num_iterations + 1, eval_interval)
plt.plot(steps, returns)
plt.ylabel('Average Return')
plt.xlabel('Step')
plt.ylim(top=250)
(0.5, 250.0)


Videos

It is helpful to visualize the performance of an agent by rendering the environment at each step. Before we do that, let us first create a function to embed videos in this colab.

def embed_mp4(filename):
  """Embeds an mp4 file in the notebook."""
  # Read the file inside a context manager so the handle is always closed.
  with open(filename, 'rb') as f:
    video = f.read()
  b64 = base64.b64encode(video)
  tag = '''
  <video width="640" height="480" controls>
    <source src="data:video/mp4;base64,{0}" type="video/mp4">
  Your browser does not support the video tag.
  </video>'''.format(b64.decode())

  return IPython.display.HTML(tag)
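The embedding relies on a base64 data URI, which carries the file contents inside the HTML itself. The following sketch shows that encoding step in isolation and checks that it round-trips; to_data_uri is a hypothetical helper and the payload is placeholder bytes, not a real video.

```python
import base64


def to_data_uri(raw_bytes, mime_type="video/mp4"):
    """Encodes raw bytes as a base64 data URI for use in a src attribute."""
    encoded = base64.b64encode(raw_bytes).decode("ascii")
    return "data:{0};base64,{1}".format(mime_type, encoded)


payload = b"\x00\x01placeholder-bytes"  # stands in for the mp4 file contents
uri = to_data_uri(payload)
print(uri[:22])  # data:video/mp4;base64,

# Decoding the part after the comma recovers the original bytes.
decoded = base64.b64decode(uri.split(",", 1)[1])
print(decoded == payload)  # True
```

Since base64 output is about a third larger than its input, this approach is best suited to short clips.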

The following code visualizes the agent's policy for a few episodes:

num_episodes = 3
video_filename = 'imageio.mp4'
with imageio.get_writer(video_filename, fps=60) as video:
  for _ in range(num_episodes):
    time_step = eval_env.reset()
    video.append_data(eval_py_env.render())
    while not time_step.is_last():
      action_step = tf_agent.policy.action(time_step)
      time_step = eval_env.step(action_step.action)
      video.append_data(eval_py_env.render())

embed_mp4(video_filename)
WARNING:root:IMAGEIO FFMPEG_WRITER WARNING: input image is not divisible by macro_block_size=16, resizing from (400, 600) to (400, 608) to ensure video compatibility with most codecs and players. To prevent resizing, make your input image divisible by the macro_block_size or set the macro_block_size to None (risking incompatibility). You may also see a FFMPEG warning concerning speedloss due to data not being aligned.