
Dan Brehmer

Easy21 - A Reinforcement Learning Exercise

Solutions to a Simple but Non-Trivial Problem

The video recordings of David Silver’s Reinforcement Learning course are a great place to start if you want to understand Reinforcement Learning (RL), especially if you already have some coding experience. There is one assignment, given out about halfway through the course, to which all of the RL approaches covered are applied. This is my write-up of solutions to that assignment. Please refer to the course webpage for links to the videos, slides, and assignment.

One thing that makes this a good exercise is that it can be solved without using Reinforcement Learning. The assignment suggests using Monte Carlo Control to determine the optimal value function and policy for Easy21. When I tried this, the results did not seem right, and even with a large number of iterations they remained rather noisy. I wanted more confidence in the optimal value function and policy, so I computed them without using RL (see below). The method I used was to sample the outcomes of playing out hands under different policies. It may also be possible to calculate this analytically, but that wasn’t necessary to get a reliable answer.
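To make that sampling approach concrete, here is a minimal sketch of it. This is not the notebook’s actual code; it assumes the Easy21 rules as given in the assignment (infinite deck, card values 1–10 drawn uniformly, black cards added with probability 2/3, red cards subtracted with probability 1/3, bust outside 1–21, dealer sticks on any sum of 17 or more), and the function names are my own.

```python
import random

def draw(first=False):
    """Draw from Easy21's infinite deck: a value 1-10 (uniform) that is
    black (added) with probability 2/3 or red (subtracted) with 1/3.
    Initial cards are always black."""
    value = random.randint(1, 10)
    return value if first or random.random() < 2 / 3 else -value

def playout(dealer, player, policy):
    """Play out one hand from state (dealer's showing card, player's sum)
    under policy(dealer, player) -> 'hit' or 'stick'; return +1, 0, or -1."""
    while policy(dealer, player) == 'hit':
        player += draw()
        if not 1 <= player <= 21:
            return -1            # player goes bust
    total = dealer
    while 1 <= total < 17:       # dealer hits until reaching 17 or more
        total += draw()
    if not 1 <= total <= 21:
        return 1                 # dealer goes bust
    return (player > total) - (player < total)

def estimate_value(dealer, player, policy, n=100_000):
    """Monte Carlo estimate of a state's value under a fixed policy."""
    return sum(playout(dealer, player, policy) for _ in range(n)) / n

# Example: sticking immediately on a sum of 18 against a dealer card of 6.
stick = lambda dealer, player: 'stick'
print(estimate_value(6, 18, stick))
```

Repeating this over every state and every candidate policy gives a reference value function that the RL results can be checked against.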

The Goals of This Exercise

The goal of this assignment is to apply reinforcement learning methods to a simple card game that we call Easy21. This exercise is similar to the Blackjack example in Sutton and Barto 5.3 – please note, however, that the rules of the card game are different and non-standard.

SARSA(λ) Algorithm for On-Policy Learning

Let’s think about what this algorithm is doing before turning to the code.
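As a refresher, these are the updates of standard tabular SARSA(λ) with accumulating eligibility traces, as presented in the course (the assignment uses undiscounted returns, so γ = 1 here). After each step, the TD error is computed and applied to every state–action pair in proportion to its trace:

```latex
\begin{aligned}
\delta_t   &= R_{t+1} + \gamma\, Q(S_{t+1}, A_{t+1}) - Q(S_t, A_t) \\
E_t(s,a)   &= \gamma \lambda\, E_{t-1}(s,a) + \mathbf{1}\{S_t = s,\ A_t = a\} \\
Q(s,a)     &\leftarrow Q(s,a) + \alpha\, \delta_t\, E_t(s,a) \qquad \text{for all } s, a
\end{aligned}
```

At λ = 0 this reduces to one-step SARSA, while at λ = 1 it behaves like a Monte Carlo update; exploring that trade-off is exactly what the assignment asks for.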

The Rest of the Write-Up is in this Jupyter Notebook.