Today we'll take a look at the solution to our second brainteaser, Red or Black. The solution uses the concept of backwards induction which is a core part of game theory. The proof of our answer also uses a related concept, mathematical induction.
While the solution to the problem on its own is very simple, without a thorough understanding of the two induction concepts above it can seem like magic, so I'm going to start by explaining backwards induction and mathematical induction with simple examples and then I'll show the solution at the end. If you just want to know the answer you can of course scroll down :)
Mathematical Induction
Mathematical induction is a very useful technique for solving problems that are relatively simple when the numbers involved are small, but that get very complex as one of those numbers grows. Let's call the number that is going to grow N. Induction is a three-step process:
- Show that some statement or formula is true for some base case N = y where y is small.
- Assume that the statement is true for an arbitrary number N = x.
- Based on that assumption, show that it must be true for N = x + 1.
You then conclude that your statement or formula is true for any natural number greater than or equal to y.
The basic idea is that if you know it's true for a base case, N = y, then assumption #2 is correct when x = y, so by #3 the statement is true for y + 1. But by applying #2 and #3 above again, if it is true for y + 1, then it is true for y + 2, and if it is true for y + 2 then it is true for y + 3, and so on. Thus it is true for any N greater than y.
This technique can often make problems that seem extremely difficult to prove very simple. Let's look at a simple example, suppose I want to prove that:
11^n - 4^n is divisible by 7 for any natural number n greater than or equal to 1. At first glance, that is not at all obvious. I can check it for a few values of n easily and see that it seems true, but actually proving it for all n seems hard. With induction it's easy.
- Base case: N = 1. 11^1 - 4^1 = 7, which is divisible by 7.
- Now assume that for N = x, 11^x - 4^x is divisible by 7.
- For N = x + 1, 11^(x + 1) - 4^(x + 1) = 11 * (11^x - 4^x) + 7 * 4^x. If 11^x - 4^x is divisible by 7, then 11 times that plus a multiple of 7 is also divisible by 7.
- So the statement is true for all natural numbers n greater than or equal to 1.
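As a quick sanity check (this brute-force loop is no substitute for the proof, which covers every n), we can confirm the claim for a range of small exponents:

```python
# Check that 11^n - 4^n is divisible by 7 for n = 1..49.
# The induction proof above covers all n; this just spot-checks it.
for n in range(1, 50):
    assert (11**n - 4**n) % 7 == 0

print("11^n - 4^n is divisible by 7 for n = 1..49")
```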
Backwards Induction
In game theory, backwards induction involves imagining yourself at each possible final decision node and working backwards. Once you know the optimal action for each final node, you can use that information to determine the second-to-last action, and so forth, eventually arriving at a complete solution for every possible decision node. The process works as follows:
- For each final decision node figure out the optimal decision for any players involved and the payoffs for all players assuming that they all make optimal decisions.
- Remove those final decision nodes from the decision tree and replace them with the payoffs you get by assuming optimal play from that point forward.
- Step 2 will make a whole new set of nodes final decision nodes. Repeat the process for those nodes until there are no remaining decision nodes.
That sounds a bit complicated but it is actually quite simple and it is something you do every day at the poker table. Let's look at a simple example from a widely studied game called the centipede game, so named because its decision tree looks like a centipede.
The basic idea of the game is simple. There are two players, a single prize, and 10 rounds. In round 1, the prize is $1 and Player 1 can choose to keep that $1 prize (giving his opponent nothing), which ends the game, or to pass, which then doubles the prize. If Player 1 passes, Player 2 is then faced with the same decision for the $2 prize in round 2. If the prize goes unclaimed in round 10, neither player gets anything. The question is: what is the optimal strategy? What if instead of 10 rounds there were 100?
This problem is easy to solve with backwards induction.
We first need to figure out the optimal play and expected values at the final decision node. In round 10, the optimal strategy for Player 2 is to claim the prize and take the maximum payout of $512, so he will always do so, and Player 1 will get $0.
Now we can remove round 10 from the decision tree, which makes round 9 a final decision node with direct payoffs. In round 9, Player 1 now has a choice between taking $256 and giving Player 2 $0, or passing and getting $0 himself, so he of course should take the $256. Repeating this process all the way back shows that the optimal strategy is for Player 1 to take the prize and end the game in round 1. You can see that this logic holds no matter how many rounds the game has or how fast the prize grows.
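The backward-induction procedure above can be sketched in a few lines of code. This is a minimal illustration, not a general game solver: `prize(k)` and the alternating-mover rule encode the specific centipede game described here, and `solve(k)` returns the pair of payoffs under optimal play from round k onward.

```python
# Backward induction on the 10-round centipede game described above.
# Player 1 moves on odd rounds, Player 2 on even rounds; the prize
# starts at $1 and doubles each round ($512 in round 10).

def prize(k):
    return 2 ** (k - 1)

def solve(k, rounds=10):
    """Return (payoff_p1, payoff_p2) under optimal play from round k."""
    if k > rounds:                  # prize went unclaimed: nobody gets anything
        return (0, 0)
    mover = 1 if k % 2 == 1 else 2
    take = (prize(k), 0) if mover == 1 else (0, prize(k))
    passed = solve(k + 1, rounds)   # payoffs if the mover passes
    # The mover keeps whichever option pays him more.
    return take if take[mover - 1] >= passed[mover - 1] else passed

print(solve(1))                  # (1, 0): Player 1 takes the $1 immediately
print(solve(1, rounds=100))      # same conclusion with 100 rounds
```

Note that changing `rounds` or making the prize grow faster doesn't change the conclusion, matching the argument above.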
Red or Black Solution
Let's start with backwards induction. If we found ourselves in a state where there are only 2 cards remaining, what is the best we can do?
- If there are 0 red cards left, then we are guaranteed to lose.
- If there are 2 red cards left we are guaranteed to win.
- If there is 1 red card left and 1 black card left we have 2 options. We can either bet now, and win 50% of the time, or we can wait until the last card. If we wait until the last card we will be forced to bet, and it will be red 50% of the time and black 50% of the time.
Looking at the above we can see that with 2 cards, the probability of us winning is always the number of red cards, R divided by the number of total cards, N. This doesn't prove anything about the 52 card game yet, but it gives us a base case from which to induct.
As shown in our induction example above, our next step is to assume that if we are in a state with N total cards left, R of which are red, then our optimal strategy has an expected value of R / N. If, by assuming this, we can prove that with N + 1 cards the expected payoff is R / (N + 1) for any R, that would then let us backwards induct through the game tree and collapse the payoff for waiting at any decision node to a direct payoff of R / N.
So let's assume that for all possible numbers of red cards R, with a total of N cards, the probability of winning with optimal play is R / N.
So now imagine we are in a state with N + 1 total cards and some specific number of red cards, R = r. Let's determine the payoff of our optimal strategy, using our assumption above. If we bet immediately, our payoff is r / (N + 1).
If we wait and see a card, we will end up in one of two states:
- A red card will be dealt, leaving us with r - 1 red cards and N total cards. The probability of a red card being dealt is r / (N + 1).
- A black card will be dealt, leaving us with r red cards and N total cards. The probability of a black card being dealt is (N + 1 - r) / (N + 1).
In either case we are in a state with N total cards. Applying our assumption that the probability of winning with optimal play in a state with N total cards is R / N where R is the number of red cards, we can conclude that the probability of winning with optimal play in state 1 is (r - 1) / N and in state 2 is r / N.
To compute the overall probability of winning if we wait and see a card, we multiply each of these probabilities by the probability of ending up in the corresponding state and sum:
((r - 1) / N) * (r / (N + 1)) + (r / N) * ((N + 1 - r) / (N + 1)) =
(r * (r - 1) + r * (N + 1 - r)) / (N * (N + 1)) =
rN / (N * (N + 1)) =
r / (N + 1)
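The identity derived above is easy to verify numerically with exact rational arithmetic. A quick check over a range of deck sizes N and red-card counts r (with 1 <= r <= N so both branches of the recursion make sense):

```python
from fractions import Fraction

# Verify: (r-1)/N * r/(N+1)  +  r/N * (N+1-r)/(N+1)  ==  r/(N+1)
# for every N up to 39 and every valid red-card count r.
for N in range(1, 40):
    for r in range(1, N + 1):
        wait = (Fraction(r - 1, N) * Fraction(r, N + 1)
                + Fraction(r, N) * Fraction(N + 1 - r, N + 1))
        assert wait == Fraction(r, N + 1)

print("identity holds for all tested N and r")
```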
So by induction our odds of winning with optimal play in a deck with any number of cards is just the number of red cards divided by the total number of cards.
Thus with a standard deck, our odds of winning with optimal play are 26 / 52 = 1/2. If we win $100 on a correct bet and lose $100 on an incorrect one, the EV of the game is $0. We can easily see that simply betting on the first card also gives an EV of $0, so that trivial strategy is co-optimal.
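We can also check the result without any algebra at all, by computing the optimal win probability directly from the game tree. In state (r, b), with r red and b black cards left, we either bet now (winning with probability r / (r + b)) or see a card and continue; optimal play takes the maximum. This is a brute-force sketch, and memoization keeps it fast even for a full deck:

```python
from fractions import Fraction
from functools import lru_cache

@lru_cache(maxsize=None)
def win(r, b):
    """Probability of winning with optimal play, r red and b black left."""
    if r + b == 0:
        return Fraction(0)        # no cards and no bet placed: we lose
    bet_now = Fraction(r, r + b)
    if r + b == 1:
        return bet_now            # forced to bet on the last card
    wait = Fraction(0)
    if r > 0:
        wait += Fraction(r, r + b) * win(r - 1, b)   # a red card is dealt
    if b > 0:
        wait += Fraction(b, r + b) * win(r, b - 1)   # a black card is dealt
    return max(bet_now, wait)

print(win(26, 26))   # 1/2 for a standard deck, matching the proof
```

As the induction argument predicts, `win(r, b)` always comes out to r / (r + b).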
An Alternate Solution
The goal of this problem was to give a nice introduction to the key game theory concepts of induction and backwards induction, but I definitely want to mention that reddit user Leet_Noob posted a very elegant solution to the problem, one I was unaware of, that requires neither.
In fact, it requires no mathematical calculations whatsoever, and you can read it here. However, the logic he uses is specific to this particular problem and won't teach you techniques that are widely applicable to other game theory problems, so I'm not going to focus on it here. Furthermore, the type of argument it uses is easy to misapply if you are not very experienced with probability, so it takes some careful thought to verify that his solution is correct.