GTORangeBuilder Blog: GTO Brainteaser #1 Solution

Today I'll be going over the solution to the Rock Paper Scissors brain teaser that I posted here: http://blog.gtorangebuilder.com/2014/04/gto-brain-teaser-1-exploitation-and.html and talking a bit about how it relates to poker.

The main problem is solved in the video below, which also has a nice introduction to basic game theory concepts.

How is this relevant to poker?

At its essence, this brainteaser is about how to deal with cases where your opponent is in a bad situation, but they also know that they are in a bad situation, and they know that you know, and so on. These types of subgames come up all the time in poker, for many reasons. For example, you 3-bet pre-flop and the flop comes out clearly very bad for your pre-flop range, or you are late in a tournament with a stack size that severely limits your options.

A common mistake is to try to make too much of your situational advantage in a way that is easily countered, rather than taking a moderately exploitative line that leverages your inherent situational advantage to profit in a way that cannot be countered.

A simple example: say you are on the river versus someone who has taken a line that they would never take with a super strong hand or with air. They have basically turned their cards face up as a strong one-pair hand. Meanwhile, you've been representing either two pair or better or a flush draw, and on the river the flush draw misses. However, you are sure that your opponent cannot possibly have a big enough hand to feel great about calling off a shove. Let's say your range is about half nuts and half air, and there is a pot-sized bet left in your stack.

As I show in this post, you are at an inherent advantage in this situation and should be able to win about 75% of the pot on average when both players play perfectly, even though your range has only about 50% equity against his hand. However, the one thing you can do to throw away that advantage is to think, "he never has a strong enough hand to call here, I should always bluff to punish him for building such a big pot with a medium-strength hand on the flop and turn."

This is equivalent to trying to play 100% paper to take advantage of your opponent in the rock paper scissors game, and it is just as easily countered, in this case by your opponent always calling. By over-reacting to your opponent's situational disadvantage, you can put him back on even footing. If you instead take a measured response, always betting the nuts and bluffing with your air 50% of the time (so that when your opponent calls, he faces 2/3 nuts and 1/3 air), then you are guaranteed a nice solid profit no matter what your opponent does, simply due to the good situation you have gotten yourself into.
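To make the river numbers concrete, here is a minimal Python sketch. It assumes the spot exactly as described: the pot is normalised to 1, one pot-sized bet (1) is left, your range is half nuts and half air, his one-pair hand beats your air and loses to your nuts, and air that checks down wins nothing. The names (`bettor_ev`, `balanced`, and so on) are illustrative only. It compares the measured line (bet every nut hand, bluff half the air) with always bluffing, across a few calling frequencies.

```python
from fractions import Fraction

# Assumed river spot: pot normalised to 1, one pot-sized bet left behind.
POT = Fraction(1)
BET = Fraction(1)
NUTS = Fraction(1, 2)  # share of our range that beats his one-pair hand
AIR = 1 - NUTS         # share of our range that loses to it


def bettor_ev(bluff_freq, call_freq):
    """Our EV, in pots, when we bet every nut hand and bluff `bluff_freq` of our air.

    Air that checks loses the pot (EV 0); the opponent calls a bet
    with probability `call_freq`.
    """
    fold_freq = 1 - call_freq
    value_ev = call_freq * (POT + BET) + fold_freq * POT
    bluff_ev = call_freq * (-BET) + fold_freq * POT
    return NUTS * value_ev + AIR * bluff_freq * bluff_ev


# Bluff just enough air that bluffs are BET / (POT + 2*BET) of the betting range.
balanced = NUTS * BET / (AIR * (POT + BET))  # = 1/2: betting range is 2/3 nuts, 1/3 air

for call in (Fraction(0), Fraction(1, 2), Fraction(1)):
    # columns: call frequency, balanced EV, always-bluff EV
    print(call, bettor_ev(balanced, call), bettor_ev(Fraction(1), call))
# 0 3/4 1
# 1/2 3/4 3/4
# 1 3/4 1/2
```

The balanced line earns 3/4 of the pot whatever the calling frequency, while always bluffing falls to 1/2 of the pot, exactly the range's raw equity, once the opponent adjusts by always calling.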

Population Dynamics

While the above is a nice story, it is not what made me think of the problem. What actually got me thinking about this issue was something much more specific: playing heads-up on sites that use anonymous player names.

Often fishy players sit down at heads-up tables and play super aggro on the first hand (how true this is probably depends on what stakes you play). They're looking for action, and they came into the game telling themselves that they're going to be aggressive today and punish their nit opponents.

As a result, when a fishy player 4-bets you on the first hand, especially if they sit down with some weird amount that looks like their entire bankroll, you end up in a situation where you generally want to stack off (go all-in) much lighter (with a worse hand) than you would normally.

However, with anonymous player names, the person who's sitting at the table waiting for an opponent has no way to know if a new player who just sits down is actually a maniac fish or a strong regular. The person joining the table, however, can much more safely assume that anyone sitting at an open table waiting for action is more likely a reg than a fish, particularly if there are a number of open tables each with one player seated.

Thus if you join a table against another reg, and get a big hand on the first hand, you can mimic the play of a fish (be super aggro and use slightly odd bet-sizing) and often induce a light stack off from your opponent.

This actually maps exactly to our Rock Paper Scissors example.

Take the rock paper scissors game above (the main problem, not the bonus), but rather than thinking about it from the perspective of Player 2 as a single person, think about it as follows. Player 1 is a GTO player waiting for an opponent. Player 2 will be chosen randomly, and 50% of the time he will be a fish who always plays rock (call him P2F), and the other 50% of the time he will be a smart opponent (call him P2S). Player 1 can't tell who he is playing against, and both Player 1 and P2S are aware of the population dynamics.

While Player 1 wins money overall, he actually loses money to P2S, who plays 2/3 scissors, 1/3 paper against Player 1's 2/3 paper, 1/3 rock. P2S averages $33.33 per round in this game. Player 1 basically takes money from the fish, but gives half of it back to P2S.
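A minimal sketch of the arithmetic behind those numbers. It assumes each round is played for $100 (that stake is what makes P2S's 1/3 edge come out to $33.33) and uses only the strategies stated above; the helper names `payoff` and `ev` are hypothetical.

```python
from fractions import Fraction

STAKE = 100  # dollars won or lost per round (assumed; consistent with the $33.33 figure)
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}


def payoff(a, b):
    """Payoff, in dollars, to the player throwing `a` against `b`."""
    if a == b:
        return 0
    return STAKE if BEATS[a] == b else -STAKE


def ev(mix_a, mix_b):
    """Expected payoff to A when A and B play independent mixed strategies."""
    return sum(pa * pb * payoff(a, b)
               for a, pa in mix_a.items() for b, pb in mix_b.items())


p1 = {"rock": Fraction(1, 3), "paper": Fraction(2, 3)}         # Player 1
fish = {"rock": Fraction(1)}                                   # P2F
smart = {"paper": Fraction(1, 3), "scissors": Fraction(2, 3)}  # P2S

vs_fish = ev(p1, fish)               # 200/3, about $66.67
vs_smart = ev(p1, smart)             # -100/3, about -$33.33
overall = (vs_fish + vs_smart) / 2   # 50/3, about $16.67 per round
print(vs_fish, vs_smart, overall)
```

Player 1 takes about $66.67 per round from the fish, hands about $33.33 back to P2S, and nets about $16.67 per round across the 50/50 population.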

Furthermore, the existence of P2S is what protects P2F from being exploited as badly as he otherwise would be.

Population effects are common in game theory. They come up whenever you introduce a subset of weak players into a pool of GTO players and the GTO players then adjust their strategies on the assumption that they are randomly paired against a member of the population. This shifts the GTO players' strategies away from the regular equilibrium strategy to something that is moderately exploitative (like our 2/3 paper, 1/3 rock strategy from the brainteaser).
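The population effect can be checked by blending the two opponent types into the single mix that Player 1 effectively faces. This sketch reuses the assumed $100 stake and the strategies from the example above; `blend` and `action_evs` are hypothetical helpers. Against the blended population, rock and paper earn the same for Player 1 while scissors loses, which is why he mixes only rock and paper; and P2S is likewise indifferent between paper and scissors against Player 1's 2/3 paper, 1/3 rock.

```python
from fractions import Fraction

STAKE = 100  # dollars per round (assumed, as in the population example)
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}
MOVES = tuple(BEATS)


def payoff(a, b):
    """Payoff, in dollars, to the player throwing `a` against `b`."""
    if a == b:
        return 0
    return STAKE if BEATS[a] == b else -STAKE


def action_evs(opponent_mix):
    """EV of each pure move against an opponent's mixed strategy."""
    return {a: sum(p * payoff(a, b) for b, p in opponent_mix.items()) for a in MOVES}


def blend(weights_and_mixes):
    """Aggregate several opponent types into the single mix a random pairing produces."""
    total = {}
    for weight, mix in weights_and_mixes:
        for move, p in mix.items():
            total[move] = total.get(move, 0) + weight * p
    return total


fish = {"rock": Fraction(1)}
smart = {"paper": Fraction(1, 3), "scissors": Fraction(2, 3)}
population = blend([(Fraction(1, 2), fish), (Fraction(1, 2), smart)])
# population: rock 1/2, paper 1/6, scissors 1/3

p1_action_evs = action_evs(population)   # rock and paper tie at 50/3; scissors -100/3
p1 = {"rock": Fraction(1, 3), "paper": Fraction(2, 3)}
p2s_action_evs = action_evs(p1)          # paper and scissors tie at 100/3; rock -200/3
print(p1_action_evs)
print(p2s_action_evs)
```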

This is an interesting aspect of game theory that definitely has a number of implications in everyday poker. Next week's brainteaser will investigate it in more depth as we look at a problem that is related to this year's CPC (Computer Poker Championship) competition.

Bonus Solution

I'm going to be using a lot of the terminology and techniques that I introduced in the solution video above, so if you haven't watched it already, please check it out before continuing.

To solve the bonus problem, we have to start at the end of the game tree and work our way back to the start via backward induction.

The basic idea of backward induction is simple: we solve the 2nd round first, and then solve the 1st round by assuming that the payoff for going into the 2nd round in each game state is the equilibrium payoff of the 2nd round played in isolation. An important related concept is subgame perfect equilibrium.

In this case the 2nd round is very easy to solve. If our opponent did not play rock in the 1st round, then they must play it in the 2nd round; this is known to both players, so they are guaranteed to lose that round. If they did play rock in the 1st round, then the 2nd round is just a regular round of rock paper scissors, and the equilibrium payoff of that round is 0 for both players (in the Nash equilibrium, each player plays 1/3 rock, 1/3 paper, 1/3 scissors).

We'll also assume that Player 1 should be able to profit from the game.

We can rule out any pure strategy for Player 1 by noting that if Player 1 plays a pure strategy in round 1, his opponent's best response guarantees that he does not win round 1, which means that at best he breaks even overall.

We can also say that any pure strategy for Player 2 earns at best -$50 per round, which is what they get for always playing rock in round 1 and then playing the Nash equilibrium of the regular rock paper scissors subgame in round 2.
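Backward induction and both pure-strategy arguments can be checked by folding the round-2 value into a 3×3 round-1 payoff table. This is a sketch that assumes each round is worth $50 (the coefficient used in the EV equations that follow) and that Player 2 must throw rock in at least one of the two rounds; all names are illustrative.

```python
# Assumptions: each round is worth $50, and Player 2 must throw rock
# in at least one of the two rounds.
ROUND_STAKE = 50
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}
MOVES = tuple(BEATS)


def round_payoff(a, b):
    """Player 1's payoff for a single round where P1 throws `a` and P2 throws `b`."""
    if a == b:
        return 0
    return ROUND_STAKE if BEATS[a] == b else -ROUND_STAKE


def round2_value(p2_round1_move):
    """Equilibrium value of round 2 to Player 1, solved first (backward induction).

    If P2 skipped rock in round 1, both players know P2 must throw rock now,
    so P1 throws paper and wins; otherwise round 2 is plain RPS, worth 0.
    """
    return 0 if p2_round1_move == "rock" else ROUND_STAKE


# Round-1 game with the round-2 value folded into each cell (Player 1's payoff).
REDUCED = {(a, b): round_payoff(a, b) + round2_value(b) for a in MOVES for b in MOVES}

# What each pure round-1 move guarantees its player against a best-responding opponent.
p1_guarantee = {a: min(REDUCED[a, b] for b in MOVES) for a in MOVES}
p2_guarantee = {b: -max(REDUCED[a, b] for a in MOVES) for b in MOVES}

print(p1_guarantee)  # {'rock': 0, 'paper': 0, 'scissors': -50}
print(p2_guarantee)  # {'rock': -50, 'paper': -100, 'scissors': -100}
```

No pure round-1 move guarantees Player 1 more than 0, and Player 2's best pure option is rock at -$50, matching the two arguments above.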

So, assuming Player 2 can average better than -$50 by mixing, both players should play mixed strategies.

We can write out the expected values of Player 1's options quite simply.

EV[P₁] = 50 (r₂ - s₂) + 50 * (1 - r₂)

EV[S₁] = 50 (p₂ - r₂) + 50 * (1 - r₂)

EV[R₁] = 50 (s₂ - p₂) + 50 * (1 - r₂)

Here r₂, p₂, and s₂ are the probabilities with which Player 2 plays rock, paper and scissors respectively, and EV[P₁] is the expected value of Player 1 playing paper (EV[S₁] and EV[R₁] are the same for scissors and rock).

Note that the second-round EV, represented by the 50 * (1 - r₂) term, is identical in all three equations, so it cancels out of the indifference conditions. What remains is r₂ - s₂ = p₂ - r₂ = s₂ - p₂, which is symmetric in r₂, p₂, s₂. These three differences sum to zero, so if they are all equal, each must be zero, and thus r₂ = p₂ = s₂.

If we note that r₂ + p₂ + s₂ = 1, the only symmetric solution is that they are all 1/3.

So Player 2 plays rock, paper and scissors each 1/3 of the time. If we plug those values into Player 1's EV equations, each one gives 50 * 0 + 50 * 2/3, so he profits by $33.33 per round.
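The indifference argument can be checked numerically by transcribing Player 1's three EV equations with exact fractions. This is a sketch; `p1_evs` is a hypothetical helper name, and the second call uses an arbitrary rock-heavy mix purely for illustration.

```python
from fractions import Fraction


def p1_evs(r2, p2, s2):
    """Player 1's EV for each round-1 move, transcribing the three equations above."""
    round2 = 50 * (1 - r2)  # P1 collects $50 in round 2 whenever P2 skipped rock
    return {
        "paper": 50 * (r2 - s2) + round2,
        "scissors": 50 * (p2 - r2) + round2,
        "rock": 50 * (s2 - p2) + round2,
    }


third = Fraction(1, 3)
print(p1_evs(third, third, third))  # every move is worth 100/3, about $33.33

# A rock-heavy Player 2 breaks the indifference: paper now earns 75/2 = $37.50.
print(p1_evs(Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)))
```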

For Player 2, noting that playing anything other than rock in round 1 loses us $50 in round 2, our EV equations are:

EV[P₂] = 50 (r₁ - s₁) - 50

EV[S₂] = 50 (p₁ - r₁) - 50

EV[R₂] = 50 (s₁ - p₁) - 0

We know that Player 2 is playing all 3 options, so the indifference conditions require that the EVs all be equal. Furthermore, each of those EVs must equal -$33.33, since the game is zero-sum.

You can solve these by hand or with something like Wolfram Alpha, but it's easy to check that r₁ = 1/3, p₁ = 2/3, s₁ = 0 solves all 3 equations (it makes them all equal to -$33.33).
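The same check for Player 2, transcribing the three equations above with exact fractions (`p2_evs` is a hypothetical name, and this is only a verification sketch, not a solver):

```python
from fractions import Fraction


def p2_evs(r1, p1, s1):
    """Player 2's EV for each round-1 move, transcribing the equations above."""
    return {
        "paper": 50 * (r1 - s1) - 50,     # skipping rock means losing round 2
        "scissors": 50 * (p1 - r1) - 50,
        "rock": 50 * (s1 - p1) - 0,       # round 2 is then plain RPS, worth 0
    }


evs = p2_evs(Fraction(1, 3), Fraction(2, 3), Fraction(0))
print(set(evs.values()))  # a single value, -100/3 (about -$33.33)
```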

This means that Player 1 plays the same strategy in both the main problem and the bonus problem, but that Player 2's disadvantage is twice as large in the bonus.

Edit:

Note that because P1 never actually plays scissors, the indifference condition on EV[S₁] need not hold; scissors only has to be no better than rock and paper. Because of this, as h7r so helpfully noted in the comments, there can be multiple equilibrium solutions to this game, so long as EV[P₁] = EV[R₁].
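One way to work out what this implies, sketched from the EV equations above: since P1 mixes only rock and paper, Player 2's mix must satisfy EV[P₁] = EV[R₁] ≥ EV[S₁]. With r₂ + p₂ + s₂ = 1, the equality gives s₂ = 1/3, and the inequality gives r₂ ≥ 1/3, so p₂ = 2/3 - r₂ with r₂ anywhere from 1/3 to 2/3. Player 2's own indifference depends only on P1's mix, so it is unaffected. The helper names below are hypothetical.

```python
from fractions import Fraction


def p1_evs(r2, p2, s2):
    """Player 1's round-1 EVs (same equations as in the bonus solution)."""
    round2 = 50 * (1 - r2)
    return {
        "paper": 50 * (r2 - s2) + round2,
        "scissors": 50 * (p2 - r2) + round2,
        "rock": 50 * (s2 - p2) + round2,
    }


def supports_p1_mix(r2, p2, s2):
    """True if P1's rock/paper-only mix is a best response to this P2 mix:
    rock and paper must tie, and unplayed scissors must be no better."""
    evs = p1_evs(r2, p2, s2)
    return evs["rock"] == evs["paper"] >= evs["scissors"]


# Candidate family: s2 = 1/3, r2 in [1/3, 2/3], p2 = 2/3 - r2.
for r2 in (Fraction(1, 3), Fraction(1, 2), Fraction(2, 3)):
    p2, s2 = Fraction(2, 3) - r2, Fraction(1, 3)
    evs = p1_evs(r2, p2, s2)
    value = Fraction(1, 3) * evs["rock"] + Fraction(2, 3) * evs["paper"]
    print(r2, p2, s2, supports_p1_mix(r2, p2, s2), value)
# 1/3 1/3 1/3 True 100/3
# 1/2 1/6 1/3 True 100/3
# 2/3 0 1/3 True 100/3

# Below the range the condition fails: scissors would beat rock and paper.
print(supports_p1_mix(Fraction(1, 4), Fraction(5, 12), Fraction(1, 3)))  # False
```

Every member of the family gives Player 1 the same $33.33, consistent with the point below that all equilibria of a two-player zero-sum game share one value.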

The EV for both players is the same regardless of which equilibrium is played, because that is true of all two-player zero-sum games.

4 comments:

k43r, April 8, 2014 at 12:55 AM
That's a nice and clean math.
Michael, April 12, 2014 at 3:48 AM
I'm loving these brainteasers, really hope to see more of them.

One thing though that I'm a bit confused about: in the bonus puzzle solution when we solve for player 1's EVs, why is the second round EV represented by 50 * r2? Should it not be 50 * (1 - r2) since we only auto-win the second round if player 2 *doesn't* play rock? It's also inconsistent with one of the following paragraphs saying that, if we plug r2=p2=s2=1/3 in we get an EV of $33.33... which we don't since it simplifies to 50 * 1/3 = $16.67. Seems like an error to me...


Monday, April 7, 2014
