## Tuesday, August 26, 2014

### GTO Brainteaser #7 -- Solution to second GTO quiz

Today I'll walk through solutions to all the problems in the second GTO quiz.  Overall, people did much better on the second quiz than on the first. Whether that is because it was easier or because you all have a better understanding of GTO than you did before taking my first quiz I can't say for sure, but I'm hoping it's the latter :)

There were some requests for aggregate stats so I'll present those briefly for both quizzes before I dive into the solutions.

People did much better on these true or false questions than on the first quiz, where overall people got 48.12% of the questions right, which is comparable to random guessing.  On the second quiz, 67.1% of answers were correct.  Question 2 gave people the most trouble, with a correct answer rate of about 58%.

### Solutions

Question 1:  "In a game of HUNLHE, player A is out of position and player B is in position. A strategy pair where on some river A checks his entire range and folds 60% of the time to a pot sized bet and B checks back some of his range cannot be a GTO set of strategies."

There is a false perception in the poker community that we need to call a pot-sized bet at least 50% of the time to prevent our opponent from "auto-profiting" by betting any two cards, and that if we fail to do so our opponent would bet his entire range at us.

This is only true when our opponent's range is such that his entire bluffing range has 0 EV vs our checking range.  In practice, even on the river this is very rare, and on the flop it is virtually never the case.  We don't need our opponent to be indifferent between open folding and bluffing, but rather indifferent between checking back and bluffing. So it is entirely possible that at the point of indifference the EV of checking back is 20% of the pot (i.e. the very top of his bluffing range beats our worst air, e.g. the bottom 20% of our checking range), and thus that it is fine for us to fold 60%: a pot-sized bluff that works 60% of the time wins 0.6 - 0.4 = 20% of the pot on average, exactly the EV of checking back.
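The arithmetic behind that indifference point can be checked with a minimal sketch. The function name `bluff_ev` and the choice to measure everything in units of the pot are illustrative assumptions, and the bluff is assumed to have no equity when called.

```python
from fractions import Fraction

def bluff_ev(fold_freq, pot=Fraction(1), bet=Fraction(1)):
    """EV of a pure bluff (assumed 0% equity when called), in pot units."""
    # Win the pot when they fold, lose the bet when they call.
    return fold_freq * pot - (1 - fold_freq) * bet

# Pot-sized bet into a caller who folds 60% of the time.
ev_bluff = bluff_ev(Fraction(3, 5))

# Assumed EV of checking back the best bluff candidate: 20% of the pot.
ev_check_back = Fraction(1, 5)

print(ev_bluff)                   # 1/5
print(ev_bluff == ev_check_back)  # True: indifferent, so folding 60% is fine
```

With a 50% fold frequency the same function returns 0, which is the familiar "defend half the time" number; it only matters when checking back is worth nothing.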

It is very easy to construct example ranges like this. All you need to do is give player A a range that is 80% medium strength hands and 20% air, and give player B a polarized range where the weakest hand in B's range still beats A's air.

You can browse such a model example here: http://gtorangebuilder.com/#share_scenarioHash=cd0fbe72dac92c1361cab6aab6dfc56b

In practice these types of situations are very common with real ranges and if you are blindly thinking that you need to defend 50% of the time in these spots you are likely making a major error.  With most of his range your opponent will average winning some small % of the pot.  This is fine and if you are too focused on making sure they actually average 0 chips with their worst hands (particularly on the flop) you'll usually end up being taken to value town by the stronger parts of their range.

Question 2:  "In a game of HUNLHE, player A is out of position and player B is in position. A strategy pair where preflop A folds 60% of the time to a minraise and B folds some of his preflop range cannot be a GTO set of strategies."

This question is completely different from the last one because in this case the EV of folding a hand for player B (who posts the small blind and acts first preflop) is in fact -0.5bb regardless of what hand he holds.  Folding and checking are fundamentally different actions.

If Player B was ever folding a hand preflop and Player A was folding to a minraise 60% of the time, then Player B could increase his EV by minraising the hand he was folding. Even if he always loses his minraise when Player A defends, his EV for raising is 0.6 * 1bb + 0.4 * -2bb = -0.2bb.  This is greater than the -0.5bb from folding.
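As a quick check of that comparison, here is a small sketch. The helper `minraise_ev` is a hypothetical name, and it assumes the worst case from the text: the raiser loses the full 2bb every time the minraise is defended.

```python
from fractions import Fraction

def minraise_ev(fold_freq, raise_size=2, blind_won=1):
    """Worst-case EV in bb of minraising: win 1bb on a fold, lose 2bb otherwise."""
    return fold_freq * blind_won - (1 - fold_freq) * raise_size

ev_raise = minraise_ev(Fraction(3, 5))  # the defender folds 60%
ev_fold = Fraction(-1, 2)               # open folding forfeits the small blind

print(ev_raise, ev_fold, ev_raise > ev_fold)  # -1/5 -1/2 True
```

Since even this pessimistic raise beats folding, no hand can be open folded at equilibrium against a 60% fold frequency.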

Question 3:  "You are playing a game of HUNLHE against an opponent who is not playing GTO and who is playing a fixed strategy on the turn (they are not adapting their strategy based on your style of play) but who will play GTO in every river scenario. You are also playing a non-GTO strategy on the turn (but GTO on the river) in order to exploit this opponent.

You determine that in some turn situation the EV of betting with JdTd is higher than the EV of checking with JdTd. A friend tells you that even though betting JdTd is higher EV in isolation that you can increase your overall strategy EV against your opponent by checking back JdTd to "balance your checking range". Your friend may be correct."

The key concept to understand here is that the idea that GTO play requires you to take the maximally profitable line with every hand in isolation is a local concept that only holds at equilibrium.  If both players are playing non-GTO on the turn and then exploiting each other on the river, then it is entirely possible that some turn action that reduces our EV against the way our opponent is playing now would increase our EV overall, by strengthening our river ranges enough that the EV gained with our other hands more than makes up for the EV we might lose with JdTd.

It's pretty easy to find simple examples of this kind of situation, but I'll probably save this topic for its own blog post as it is a reasonably large and important topic.

Question 4:  "You are asked to examine two poker bots (Bot 1 and Bot 2) to determine if they are GTO. You have them play each other in a rake-free game of HUNLHE and over a billion hands they are so close to break even that there is no statistical evidence that one is better than the other. You then have them each play a billion hands against two non-GTO players (Fish 1 and Fish 2). You find that Bot 1 beats Fish 1 at a much higher rate than Bot 2 beats Fish 1 but that Bot 2 beats Fish 2 at a much higher rate than Bot 1 beats Fish 2. This proves that at least one of the two bots is not GTO."

It is entirely possible for different GTO strategies to perform differently against different types of fish.  I've explained this in detail in this post.

Question 5:  "Zero-sum 2 player games can have two equilibria where both players have identical EVs in each equilibrium but where one equilibrium is higher variance than the other."

All equilibria in zero sum 2 player games must have the same EV, but they can have different variance.

A simple example is the game where Player 1 can choose either to play Rock Paper Scissors against Player 2, where the loser pays the winner a dollar, or not to play, in which case they both get 0 dollars.

Player 1 choosing to play RPS with both players playing GTO is an equilibrium with 0 EV for each player, and Player 1 choosing not to play RPS is also an equilibrium with 0 EV for each player. But choosing to play RPS is a higher variance choice than a guaranteed payoff of 0.
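To make the variance gap concrete, the sketch below enumerates the nine equally likely outcomes when both players mix uniformly over rock, paper and scissors (the GTO strategy for RPS). The names `payoff`, `ev` and `var` are illustrative, and payoffs are from Player 1's point of view in dollars.

```python
from fractions import Fraction
from itertools import product

MOVES = ["rock", "paper", "scissors"]
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}

def payoff(p1, p2):
    """Player 1's payoff in dollars: +1 for a win, -1 for a loss, 0 for a tie."""
    if p1 == p2:
        return 0
    return 1 if BEATS[p1] == p2 else -1

# Both players mix uniformly, so each of the 9 move pairs has probability 1/9.
prob = Fraction(1, 9)
outcomes = [payoff(a, b) for a, b in product(MOVES, repeat=2)]

ev = sum(prob * x for x in outcomes)
var = sum(prob * (x - ev) ** 2 for x in outcomes)

print(ev, var)  # 0 2/3
```

Declining to play has EV 0 and variance 0, so the two equilibria share an EV but differ in variance.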

Some people like to weight payoffs with concave functions called utility functions to model the fact that people tend to avoid variance.

Question 6:  "You are playing a bluffing game against a random opponent that comes from an infinite population of players. Half of the population are regulars, that is thinking adaptive players like you. The other half are fish who play fixed, suboptimal strategies and never adapt. The game is on an anonymous site so you have no idea if your opponent is a regular or a fish. Similarly your opponent has no idea if you are a regular or a fish.

The game works as follows. Both players ante \$50. A coin flip determines which player is the bluffer. The bluffer is dealt a card from a 2 card deck, which contains one Ace and one Queen. The caller is dealt a King. The bluffer can look at his card and choose to either bet \$100 or check. The caller can call or fold if bet to, otherwise he must check. At showdown the high card wins.

All regulars know that fish call 60% of the time as the caller and bluff 25% of the time they are dealt a Queen as the bluffer (they always bet an Ace).

What is the GTO strategy for the regulars given this population dynamic? What is the average profit per round for a regular? If the site were not anonymous and every regular was aware of whether their opponent was a regular or a fish how would that change the average profit per round?"

While a naive approach to this problem will get the right answer in this particular case, to be thorough when approaching these types of problems in general, remember that you always need to "be sure it ain't pure" and check for pure strategy solutions.  As we'll see, in this case pure strategies aren't relevant.

In reality this problem is extremely similar to the RPS puzzle that I posted in Brainteaser #1, where by adopting a pure strategy in one dimension (never playing scissors) a regular could profit against an unknown opponent who is a fish who always plays rock 50% of the time. If you haven't looked at that problem before, definitely check it out as an example where blindly applying indifference conditions can lead you astray.

It turns out that in this scenario the regulars, by competing to extract EV from one another whenever an exploitative line is taken, completely protect the fish from losing and at equilibrium both fish and regulars will break even in this game.

As a reminder, the solution to the version of this game where there are just two players (instead of a population of fish and regulars) is for the betting player to always bet with an Ace and to bet half the time with a Queen, and for the calling player to call half the time.  I'll call this strategy the base GTO solution. It is the unique equilibrium of the base bluffing game that I illustrate in my first cardrunners video.

The only way indifference conditions can possibly be satisfied is if on average the population is playing the base GTO strategy, which breaks even against all opponents, so it should come as no surprise that if the regulars are playing a mixed strategy then they cannot possibly profit.  Furthermore, if the average population is playing the unique equilibrium strategy, we don't need to check for pure strategy solutions, because we already know that the base game does not have any.

So we can assume that the solution involves regulars playing mixed strategies both as the bettor and as the caller.  For a refresher on mixed strategies and indifference conditions check out this youtube video.

Regulars must be indifferent between bluffing and checking back a Queen.  Let's call c the frequency with which regulars call.  The EV of checking is 0.  For the EV of bluffing to also be 0, as indifference requires, a bluff must be called half the time, since we lose 100 chips when called and win 100 chips when they fold.  Thus if fish are calling 60%, regulars must call 40% so that on average our opponents call 50%.
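The 40% figure falls out of a one-line population average. This is a minimal sketch; the constant names are illustrative, and the 50/50 split between fish and regulars comes from the problem statement.

```python
from fractions import Fraction

FISH_SHARE = Fraction(1, 2)   # half the population are fish
FISH_CALL = Fraction(3, 5)    # fish call 60% of the time
TARGET_CALL = Fraction(1, 2)  # average call rate that makes a bluff break even

# Solve FISH_SHARE * FISH_CALL + (1 - FISH_SHARE) * c = TARGET_CALL for c.
regular_call = (TARGET_CALL - FISH_SHARE * FISH_CALL) / (1 - FISH_SHARE)

print(regular_call)  # 2/5
```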

Similarly, regulars must be indifferent between calling and folding a King.    Since the EV of folding is 0, the EV of calling and beating a bluff is 200 and the EV of calling and losing to a value bet is -100, this means that on average, when an opponent bets they need to have an Ace two thirds of the time.

Remember that when our opponent bets that gives us information about whether or not they are a fish, as regulars are more likely to bet than fish, so after we observe a bet the probability that our opponent is a regular goes up so we will need to apply conditional probability to account for this.

When fish bet they have an Ace 4/5 of the time, and overall they bet 5/8 of the time.

If regulars bet 75% of the time with a Q, then when they bet they have an Ace 4/7 of the time, and overall they bet 7/8 of the time.

If we observe our opponent bet the probability that they hold an Ace is

(4/7 * 7/8 + 4/5 * 5/8) / (5/8 + 7/8) = 2/3, exactly what we need for indifference between calling and folding.

Thus regulars should bluff 75% of the time when they have a Q.
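The conditional probability step can be reproduced with a short sketch. The function `p_ace_given_bet` is a hypothetical helper; it assumes each bettor is dealt an Ace or a Queen with probability 1/2 and always bets an Ace, as in the problem statement.

```python
from fractions import Fraction

HALF = Fraction(1, 2)

def p_ace_given_bet(fish_bluff, regular_bluff, fish_share=HALF):
    """P(bettor holds an Ace | we face a bet) against the mixed population."""
    # Either type bets with an Ace whenever dealt one: probability 1/2.
    bet_with_ace = HALF
    # Overall betting frequency of each type: always with A, sometimes with Q.
    fish_bet = HALF + HALF * fish_bluff
    regular_bet = HALF + HALF * regular_bluff
    total_bet = fish_share * fish_bet + (1 - fish_share) * regular_bet
    return bet_with_ace / total_bet

p_ace = p_ace_given_bet(Fraction(1, 4), Fraction(3, 4))
# Calling wins 200 against a bluff and loses 100 against a value bet.
ev_call = (1 - p_ace) * 200 - p_ace * 100

print(p_ace, ev_call)  # 2/3 0
```

Any regular bluffing frequency other than 75% moves `p_ace` away from 2/3 and breaks the caller's indifference.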

However, what happens if we actually play this strategy as a regular?  We just determined that with a Q we are indifferent between checking and bluffing, so our EV as the bettor when dealt a Q is 0.

We also just determined that with a K we are indifferent between calling and folding when bet to, so our EV with a K when bet to is 0.  Our opponents on average check 25% of the time, and when they check we win the 100 chip pot at showdown, so our EV with a King overall is 25.

Since on average our opponents call 50% of the time our EV with an A is 150.

If we add these up, our EV is ((150 + 0) / 2 + 25) / 2 = 50.  This is exactly how much we have to ante to play the game, so the strategy breaks even.
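The same bookkeeping, written out as a sketch. Variable names are illustrative, and EVs are measured as the share of the pot collected, before subtracting the ante, which matches the convention used in the calculation above.

```python
from fractions import Fraction

ANTE, POT, BET = 50, 100, 100
opp_call = Fraction(1, 2)  # population average calling frequency
opp_bet = Fraction(3, 4)   # population average betting frequency (5/8 and 7/8)

# As the bettor: an Ace wins the pot, plus the bet when called.
ev_ace = (1 - opp_call) * POT + opp_call * (POT + BET)
ev_queen = 0               # indifferent between checking and bluffing
ev_bettor = (ev_ace + ev_queen) / 2

# As the caller: win the pot when checked to, 0 EV when bet to.
ev_king = (1 - opp_bet) * POT

total = (ev_bettor + ev_king) / 2
print(ev_ace, ev_king, total, total - ANTE)  # 150 25 50 0
```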

The takeaway here is that the anonymity completely protects the fish from being beaten in this simplified game.  Note that a key aspect of this result is that in this simplified bluffing game GTO play breaks even against fish, which is not the case in a full game of poker, so this question is not designed to illustrate that something like zoom on bovada is unbeatable in any sense.  It is just designed to show that in general, in a population with fish, regulars will usually end up taking lines that are non-GTO in the opposite direction of the fish, which will balance out average play and make it less exploitable.

It's easy to see that if we could identify whether our opponent was a regular or a fish we would be able to profit significantly (\$4.375 per hand on average).  In this case anonymity completely destroys that edge.
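One way to arrive at the 4.375 figure is sketched below. It assumes that regulars play the base GTO strategy against each other (and so break even), and that against a known fish they take the best pure response at each decision. Variable names are illustrative.

```python
from fractions import Fraction

ANTE, POT, BET = 50, 100, 100
HALF = Fraction(1, 2)
FISH_CALL = Fraction(3, 5)
FISH_BLUFF = Fraction(1, 4)

# As the bettor against a fish: value bet Aces, bluff only if profitable.
ev_ace = (1 - FISH_CALL) * POT + FISH_CALL * (POT + BET)  # 160
ev_bluff = (1 - FISH_CALL) * POT - FISH_CALL * BET        # -20
ev_queen = max(ev_bluff, 0)                               # so never bluff

# As the caller against a fish: call only if calling is profitable.
fish_bet = HALF + HALF * FISH_BLUFF                       # 5/8
p_ace = HALF / fish_bet                                   # 4/5
ev_call = (1 - p_ace) * (POT + BET) - p_ace * BET         # -40
ev_king = (1 - fish_bet) * POT + fish_bet * max(ev_call, 0)

profit_vs_fish = ((ev_ace + ev_queen) / 2 + ev_king) / 2 - ANTE
profit_vs_regular = 0  # assumption: regulars play base GTO against each other

avg_profit = HALF * profit_vs_fish + HALF * profit_vs_regular
print(float(profit_vs_fish), float(avg_profit))  # 8.75 4.375
```

The exploit is simply to stop bluffing and stop calling against fish, which is worth 8.75 per hand against them and half that across the whole population.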