Saturday, March 29, 2014

GTO play does more than “break even” -- dispelling the myth

A common misconception surrounding GTO poker is the idea that playing an optimal strategy will somehow break even against weak players.  This idea is generally rooted in the fact that the simplest example of a Nash Equilibrium that most people understand comes from Rock Paper Scissors (RPS).  However, RPS is in no way representative of all games, and there are actually games where the Nash Equilibrium strategy also wins the absolute maximum EV possible off of weak players 100% of the time.

No Limit Hold'em falls somewhere in-between these two extremes.  We don't have the data or computational ability to conclusively determine exactly how GTO performs against sub-optimal players in general, but we can analyze specific scenarios and make some educated guesses from there. In this post I am going to give some theoretical background into the performance of GTO vs. sub-optimal players and then walk through some real world poker examples with numbers and give some strong evidence that in many situations, GTO does not break even and actually extracts quite a bit of value from fishy players.

Furthermore, I will look at how big the exploitable leaks are that you open up in your own game when you deviate from GTO play.  In general, playing maximally exploitatively vs. a player with a known leak who is competent enough to notice your exploitation and adjust is risky, but at the same time, playing GTO vs. a player with a known leak often leaves money on the table.

In a future post I will outline an approach that leverages GTO mathematics to play "unexploitably exploitative" poker against a player with known tendencies, which I believe offers the best of both worlds, but this post is just going to focus on numerically comparing pure GTO play with maximally exploitative play against weak players.

It's important to note that the analysis below only applies to 2-handed situations.  As I explain in this post, there is no guarantee that GTO poker won't lose in 3-way situations with fish and I didn't do any additional in-depth analysis of exactly how well GTO performs in practice 3-way vs. fish.

Rock Paper Scissors (RPS)


RPS is a great first game to look at when trying to learn and understand Nash equilibrium, because it's a game everyone knows and the equilibrium is simple to see and understand.  While there is a lot to learn from RPS, extrapolating results from RPS to all games needs to be done with care.

In RPS, the optimal strategy is to play rock 1/3rd of the time, paper 1/3rd of the time and scissors 1/3rd of the time so as to make it impossible for your opponent to exploit you.  It is easy to see that this is a Nash equilibrium because the EV of any strategy against the equilibrium strategy is 0, so there is no incentive for a player to deviate, if he knows his opponent is playing GTO.

In the case of RPS, the equilibrium strategy has EV 0 against a weak player who plays rock 50% of the time and paper and scissors each 25% of the time, while the maximally exploitative strategy of always playing paper would win 50%, lose 25% and tie 25%.
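The RPS numbers above are easy to check directly.  Here is a minimal Python sketch, assuming the usual +1 / 0 / -1 scoring for a win, tie and loss (the post only gives win/lose/tie frequencies, so the scoring is an assumption, and the function names are just illustrative):

```python
from fractions import Fraction

MOVES = ("rock", "paper", "scissors")
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}


def payoff(mine, theirs):
    """Assumed scoring: +1 for a win, 0 for a tie, -1 for a loss."""
    if mine == theirs:
        return 0
    return 1 if BEATS[mine] == theirs else -1


def ev(strategy, opponent):
    """Expected payoff of one mixed strategy against another."""
    return sum(
        p * q * payoff(a, b)
        for a, p in strategy.items()
        for b, q in opponent.items()
    )


gto = {m: Fraction(1, 3) for m in MOVES}
rock_heavy = {"rock": Fraction(1, 2), "paper": Fraction(1, 4), "scissors": Fraction(1, 4)}
always_paper = {"paper": Fraction(1)}

print(ev(gto, rock_heavy))           # equilibrium vs. the rock-heavy player
print(ev(always_paper, rock_heavy))  # always-paper vs. the same player
print([ev({m: Fraction(1)}, gto) for m in MOVES])  # each pure strategy vs. the equilibrium
```

Under that scoring, always playing paper nets 50% wins minus 25% losses, while no pure strategy gains anything against the 1/3-1/3-1/3 mix.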

Does this mean that all equilibrium strategies in all games break even against weak opponents?  Of course not.  The issue with Rock Paper Scissors is that there is no room to make plays that are inherently mistakes, so even a very weak player can't do anything fundamentally wrong; the only mistake he can make is to be too predictable.  Rock, Paper and Scissors are all equally valid choices in all cases.  Furthermore, there is only a single round of action, so there is no way for subtle mistakes to propagate through a decision tree and have surprising consequences.

This is very different from poker (and most games), where there are plays that are just fundamentally worse than other plays.  Strategies like calling with a weaker hand and folding a better hand (when there are no relevant card removal effects) just don't make sense.  Nor does paying too much to chase a draw relative to how often it hits and the payoff of it hitting, or folding to a shove when you usually have the best hand, etc.  Poker has many, many less obvious fundamentally weak plays that even elite players make, from betting a merged range into a polarized one to semi-bluffing with a range that doesn't optimize your nuts-to-air ratio on all possible runouts.

As we'll see in the next example, games where there are fundamentally poor decisions tend to allow GTO strategies to beat weak players, often quite badly.

Second Price Auction


A second price auction is a simple example of a game where the equilibrium strategy wins the absolute maximum possible against any opponent who makes mistakes.

Consider a two-player game where $1 is being auctioned off in an auction where the highest bidder wins, but pays the price of the second-highest bidder's bid.  The equilibrium strategy is to bid $1, as bidding any other amount allows your opponent to bid just slightly more than you and make a profit every time.

However, if a novice player who does not know the equilibrium strategy plays this game against an equilibrium player and always bids $0.30 the equilibrium player will win $0.70, which is the absolute maximum possible amount to win against this fish.
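To make the payoff concrete, here is a small Python sketch of the two-player auction described above.  The function name and the even split on tied bids are assumptions of mine; the post does not specify a tie rule.

```python
def second_price_payoff(my_bid, their_bid, value=1.0):
    """Profit for the player bidding my_bid in a two-player second-price auction.

    The higher bid wins the prize and pays the other player's bid.
    Tied bids are split evenly, which is an assumption (not stated in the post).
    """
    if my_bid > their_bid:
        return value - their_bid
    if my_bid == their_bid:
        return (value - their_bid) / 2
    return 0.0


fish_bid = 0.30
print(second_price_payoff(1.00, fish_bid))   # equilibrium bidder vs. the fish
print(second_price_payoff(fish_bid, 1.00))   # the fish vs. the equilibrium bidder

# Search a 1-cent grid of bids from $0.00 to $2.00 for the best result vs. this fish.
grid = [cents / 100 for cents in range(0, 201)]
best = max(second_price_payoff(b, fish_bid) for b in grid)
print(best)
```

Against the $0.30 bidder, no bid on the grid does better than the $1 equilibrium bid, which is what "wins the absolute maximum" means here.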

How well an equilibrium strategy performs against a player who makes mistakes depends greatly on the structure of the game and on the specific equilibrium strategy.  For any given game, there can be multiple equilibria that perform differently against weak opponents.

GTO in NLHE


So how does GTO play fare against weak players in No Limit Hold’em?  Unfortunately answering that question in general isn’t possible, but by thinking about why the RPS equilibrium performs so poorly, why the second price auction equilibrium performs so well, and by analyzing some example poker scenarios we can come to some pretty strong conclusions that GTO poker will beat fishy players who make a lot of mistakes.

An equilibrium strategy in a zero-sum game like poker maximizes the payoff against an opponent who always makes the correct decision, by definition.  What this generally means is that strategies that are designed to “trick” an opponent into calling a bet when you obviously have the nuts or that take advantage of an opponent who has specific known weaknesses but that would backfire and cost us money if the opponent did not actually have those known weaknesses cannot be equilibrium strategies.

However, strategies that are just strong, solid play, like playing strong hands preflop and raising them aggressively, bluffing missed draws aggressively on the river in conjunction with aggressively value betting strong made hands, not folding to too many continuation bets or preflop 3bets, correctly assessing the equity of your hand vs. your opponent's range in all cases, not bluffing when your opponent is never folding, etc., are likely all parts of any equilibrium strategy in poker.

Shove or Fold


So how can we quantify how well an equilibrium strategy performs against weak players?  In the past the only real options were to look at the performance of shove/fold strategies late in SnGs against players who fold too much, call too much, or against players who incorrectly weight hands (e.g. they call with 87s but fold K3o against an opponent who is shoving 40% of hands).

First I'll show a very simple example that involves looking at a push/fold equilibrium that has been well understood for years.  I started with the following situation:
  1. Heads up NLHE
  2. Blinds are 0.5/1
  3. Stacks are 15
  4. The small blind must either go all-in or fold.
  5. The big blind can then call or fold.
In this game a GTO strategy for the big blind calls 28% of the time.
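For readers who want to see why the big blind's calling frequency matters so much in this game, here is a rough Python sketch of the small blind's shove EV under the setup above.  The 35% equity-when-called figure is purely hypothetical, and the sketch holds it fixed for simplicity; in a real calculation it would change as the calling range widens.

```python
STACK = 15.0        # effective stack in bb, from the setup above
SMALL_BLIND = 0.5
BIG_BLIND = 1.0


def shove_ev(call_freq, equity_when_called):
    """EV in bb of shoving from the small blind, measured from the start of the hand."""
    ev_when_bb_folds = BIG_BLIND                             # we pick up the big blind
    ev_when_called = equity_when_called * 2 * STACK - STACK  # all-in pot of 30bb
    return (1 - call_freq) * ev_when_bb_folds + call_freq * ev_when_called


def fold_ev():
    """EV in bb of folding the small blind."""
    return -SMALL_BLIND


# 0.28 is the GTO calling frequency from the post; 0.35 equity is a hypothetical hand.
for call_freq in (0.28, 0.35, 0.56):
    print(call_freq, round(shove_ev(call_freq, 0.35), 3), fold_ev())
```

A hand that is close to indifferent between shoving and folding at a 28% calling frequency becomes a clear fold once the big blind calls much more often, which is the kind of adjustment the maximally exploitative strategy makes.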

I then considered four strategies for the player in the big blind, the GTO strategy, and the following three weaker strategies:

  1. Small Fish: calls about 10% too often (~31% instead of ~28%)
  2. Medium Fish: calls about 25% too often (~35% instead of ~28%)
  3. Huge Fish: calls about 100% too often (~56% instead of 28%)
I calculated the EV for the small blind player of playing the GTO strategy against each of those four strategies and compared it to the EV of playing the maximally exploitative strategy.  The maximally exploitative strategy is defined as the strategy that extracts the most value from your opponent, assuming you know exactly how he plays every single hand.  I then normalized the win-rates by subtracting the GTO vs. GTO win-rate as that represents the inherent positional advantage that the big blind player has in this scenario.

Finally I considered how much an opponent who noticed that you were exploiting them and decided to exploit you back in turn could win as this represents the potential leak you are introducing into your game by trying to play exploitative if your opponent reacts properly to your strategy.

You can see the results in the table below.  Note that the software I used to run this analysis (CREV) rounds to the nearest bb/100 so the results are only that accurate.


Opponent        GTO bb/100   Max Expl. bb/100   GTO WR %   Expl. Leak
vs GTO                   0                  0       100%            0
vs Small Fish            1                  1       100%           -1
vs Med. Fish             3                  4        75%           -3
vs Huge Fish            22                 27        82%           -2


As you can see, against all player types, the GTO strategy extracts the significant majority of the value that can possibly be extracted from a fishy opponent in this scenario.  Relative to exploitative play, the GTO strategy performs the worst against the medium fish, but still achieves 3/4ths of the maximum possible win-rate.
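The "GTO WR %" column is just the GTO win-rate divided by the maximally exploitative win-rate.  A minimal Python sketch using the rounded CREV figures from the table above (the dictionary and function names are mine):

```python
# (GTO bb/100, max exploitative bb/100), as reported by CREV in the table above.
results = {
    "Small Fish": (1, 1),
    "Med. Fish": (3, 4),
    "Huge Fish": (22, 27),
}


def gto_share(gto_wr, max_wr):
    """Fraction of the maximum possible win-rate that the GTO strategy captures."""
    return gto_wr / max_wr


for name, (gto_wr, max_wr) in results.items():
    print(f"{name}: {gto_share(gto_wr, max_wr):.0%}")
```

Recomputing from the rounded inputs gives about 81% for the huge fish rather than the 82% listed, which may simply reflect unrounded values behind the table.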

Against the medium fish, the maximally exploitative strategy also opens itself up to a -3bb/100 leak, so if just one out of every four such opponents that you tried to exploit adjusted properly to your strategy (or if all of them made minor adjustments that weren't quite right) you'd no longer outperform a GTO player.  Furthermore, if you just incorrectly estimated the exact way in which the medium fish played (say you thought the player was a medium fish, but he was actually a small fish) you would also no longer outperform the GTO strategy.
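The "one out of every four" figure can be reproduced with a short calculation.  This Python sketch assumes that the GTO strategy earns exactly 0 (after normalization) against an opponent who has adjusted; that is my simplifying assumption, and the function name is illustrative.

```python
def break_even_adjust_fraction(gto_wr, max_expl_wr, leak, gto_vs_adjusted=0.0):
    """Fraction of opponents who must counter-adjust before exploiting stops beating GTO.

    gto_vs_adjusted is what GTO would win against an adjusted opponent;
    it is assumed to be 0 here, which is not taken from the post.
    """
    gain = max_expl_wr - gto_wr           # extra won from opponents who do not adjust
    cost = abs(leak) + gto_vs_adjusted    # given up against opponents who do adjust
    return gain / (gain + cost)


# Medium fish row: GTO wins 3, max exploit wins 4, exploitative leak is -3.
print(break_even_adjust_fraction(3, 4, -3))
```

If GTO actually wins something against the adjusted opponent, the break-even fraction is even lower than this.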

River Scenarios


The analysis above should already make one highly skeptical of the idea that for some reason GTO poker is likely to just break even vs. fishy players, but it is limited to a very specific scenario that ignores the vast majority of the types of decisions that poker players actually make in deep stacked poker.

This is where we can leverage GTORangeBuilder to dig a bit deeper.  Let's imagine a hypothetical river situation based on a hand that went as follows:

A weak, tight-passive player raises to 3bb from MP and a passive player calls the small blind while an aggressive player over-calls the big blind.  Effective stack sizes are 121bb.

The flop comes 8h7h3c and the pot is 9bb.
MP bets 6bb, Button folds, and the BB raises to 18bb, MP calls

The turn is Ts and the pot is 45bb
BB bets 30bb and MP calls.

The river comes 4s and the pot is 105bb, and the effective remaining stacks are 70bb.
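The pot and stack sizes in this hand history can be checked with a few lines of Python.  This sketch takes the 9bb flop pot as given in the post and tracks only the two players who reach the river; the variable names are mine.

```python
pot = 9.0             # flop pot, as stated in the post
stack = 121.0 - 3.0   # effective stack left after calling the 3bb preflop raise

# Amount each of the two remaining players puts in on each street:
# the flop raise to 18bb is called, and the 30bb turn bet is called.
streets = [("flop", 18.0), ("turn", 30.0)]

for street, put_in in streets:
    pot += 2 * put_in
    stack -= put_in
    print(f"after {street}: pot={pot}bb, effective stack={stack}bb")
```

The running totals come out to a 45bb pot on the turn and a 105bb pot with 70bb behind on the river, matching the numbers in the hand history.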

Now we're just trying to look at a simple example, so the exact hand ranges don't matter too much here, but I went ahead and gave the two players the following ranges:

MP: JJ+, AhJh+, JhTh, QhJh, KhQh.
BB: 77-TT, 33, QhTh+, KhTh+, Ah2h-AhJh, 65s, 87s, T9s, JhTh

This is designed to model a scenario where MP is quite tight preflop but very stubborn postflop, while the BB is an aggressive player who likes to bet sets and draws aggressively.  We'll apply these tendencies to the river later.

First let's take a look at GTO play on the river, assuming that the players should either shove or fold.  The river is a sub-game, so even if non-GTO strategies were used to reach this point in the hand, we can still analyze perfect play from this point forwards.  As you can see in the GTORangeBuilder solution browser below, in this scenario, with GTO river play, the MP player will average getting 32.54bb of the 105bb pot, while the BB player will average winning 72.46bb.


I took this solution and put it into CREV, so that I could look at how these EVs change when the tight-passive MP player plays poorly.

I am happy to share all the CREV files for this scenario with anyone who is interested, just post in the comments.

I then considered two types of fishy MP players.  The first is the super fish, who is ultra passive: he checks all his hands when checked to, and calls 100% when shoved at.  The second is a medium fishy player who checks back his entire range when checked to, but only calls the river with QQ+.  I used the same calculation as before to normalize the GTO vs. GTO payoff to zero.  The results are below.


Opponent          GTO bb/h     Max Expl. bb/h   GTO WR %   Expl. Leak
vs Nash                  0                  0       100%            0
vs Med. Fish             9                 20        45%          -21
vs Huge Fish           1.5                6.3        24%          -21
Obviously there are other types of fish (e.g. a fish that folds 100%, or an over-aggro fish) that can be exploited more or less heavily.  Folding too much is the easiest leak to exploit postflop, and against it GTO play does worse relative to maximally exploitative play than in the example above; against over-aggro players it varies.  So this doesn't hit all the cases, but it is enough to give us a rough sense of how well GTO performs against fish, and as one might expect the answer is significantly better than break even, but significantly worse than the maximally exploitative strategy.

It is worth mentioning that in post-flop scenarios, the maximally exploitative strategies are generally quite complicated and extremely opponent dependent, so actually identifying and applying them would require you to know almost exactly the types of mistakes your opponent makes at each stage (i.e. exactly which hands he calls too much with on every possible board and the exact percentages with which he calls with them).  The GTO strategy is a bit more practical to apply because it doesn't vary with your opponent.

Furthermore, the size of the leak that you are introducing into your own game by playing maximally exploitatively on the river is generally going to be very large.  This means that if your opponent properly adjusts to your strategy, or if you are just wrong about the exact details of the leak in your opponent's game, you can potentially lose a great deal.

Conclusions


Exactly how well Nash Equilibrium strategies perform against sub-optimal opponents in NLHE depends on the exact situation and the exact way in which our opponent plays sub-optimally.  The goal of this post is not to claim that GTO play is likely to extract the maximum from a sub-optimal opponent, but rather to dispel the myth that GTO play would just break even.  The reality lies somewhere in between.

In practice, I believe the best strategy is to start with something close to GTO and shift your strategy in the general direction of your opponent's weaknesses rather than trying to drastically change your strategy to maximally exploitative every time you think you have a read on your opponents.  This gives you the benefit of solid un-exploitable play as you develop your reads, combined with the payoff of moderately exploitative play as you develop reads on your opponents' weaknesses.  Furthermore, in a future post I'll demonstrate some specific mathematical methods that can be used to figure out exactly how one should shift from GTO, based on an opponent's general tendencies, in a way that exploits those tendencies while minimizing the potential for counter-exploitation by your opponent.

We also always need to be aware of the fact that the more a player deviates from GTO play, the bigger the leaks he opens up in his own game are, and if you either misjudge the nature of your opponent's weakness, or don't adjust for his reaction to your prior adjustments, then you can easily end up losing money.  Against a player whom you believe to be better than you, just sticking to GTO is the safest play.

As always, questions and comments are welcome.  If you have any specific situations you'd like me to analyze further as a follow up, just post in the comments.


Update



I'd like to thank BlackLoter for using Hold'em Resources to re-analyze the push/fold equilibrium that I listed above.  Hold'em Resources has a much more precise EV output, and doesn't rely on small Monte Carlo samples, so it is possible to get a much more exact idea of the outcomes than I was able to get using CREV.  CREV rounds to the nearest bb/100, which meant that when our edge was small (e.g. 1bb/100) the rounding had a significant impact, although I don't think it changes any of the high level conclusions above.
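To get a feel for how much rounding to the nearest bb/100 can move the "GTO WR %" figure, here is a small Python sketch.  It assumes each true value lies within half a bb/100 of the rounded one and that GTO never wins more than the maximally exploitative strategy; it deliberately ignores Monte Carlo sampling error, and the function name is mine.

```python
def ratio_bounds(gto_rounded, max_rounded, half_step=0.5):
    """Range of GTO/max ratios consistent with two values rounded to the nearest bb/100.

    Assumes max_rounded is at least 1 so that the upper bound is well defined.
    """
    lo = (gto_rounded - half_step) / (max_rounded + half_step)
    hi = (gto_rounded + half_step) / (max_rounded - half_step)
    return max(lo, 0.0), min(hi, 1.0)   # GTO cannot exceed the max exploitative win-rate


# Rounded CREV rows from the first table: (GTO bb/100, max exploitative bb/100).
for name, row in {"Small Fish": (1, 1), "Med. Fish": (3, 4), "Huge Fish": (22, 27)}.items():
    lo, hi = ratio_bounds(*row)
    print(f"{name}: {lo:.0%} to {hi:.0%}")
```

The range is very wide when the edge is around 1bb/100 and fairly narrow for the huge fish, which fits the observation that rounding matters most when the edge is small.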

Here are his numbers (I haven't verified these myself):



Opponent        GTO bb/100   Max Expl. bb/100   GTO WR %   Expl. Leak
vs Nash                  0                  0       100%            0
vs Small Fish         0.39               0.81      48.2%        -1.26
vs Med. Fish          1.77               3.54        50%        -2.82
vs Huge Fish            20                 26      76.7%         -2.7
vs Tight 20%          1.47               8.97      16.4%       -38.67
vs Tight 14%          5.07              30.72      16.5%       -62.37

5 comments:

  1. Really interesting read, thank-you.

  2. Very well thought out and explained, thank you!

  3. I improved your analysis in the push/fold scenario using a better program that will enable me to compute the exact gains in each scenario. I supposed the fish will employ the best ranges (ie the best 35% of hands he should call with as opposed to a sub optimal 35% of hands) which means that obviously in reality he may do worse.
    These are the results I got:
    Opponent        GTO bb/100   Max Expl. bb/100   GTO WR %   Expl. Leak
    vs GTO                   0                  0    100.00%            0
    vs Small Fish         0.39               0.81     48.15%        -1.26
    vs Med. Fish          1.77               3.54     50.00%        -2.82
    vs Huge fish         21.18              26.04     81.34%        -2.64
    vs Tight 20%          1.47               8.82     16.67%       -27.63
    vs vTight 14%         5.07              28.14     18.02%       -41.79

    Replies
    1. Awesome, thanks for doing that, what program did you use if I may ask?

      The vs Tight numbers are interesting, especially how giant the exploitative leak is. If it's okay with you, I'll edit the post to mention your data.

    2. Sure, you can add that data to the post. I can give it a second check to be sure figures are fine.

