Saturday, March 29, 2014

GTO play does more than “break even” -- dispelling the myth

A common misconception surrounding GTO poker is the idea that playing an optimal strategy will somehow break even against weak players.  This idea is generally rooted in the fact that the simplest example of a Nash Equilibrium that most people understand comes from Rock Paper Scissors (RPS). However, RPS is in no way representative of all games; there are actually games where the Nash Equilibrium strategy also extracts the absolute maximum EV possible from weak players 100% of the time.

No Limit Hold'em falls somewhere in-between these two extremes.  We don't have the data or computational ability to conclusively determine exactly how GTO performs against sub-optimal players in general, but we can analyze specific scenarios and make some educated guesses from there. In this post I am going to give some theoretical background into the performance of GTO vs. sub-optimal players and then walk through some real world poker examples with numbers and give some strong evidence that in many situations, GTO does not break even and actually extracts quite a bit of value from fishy players.

Furthermore, I will look at how big the exploitative leaks that you open up in your own game are when you deviate from GTO play.  In general, playing maximally exploitatively vs. a player with a known leak who is competent enough to notice your exploitation and adjust is risky, but at the same time, playing GTO vs. a player with a known leak often leaves money on the table.

In a future post I will outline an approach that leverages GTO mathematics to play "unexploitably exploitative" poker against a player with known tendencies, which I believe offers the best of both worlds, but this post is just going to focus on numerically comparing pure GTO play with maximally exploitative play against weak players.

It's important to note that the analysis below only applies to 2-handed situations.  As I explain in this post, there is no guarantee that GTO poker won't lose in 3-way situations with fish, and I didn't do any additional in-depth analysis of exactly how well GTO performs in practice 3-way vs. fish.

Rock Paper Scissors (RPS)


RPS is a great first game to look at when trying to learn and understand Nash equilibrium, because it's a game everyone knows and the equilibrium is simple to see and understand.  While there is a lot to learn from RPS, extrapolating results from RPS to all games needs to be done with care.

In RPS, the optimal strategy is to play rock 1/3rd of the time, paper 1/3rd of the time and scissors 1/3rd of the time so as to make it impossible for your opponent to exploit you.  It is easy to see that this is a Nash equilibrium because the EV of any strategy against the equilibrium strategy is 0, so there is no incentive for a player to deviate, if he knows his opponent is playing GTO.

In the case of RPS, the equilibrium strategy has EV 0 against a weak player who plays rock 50% of the time and paper and scissors each 25% of the time, while the maximally exploitative strategy of always playing paper would win 50%, lose 25% and tie 25%.
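To make these numbers concrete, here is a small Python sketch (the function and strategy names are my own, purely for illustration, not from any poker library) that computes the exact expected payoff of one mixed RPS strategy against another:

```python
from fractions import Fraction as F

# Payoff convention: +1 for a win, 0 for a tie, -1 for a loss.
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}

def ev(ours, theirs):
    """Exact expected payoff of our mixed strategy against theirs."""
    total = F(0)
    for our_throw, p in ours.items():
        for their_throw, q in theirs.items():
            if BEATS[our_throw] == their_throw:
                total += p * q   # we win this matchup
            elif BEATS[their_throw] == our_throw:
                total -= p * q   # we lose this matchup
    return total

gto  = {"rock": F(1, 3), "paper": F(1, 3), "scissors": F(1, 3)}
fish = {"rock": F(1, 2), "paper": F(1, 4), "scissors": F(1, 4)}

print(ev(gto, fish))              # 0   -- the equilibrium only breaks even
print(ev({"paper": F(1)}, fish))  # 1/4 -- always-paper: wins 50%, loses 25%
```

Note that the equilibrium's EV is zero against *any* opponent strategy in RPS, which is exactly what makes the game such a misleading template for poker.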

Does this mean that all equilibrium strategies in all games break even against weak opponents?  Of course not.  The issue with Rock Paper Scissors is that there is no room to make plays that are inherently mistakes, so even a very weak player can't do anything fundamentally wrong; the only mistake he can make is to be too predictable.  Rock, Paper, and Scissors are all equally valid choices in all cases.  Furthermore, there is only a single round of action, so there is no way for subtle mistakes to propagate through a decision tree and have surprising consequences.

This is very different from poker (and most games), where some plays are just fundamentally worse than other plays. Strategies like calling with a weaker hand and folding a better one (when there are no relevant card removal effects) just don't make sense.  Nor does paying too much to chase a draw relative to how often it hits and the payoff when it does, or folding to a shove when you usually have the best hand, etc.  Poker also has many less obvious fundamentally weak plays that even elite players make, from betting a merged range into a polarized one to semi-bluffing with a range that doesn't optimize your nuts-to-air ratio on all possible runouts.

As we'll see in the next example, games where there are fundamentally poor decisions tend to allow GTO strategies to beat weak players, often quite badly.

Second Price Auction


A second price auction is a simple example of a game where the equilibrium strategy wins the absolute maximum possible against any opponent who makes mistakes.

Consider a two player game where $1 is being auctioned off in an auction where the highest bidder wins, but pays the price of the second highest bidder's bid.  The equilibrium strategy is to bid the full $1: bidding any less allows your opponent to bid just slightly more than you and make a profit every time, while bidding more than $1 can never gain you anything, since the only extra auctions you'd win are ones where you pay more than the prize is worth.

However, if a novice player who does not know the equilibrium strategy plays this game against an equilibrium player and always bids $0.30, the equilibrium player will win $0.70 every time, which is the absolute maximum possible amount to win against this fish.
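The auction arithmetic is easy to check directly.  Here's a tiny Python sketch (the function name is mine and ties are ignored, purely to keep the illustration short) of the payoffs in a two player second price auction:

```python
def second_price_profits(bids, prize=1.0):
    """High bidder wins the prize but pays the second-highest bid.
    Returns each bidder's profit (ties not handled, for simplicity)."""
    order = sorted(range(len(bids)), key=lambda i: bids[i], reverse=True)
    winner, runner_up = order[0], order[1]
    profits = [0.0] * len(bids)
    profits[winner] = prize - bids[runner_up]
    return profits

# Equilibrium player bids the full $1; the novice always bids $0.30.
print(second_price_profits([1.00, 0.30]))  # [0.7, 0.0]
```

The equilibrium bidder captures the entire $0.70 surplus created by the fish's mistake, which is the maximum any strategy could extract.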

How well an equilibrium strategy performs against a player who makes mistakes depends greatly on the structure of the game and the specific equilibrium strategy.  For any given game there can be multiple equilibria that perform differently against weak opponents.

GTO in NLHE


So how does GTO play fare against weak players in No Limit Hold'em?  Unfortunately, answering that question in general isn't possible, but by thinking about why the RPS equilibrium performs so poorly, why the second price auction equilibrium performs so well, and by analyzing some example poker scenarios, we can come to some pretty strong conclusions that GTO poker will beat fishy players who make a lot of mistakes.

By definition, an equilibrium strategy in a zero-sum game like poker maximizes the payoff against an opponent who always makes the correct decision.  What this generally rules out are strategies designed to "trick" an opponent into calling a bet when you obviously have the nuts, or strategies that take advantage of an opponent's specific known weaknesses but that would backfire and cost us money if the opponent did not actually have those weaknesses.  Those cannot be equilibrium strategies.

However, strategies that are just strong, solid play -- playing strong hands preflop and raising them aggressively, bluffing missed draws aggressively on the river in conjunction with aggressively value betting strong made hands, not folding too often to continuation bets or preflop 3bets, correctly assessing the equity of your hand vs. your opponent's range, not bluffing when your opponent is never folding, etc. -- are likely all parts of any equilibrium strategy in poker.

Shove or Fold


So how can we quantify how well an equilibrium strategy performs against weak players?  In the past, the only real option was to look at the performance of shove/fold strategies late in SnGs against players who fold too much, call too much, or who incorrectly weight hands (eg. they call with 87s but fold K3o against an opponent who is shoving 40% of hands).

First I'll show a very simple example that involves looking at a push/fold equilibrium that has been well understood for years.  I started with the following situation:
  1. Heads up NLHE
  2. Blinds are 0.5/1
  3. Stacks are 15
  4. The small blind must either go all-in or fold.
  5. The big blind can then call or fold.
In this game a GTO strategy for the big blind calls 28% of the time.
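As a sanity check on why the big blind's calling range is as tight as it is, note that the call is priced by simple pot odds.  The sketch below (my own helper, not from any solver, and assuming the 15bb stacks include the posted blinds) computes the minimum equity the big blind needs against the shoving range:

```python
def required_call_equity(stack_bb, bb_posted=1.0):
    """Minimum pot-share equity the big blind needs to break even on
    calling an all-in in the shove/fold game (no rake, no antes).
    The BB has already posted 1bb, so he only calls stack - 1 more."""
    call_amount = stack_bb - bb_posted
    final_pot = 2 * stack_bb          # both full stacks end up in the middle
    return call_amount / final_pot

# With 15bb stacks the BB risks 14bb more to win a 30bb final pot.
print(round(required_call_equity(15), 3))  # 0.467 -> needs ~46.7% equity
```

Only about the top 28% of hands have that much equity against the small blind's GTO shoving range, which is why the equilibrium calling frequency lands where it does.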

I then considered four strategies for the player in the big blind, the GTO strategy, and the following three weaker strategies:

  1. Small Fish: calls about 10% too often (~31% instead of ~28%)
  2. Medium Fish: calls about 25% too often (~35% instead of ~28%)
  3. Huge Fish: calls about 100% too often (~56% instead of ~28%)
I calculated the EV for the small blind player of playing the GTO strategy against each of those four strategies and compared it to the EV of playing the maximally exploitative strategy.  The maximally exploitative strategy is defined as the strategy that extracts the most value from your opponent, assuming you know exactly how he plays every single hand.  I then normalized the win-rates by subtracting the GTO vs. GTO win-rate as that represents the inherent positional advantage that the big blind player has in this scenario.

Finally I considered how much an opponent who noticed that you were exploiting them and decided to exploit you back in turn could win as this represents the potential leak you are introducing into your game by trying to play exploitative if your opponent reacts properly to your strategy.

You can see the results in the table below.  Note that the software I used to run this analysis (CREV) rounds to the nearest bb/100 so the results are only that accurate.


                GTO bb/100    Max Expl. bb/100    GTO WR %    Expl. Leak
vs GTO          0             0                   100%        0
vs Small Fish   1             1                   100%        -1
vs Med. Fish    3             4                   75%         -3
vs Huge Fish    22            27                  82%         -2


As you can see, against all player types, the GTO strategy extracts the significant majority of the value that can possibly be extracted from a fishy opponent in this scenario.  Relative to exploitative play, the GTO strategy performs the worst against the medium fish, but still achieves 3/4ths of the maximum possible win-rate.

Against the medium fish, the maximally exploitative strategy also opens itself up to a -3bb/100 leak, so if just one out of every four such opponents that you tried to exploit adjusted properly to your strategy (or if all of them made minor adjustments that weren't quite right) you'd no longer outperform a GTO player.  Furthermore, if you just incorrectly estimated the exact way in which the medium fish played (say you thought the player was a medium fish, but he was actually a small fish) you would also no longer outperform the GTO strategy.
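That break-even arithmetic can be checked directly.  The sketch below (numbers taken from the table above, at CREV's rounded precision) blends the exploitative winrate with the counter-exploitation leak based on what fraction of opponents adjust:

```python
# Winrates vs. the medium fish, in bb/100 (from the table above).
gto_ev     = 3.0   # GTO strategy
exploit_ev = 4.0   # max exploitative, if the fish never adjusts
leak_ev    = -3.0  # max exploitative, if the fish counter-exploits us

def blended_ev(fraction_adjusting):
    """Average winrate of the exploitative strategy when some fraction
    of 'medium fish' opponents notice and properly counter-adjust."""
    return (1 - fraction_adjusting) * exploit_ev + fraction_adjusting * leak_ev

print(blended_ev(0.25))  # 2.25 -- already below the 3 bb/100 that GTO earns

# Fraction of adjusting opponents at which exploitation stops being worth it:
print(round((exploit_ev - gto_ev) / (exploit_ev - leak_ev), 3))  # 0.143
```

In fact, with these numbers it only takes about one in seven opponents adjusting properly for the exploitative strategy to fall behind GTO.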

River Scenarios


The analysis above should already make one highly skeptical of the idea that GTO poker is likely to just break even vs. fishy players, but it is limited to a very specific scenario that ignores the vast majority of the types of decisions that poker players actually make in deep stacked poker.

This is where we can leverage GTORangeBuilder to dig a bit deeper.  Let's imagine a hypothetical river situation based on a hand that went as follows:

A weak, tight-passive player raises to 3bb from MP, a passive player calls in the small blind, and an aggressive player over-calls in the big blind.  Effective stack sizes are 121bb.

The flop comes 8h7h3c and the pot is 9bb.
MP bets 6bb, the SB folds, and the BB raises to 18bb; MP calls.

The turn is Ts and the pot is 45bb
BB bets 30bb and MP calls.

The river comes 4s and the pot is 105bb, and the effective remaining stacks are 70bb.

Now we're just trying to look at a simple example, so the exact hand ranges don't matter too much here, but I went ahead and gave the two players the following ranges:

MP: JJ+, AhJh+, JhTh, QhJh, KhQh.
BB: 77-TT, 33, QhTh+, KhTh+, Ah2h-AhJh, 65s, 87s, T9s, JhTh

This is designed to model a scenario where MP is quite tight preflop but very stubborn postflop, while the BB is an aggressive player who likes to bet sets and draws aggressively.  We'll apply these tendencies to the river later.

First let's take a look at GTO play on the river, assuming that the players should either shove or fold.  The river is a sub-game, so even if non-GTO strategies were used to reach this point in the hand, we can still analyze perfect play from this point forwards.  As you can see in the GTORangeBuilder solution browser below, in this scenario, with GTO river play, the MP player will average getting 32.54bb of the 105bb pot, while the BB player will average winning 72.46bb.


I took this solution and put it into CREV, so that I could look at how these EVs change when the tight-passive MP player plays poorly.

I am happy to share all the CREV files for this scenario with anyone who is interested, just post in the comments.

I then considered two types of fishy MP players.  The first is the super fish: he is ultra-passive, checks all his hands when checked to, and calls 100% when shoved at.  The second is a medium fish who checks back his entire range when checked to, but only calls the river shove with QQ+.  I used the same calculation as before to normalize the GTO vs. GTO payoff to zero.  The results are below.


                GTO bb/h    Max Expl. bb/h    GTO WR %    Expl. Leak
vs Nash         0           0                 100%        0
vs Med. Fish    9           20                45%         -21
vs Huge Fish    1.5         6.3               24%         -21
Obviously there are other types of fish (eg. a fish that folds 100%, or an over-aggro fish) that can be exploited more or less heavily.  Folding too much is the easiest leak to exploit postflop, so against that type GTO play does worse relative to maximally exploitative play than in the examples above, while against over-aggro players it varies.  So this doesn't hit all the cases, but it is enough to give us a rough sense of how well GTO performs against fish, and as one might expect the answer is significantly better than break even, but significantly worse than maximally exploitative.

It is worth mentioning that in post-flop scenarios, the maximally exploitative strategies are generally quite complicated and extremely opponent dependent, so actually identifying and applying them would require you to know almost exactly the types of mistakes your opponent makes at each stage (ie. exactly which hands he calls too much with on every possible board and the exact percentages with which he calls with them).  The GTO strategy is a bit more practical to apply because it doesn't vary with your opponent.

Furthermore, the size of the leak that you introduce into your own game by playing maximally exploitatively on the river is generally going to be very large.  This means that if your opponent properly adjusts to your strategy, or if you are just wrong about the exact details of the leak in your opponent's game, you can potentially lose a great deal.

Conclusions


Exactly how well Nash Equilibrium strategies perform against sub-optimal opponents in NLHE depends on the exact situation and the exact way in which our opponent plays sub-optimally.  The goal of this post is not to claim that GTO play is likely to extract the maximum from a sub-optimal opponent, but rather to dispel the myth that GTO play would just break even.  The reality lies somewhere in between.

In practice, I believe the best strategy is to start with something close to GTO and shift your strategy in the general direction of your opponent's weaknesses, rather than trying to drastically change your strategy to the maximally exploitative one every time you think you have a read.  This gives you the benefit of solid, un-exploitable play as you develop your reads, combined with the payoff of moderately exploitative play once you have identified your opponent's weaknesses.  Furthermore, in a future post I'll demonstrate some specific mathematical methods for figuring out exactly how one should shift away from GTO to exploit general tendencies while minimizing the potential for counter-exploitation by your opponent.

We also always need to be aware of the fact that the more a player deviates from GTO play, the bigger the leaks he opens up in his own game.  If you either misjudge the nature of your opponent's weakness, or don't adjust for his reaction to your prior adjustments, you can easily end up losing money.  Against a player whom you believe to be better than you, just sticking to GTO is the safest play.

As always questions and comments are welcome.  If you have any specific situations you'd like me to analyze further as a follow up just post in the comments.


Update



I'd like to thank BlackLoter for using Hold'em Resources to re-analyze the push/fold equilibrium that I listed above.  Hold'em Resources has a much more precise EV output and doesn't rely on small Monte Carlo samples, so it is possible to get a much more exact idea of the outcomes than I was able to get using CREV.  CREV rounds to the nearest bb/100, which meant that when our edge was small (eg. 1bb/100) the rounding had a significant impact, although I don't think it changes any of the high level conclusions above.

Here are his numbers (I haven't verified these myself):



                GTO bb/100    Max Expl. bb/100    GTO WR %    Expl. Leak
vs Nash         0             0                   100%        0
vs Small Fish   0.39          0.81                48.2%       -1.26
vs Med. Fish    1.77          3.54                50%         -2.82
vs Huge Fish    20            26                  76.7%       -2.7
vs Tight 20%    1.47          8.97                16.4%       -38.67
vs Tight 14%    5.07          30.72               16.5%       -62.37

Friday, March 21, 2014

Range Equity vs Range Balance -- Which matters more?

Often people use the equity of their hand or their range vs. their opponent's range as a way to gauge how good or bad a certain situation is for them.  While focusing on equity is somewhat intuitive and convenient (because it is easy to calculate), it completely ignores the impact of your strategic options, or lack thereof, on your odds of winning the hand.

It turns out that even against extremely simple strategies, there are ranges with good equity that cannot possibly win their share of the pot.  Equity assumes that your opponent lets you see all 5 board cards and that they let you show down your hand.  In practice, even weak players will put way too much pressure on you for this assumption to be realistic.

Often people think that when they put themselves in situations with capped or unbalanced ranges, the situation is just "tough to play", but that if they were better and able to play perfectly they would be able to defend their hand's equity.  As we'll see in the examples below, this simply isn't true.  Unbalanced, and particularly capped, ranges are just fundamentally unable to defend their equity, even against opponents who play very simple, predictable strategies.

Range balance is a much more important factor than range equity in determining how much of a pot you are likely to be able to win on average on the river.  The deeper you are, and the better your opponent plays, the more balance matters and the less equity matters.  As stacks get shorter, equity becomes the dominating factor, so for SnG / MTT players it is more reasonable to rely on equity in your analysis.


The simplest example, the nuts/air vs. made hand



I'm going to start by looking at a contrived example that illustrates the basic ideas that prevent unbalanced ranges from defending their equity.  This example is obviously much simpler than any real poker scenario, but it captures the strategic essence of many situations.

We're heads up on the river and player 1 has played in such a way that his hand is face up as a medium strength made hand. Player 2 has taken a very aggressive line that restricts his range only to nuts or air.  Let's assume player 1 is in position.

We'll assume that, on the river, player 2 has the nuts 50% and air 50%.  This means that both players have exactly 50% equity on the river.  Furthermore, assume the pot is 60bb and there is 120bb left behind to bet.  What is each player's EV?

If we just focused on equity, obviously each player would expect to win half of the 60bb pot on average for an EV of 30bb.  However, even an extremely simple and brain dead strategy for player 2 does much better than this.

Suppose player 2 plays an extremely simple (and quite fishy) strategy: he over-bet shoves with the nuts 100% of the time and with air 50% of the time, and check-folds his air the other 50%.  Player 1 must fold to the shove, as he's losing 2/3rds of the time when he calls: 2/3 * -120bb + 1/3 * 180bb = -20bb, so calling is clearly -EV.  This simple, fishy strategy lets player 2 win the pot 75% of the time, even though his range only has 50% equity.  Even if player 1 is a much better player, there is nothing he can do against this simple strategy.
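The EVs in this toy game can be verified with exact arithmetic.  This Python sketch (using `fractions` to avoid any rounding; variable names are mine) reproduces both the -20bb call EV and player 2's 75% pot share:

```python
from fractions import Fraction as F

POT, BEHIND = F(60), F(120)   # 60bb pot, 120bb effective stacks behind

# Player 2 shoves all of his nuts (50% of range) and half of his air.
p_shove_nuts = F(1, 2)
p_shove_air  = F(1, 2) * F(1, 2)
p_shove = p_shove_nuts + p_shove_air            # shoves 3/4 of the time

# If player 1 calls, what fraction of the shoves are actually the nuts?
p_nuts_given_shove = p_shove_nuts / p_shove     # 2/3
call_ev = p_nuts_given_shove * -BEHIND + (1 - p_nuts_given_shove) * (POT + BEHIND)
print(call_ev)         # -20 -- calling is -EV, so player 1 must always fold

# Since player 1 folds to every shove, player 2 takes the whole pot
# whenever he shoves and forfeits it when he checks his remaining air.
print(p_shove * POT)   # 45 -- player 2 averages 45bb, i.e. 75% of the pot
```

Player 2's 45bb average is 15bb more than the 30bb his 50% equity would suggest, and that surplus comes entirely from the fold equity his nuts generate for his bluffs.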

When you actually are playing in this situation as player 1, it can feel like a complex leveling battle of, "he knows my hand is face up, so he's going to try and bluff me off it, but I know, that he knows, and he knows that I know that he knows" and you can sit there agonizing over what level your opponent is on.  It's easy to convince yourself that if you could somehow win the leveling war then you could win the pot half the time on this river. 

The reality of the situation is that because your range is unbalanced and your opponent's range is not, he has a fundamental advantage over you: choice.  He can take different actions with different portions of his range, so that even though his average equity is 50%, he can use the strong portion of his range (the nuts) to protect some of the weak portion (the air) and make them indistinguishable to you, thus increasing his EV.

The advantage here for player 2 is NOT due to his opponent's hand being face up; rather, it is due to the fact that he knows whether he actually has the nuts or air and his opponent does not.

To see this, take the exact same scenario as above, but force player 2 to act without looking at his cards. From player 1's perspective, he still has no idea whether his opponent has the nuts or air.  From player 2's perspective, he still knows exactly what his opponent has; player 1's hand is just as face up as before.  However, in this case it is easy for player 1 to win the pot half the time.  When player 2 is playing blind, player 1 can just call any bet no matter the amount (or check) and he will, on average, win exactly 50% of the pot, even though his hand is still face up.

When you hold a balanced range (a range with a variety of hand strengths), in any given hand you know which specific portion of your range you actually hold and your opponent does not.  That informational difference about your own holdings can be converted into money, often with very simple strategies as shown above, and your opponent cannot prevent it, even with perfect play.

Obviously, if a simple over-bet-shove-or-fold strategy lets player 2 win 75% of the pot on average no matter what his opponent does, the Nash Equilibrium strategy will do even better for him.  As you can see in the video below, in a simple scenario where each player can bet 50% pot or shove all in at each point, player 2 (the hero) has an EV of 50bb in this scenario, or about 84.4% of the pot, even though his equity is 50%.  Furthermore, the deeper the stacks, the more player 2's EV increases.  With 1000bb stacks player 2 wins 95% of the pot, and as stacks go to infinity, player 2 wins 100% (if anyone is interested in a proof of that, ask in the comments).


Conclusions

Anytime you rely on equity to determine the EV of a decision, you are assuming that your opponent is going to let you show down your hand, which, in practice, ignores many of the biggest parts of poker.  How well balanced your range is determines how many strategic options you have available, and increases the informational edge you have (by knowing your exact holdings) over your opponent (who is putting you on a hand range).  In lots of postflop situations, the impact of your range balance will outweigh the actual equity of your range.

There is a lot more analysis that can be done here.  For example, how much does this result change if, rather than a pure nuts-or-air scenario, we look at two players whose ranges overlap, but where one player's is generally more polarized?  What about scenarios where both players can have air but one player's range is capped?  Can adding air to a range actually increase its EV by opening up more strategic options?  For now, I'm going to leave those questions for a future blog post, but if there are any particular questions that you'd like me to analyze, just let me know :)

Wednesday, March 19, 2014

GTO Poker Outside of Heads Up -- What it solves and what it does not

It seems that you can’t discuss poker strategy these days without hearing the term GTO.  This should come as no surprise given that the promise of Game Theory Optimal poker is that it is completely unbeatable.  Obviously I'm 100% on the GTO bandwagon given the contents of this blog, but like anything, GTO poker has its limitations and people seem to regularly ignore these limitations either out of ignorance or self-interest.  Understanding GTO concepts will help any poker player (yes even microstakes players) improve their game, but it is not a holy grail that solves all of poker and guarantees free money.

The key thing to understand is that for most players, GTO can drastically improve your play in many specific situations. However, outside of heads up it cannot be used as a basis for an entire strategy, and you will be best off leveraging GTO concepts alongside standard play.

GTORangeBuilder was built from the ground up with the strengths and limitations of GTO play in mind, and the only situations it analyzes are situations where its advice will guarantee you the stated EV. However, because of this (and because of computational limits), it's limited in what situations it can solve, and it is designed to be a tool to aid you in improving your game, not a solution to all of poker.  Anyone claiming to know an unbeatable strategy for what to do in every possible situation in, say, a 6-max 100BB cash game is confused about what GTO means, pure and simple.

Nash equilibria defined:


For an in-depth look at the definition of a Nash equilibrium and how they are used in poker see this post. Today I'm just going to look at the technical definition and highlight exactly what it says.  From wikipedia:

"In game theory, the Nash equilibrium is a solution concept of a non-cooperative game involving two or more players, in which each player is assumed to know the equilibrium strategies of the other players, and no player has anything to gain by changing only their own strategy."

The key word that people often gloss over here is "own".  A player playing a GTO (or Nash Equilibrium -- I use the terms interchangeably) strategy is guaranteed that if all the other players are also playing Nash Equilibrium strategies, then no player can unilaterally change his own strategy and increase his EV.

Let's first apply this definition to heads up, where GTO poker does guarantee unbeatable play, and then we'll look at some three player examples where it all falls apart.

GTO in Heads Up:


Suppose you are playing heads up vs. a fish and you somehow are able to play perfect GTO poker, while the fish is not and makes many mistakes.  You can imagine the mistakes the fish makes as him changing his own strategy from GTO to a weaker strategy.  The definition above says he cannot possibly have gained EV by changing his own strategy.  Furthermore, poker is zero-sum and you are the only other player, so if he lost EV then you must have gained that EV.

If you play GTO and your opponent plays anything else in a heads up game they cannot beat you, it is mathematically impossible.  That is a powerful statement and it's what makes GTO so appealing. If perfect GTO play were known (which barring a major breakthrough like advances in quantum computing is unlikely to happen this century for the full NLHE game) it truly would solve heads up forever.

Heads Up Subgames:


The power of GTO play isn't limited to situations where only two players are dealt into the hand.  You can apply the same logic as above to any "subgame" (poker situation), in which only two players are left in the hand.  If only two players see the flop, while the preflop action might determine the starting pot size on the flop as well as the flop hand ranges for both players, from that point on the players are playing a heads up two player subgame in which GTO play is unbeatable.  Thus while you might have put yourself in a bad situation by seeing the flop, a GTO strategy will be able to determine the exact EV of the situation you've put yourself in and if you follow the strategy, no matter what your opponent does you are guaranteed at least that EV.  If they do not play GTO themselves, your EV can only go up.

The vast majority of poker hands are heads up by the river so by applying GTO strategies to heads up situations that come up in your games you can greatly improve your win-rate in all sorts of post-flop situations whether you play 6-max, full ring, or heads up.

GTO 3-handed


EDIT: In my CardRunners Video I redid these calculations more precisely using the Hold'em Resources calculator.  The basic result is the same but the exact numerical values / strategies below are not as exact as they could be, see the video for more precise results.

In 3-handed situations the entire premise of GTO starts to break down, because a decrease in one of our opponents EV does not necessarily mean an increase in ours.  In fact, it is often the case that an opponent who makes mistakes can actually decrease our EV even if we continue to play GTO.  The easiest way to see this is to start with a simple example of a 3 handed push/fold equilibrium in a short stacked scenario.

Suppose we are 3 handed and all players have 15BB and are playing shove/fold poker in a rake free cash game (15BB is a bit too deep for this to be a great idea, but that doesn't matter for the sake of the example). The equilibrium solution for this game is reasonably simple and can be found here.

Basically the button shoves about 29% of his hands, the small blind calls with 14.5%, and the big blind over calls with a very tight range (9.4%) and calls if the small blind folds with a wider range (14.8%).  If the button folds, then the small blind shoves a very wide range (46%) and the big blind calls 28%.

Let's assume the hero is in the small blind.  If you put that scenario into an analysis tool like CardRunnersEV (I ran CREV with a 1 million hand monte-carlo sample, which is pretty good but not perfectly accurate, particularly because CREV rounds to the nearest 1BB/100), you can easily see the expected value for each player when playing the Nash Equilibrium strategies.  They are:

Button EV:  19 bb / 100
(HERO) Small Blind EV:  -11 bb / 100
Big Blind EV:  -8 bb / 100

If all 3 players play GTO, on average each player will win 19bb / 100 in the button, lose 11bb / 100 in the small blind, and lose 8bb / 100 in the big blind, netting to 0bb / 100 break even play.

Now we know from the Nash definition that if any player starts from the Nash state (where all 3 players are playing Nash) and changes only his own strategy, he will reduce his EV.  Let's assume that the button is a weak tight player and does not shove nearly enough.  We know this has to decrease the button's EV, but nothing about the definition of a Nash Equilibrium guarantees that the button's change in strategy won't also decrease the hero's EV.

If the button only shoves: 55+, AJ+, KQ, KJs, QJs, JTs the EVs become:

Button EV:  15 bb / 100
(HERO) Small Blind EV:  - 17 bb / 100
Big Blind EV:  2 bb / 100

The hero's EV is down 6bb per 100, even though he is still playing the GTO strategy.  The hero's EV decreases by more than the button's EV, even though the button is the player making the mistake!  If you imagine that every player plays GTO in all positions, except for the one fishy player who is too tight on the button, what happens to the hero's winrate?  He wins 19bb / 100 on the button, loses 17bb / 100 in the small blind, and loses 8bb / 100 in the big blind, for an average of -2bb / 100.  Playing GTO poker in 3+ way scenarios can lose money if there is a fishy player at the table who is not playing GTO.
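The winrate averaging here is worth double-checking.  A quick sketch (per-position EVs copied from the CREV numbers above; the dictionary names are mine) averages the hero's EV over the three positions he rotates through:

```python
# Hero's per-position winrates in bb/100, from the CREV numbers above.
all_play_gto    = {"button": 19, "small blind": -11, "big blind": -8}
tight_on_button = {"button": 19, "small blind": -17, "big blind": -8}

def average_winrate(by_position):
    """Hero sits in each of the three positions equally often."""
    return sum(by_position.values()) / len(by_position)

print(average_winrate(all_play_gto))     # 0.0  -- GTO vs GTO breaks even
print(average_winrate(tight_on_button))  # -2.0 -- GTO loses with the fish present
```

Note that only the small blind entry changes: the fish's tightness costs the hero 6bb/100 in that one seat, which is enough to drag the overall average below zero.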

If you imagine that the big blind player is a smart, reactive player, it can get even worse!  The condition that the big blind must lose EV if he changes his strategy away from the Nash Equilibrium strategy no longer applies once there is a fish on the button.  The Nash condition is only relevant when ALL players are playing Nash.  Now that the button has changed his strategy, the big blind player can change his strategy as well to increase his profit and to reduce our hero's EV.  If the BB tightens up his over-calling range, he can further reduce the hero's EV by almost another 1BB / 100 when the hero is in the small blind.

In 3-way pots with a fish a GTO strategy can lose and furthermore, a smart reactive player can adjust his strategy to make the GTO strategy lose even more.

It is important to note that the effects above are not due to ICM; they appear even in cash games.  In SnG situations where ICM is a factor there are even bigger and more obvious instances where the presence of a fish can make a Nash strategy -EV, but the fundamental issue in both cash games and ICM cases is the same.

Conclusions:


This is in no way meant as a condemnation of GTO play.  GTO strategies are unbeatable in any 2 handed situation, and even in 3+ handed situations, understanding GTO theory will give you tons of insights into how to balance ranges and increase your EV.  I believe that GTO will be the driving force in the continued evolution of poker strategy over the next 5 years.  However, this post should act as a warning if you are ever considering turning off your brain and blindly following GTO.  Anyone who misapplies GTO theory to scenarios where it doesn't hold, or who uses it as a crutch rather than as a tool, is going to be quickly left behind.




Tuesday, March 11, 2014

Range Building with Bluff Catchers

Figuring out how to apply poker theory to the practical strategic decisions that you run into day in day out when you're grinding at the tables can be quite difficult, so in this post I want to take a look at using GTORangeBuilder to analyze a specific type of situation that used to give me trouble when I was first learning Heads Up.

The key to actually applying game theory to improve your poker game is to identify situations where you know your strategy is weak. I find that the easiest way to spot these situations is to look for scenarios where:
  1. I feel like my hand is face up and I am being taken advantage of or am being forced into complex "he knows that I know that my hand is face up" type of leveling mind games.
  2. I’m folding not because I think I have the worst hand but because I can’t think of a reasonable line to continue with.
  3. I regret my previous action because I feel like it just gave my opponent a chance to exploit me later in the hand.

Once you've identified a leak in your strategy, you just need to think about what your range is and what your opponent's range is at various points in the hand and look for clear weaknesses.  Then you can start to figure out how to adjust your play to plug the leak.

GTORangeBuilder is designed to take care of determining how to optimally plug the leak for you, while identifying the situations where you have leaks is something you’ll have to do yourself as you review past sessions.

The call-check-donk line.

I’m going to run through an example that gave me problems when I was just learning to play HU. These days the solution I’m describing is one of several standard plays that can combat the call-check-donk line, but GTORangeBuilder lets us take a look at the specific mathematical benefits of this “standard play” and my hope is that this example will help people identify and solve new types of problems that can come up when you have unbalanced ranges.

I first started learning to play HU poker in 2008-2009 and to my surprise I found myself struggling in certain spots against players that seemed quite fishy, especially given the way the game was played at the time.

The line that gave me trouble was what I call the call-check-donk line. I’d be playing a generally loose passive opponent who would almost always defend his big blind, say with 90% of hands. I’d c-bet aggressively and he’d almost never fold (c-betting very aggressively here isn't necessarily the best choice, but it's what I tried). If I checked the turn, he’d then lead out with a very wide range on the river.

I viewed this player as a fishy calling station and was trying to play exploitatively against him by value betting very thinly and by betting big so as to get paid off whenever I had a hand. This meant that I was betting big on the turn with almost any strong or medium strength hand to “exploit” his passive tendencies.

However, by doing this my range for checking back the turn was so unbalanced that even a generally fishy player was able to completely exploit me on the river. By betting the turn with all of my medium and strong hands, any time I got to the river without betting the turn my hand was face up as weak and my opponent would donk-bet the river at me and take down the pot. 

This made me feel like I couldn't check the turn with air without completely giving up on the hand. Furthermore, my opponent's river donking range was actually extremely well balanced, as he would be check/calling the flop with lots of stronger hands as well as lots of air and draws.

I was so focused on exploiting my opponent that I didn't pay attention to the massively exploitable leak that I had introduced into my own game. At the time, the way people thought about poker, particularly against fish, didn't involve much consideration of what your own range was in specific situations.  People tended to focus entirely on exploiting their opponents' weaknesses with specific hands, without realizing that even fishy players can punish you quite effectively if you unbalance your ranges too drastically in an attempt to exploit them maximally.

The easiest way to solve this river problem is to check back some of your medium strength hands on the turn so that your river range is at least balanced between some air and some medium strength hands to use as bluff catchers. This generally lets you get just as much value out of your bluff catchers as you would by barreling the turn with them and also lets you see cheap showdowns with the stronger portions of your air. The synergy between your bluff catchers and your strong air on the river is a perfect example of how building strong ranges can drastically increase your EV in everyday situations.

With GTORangeBuilder we can actually take a look at the effects of adding a bluff catcher to your river range on your expected values both with your air and with your bluff catchers. This lets us quantitatively measure the EV of leveraging the synergies between your stronger air that wants a free showdown (eg A high) and your bluff catchers that are ahead of your opponent's floating range (eg middle pair).

In this example I’m going to look at two river scenarios. In the first scenario, I’ll consider the hero, who is in position, getting to the river with a range that is pure air vs a villain with a balanced range of air, missed draws, and strong hands. In the second scenario I’ll take the exact same villain range, and I’ll add a medium strength bluff catcher to the hero’s range. The key thing to observe is that by adding a bluff catcher to the hero's range, not only does the hero get a strong EV from that bluff catcher in isolation, it also significantly increases the hero’s EV with all the other hands in his range.

To keep the examples simple and easy to understand, rather than putting in complete ranges for each player I put in simplified “representative” ranges. A representative range is a range that has the right types of hands in approximately the right proportions, rather than actually listing every single hand in a player’s range. As an example, rather than considering all 192 combos of Ax as separate hands we might break that range into two representative hands of Ad6s and AdQs on a non-flush board so that we can focus strategically on how to play a strong ace vs a weak ace. For a human player, trying to play a strategy that varies its play across all 192 Ax combos isn’t really feasible, so using representative ranges is a good way to highlight the strategic differences between hands in a way that you can learn from and apply to your games.

In both the scenarios below, we assume the action went as follows:
  1. Hero raises preflop to 3x, villain calls.
  2. Villain check-calls a bet of 4bb on the flop.
  3. Both players check the turn.
To keep things simple, I also assume that no one is going to do anything too crazy on the river (such as putting in > 40 bbs running a huge bluff) and again for simplicity, I considered a small range of bet sizes.

Scenario 1: Hero has only air. Villain has busted draws, random floats, and made hands.

Pot: 14bb
Board: Kc8d2d2h4s
Hero Range: 7c5c, Ah9h, QsJh
Villain Range: 6h5h, Td9d, Qc7h, 8s7d, Ks4c


Scenario 2: Hero adds a medium strength bluff catcher to his range

Pot: 14bb
Board: Kc8d2d2h4s
Hero Range: 7c5c, Ah9h, QsJh, Qh8h
Villain Range: 6h5h, Td9d, Qc7h, 8s7d, Ks4c

The first thing worth noting about these ranges is that the hero actually has almost 50% equity in scenario 1, simply due to both players having air quite often. However, because the hero’s range is poorly constructed he is unable to actually defend his equity and win his fair share of the pot, despite being in position. In fact, as you can see in the solution browser below, even with the top of the hero’s range (Ah9h) in scenario 1 the hero only gets about 40% of the pot on average (click on the leftmost node of the tree and look at Ah9h in the "Hero Range" chart to see this). Overall, the hero's EV in this scenario is 3.33bb out of the total 14bb pot when both players play optimally.


In scenario 2, Qh8h has about 70% equity vs the villain's range and on average wins close to 70% of the 14bb pot.  However, what is more powerful is that adding Qh8h to the hero's range increases his EV with all of his other hands.  This means the total EV that we gain by putting Qh8h into our river range is the 9.16bb we earn when we have Qh8h, plus the additional EV that we gain by increasing our expectation with our other hands.  Even if we ignore the cases when we actually have Qh8h, our average EV with our other hands has increased from 3.33bb in scenario 1 to 4.19bb.  On average we will win almost an extra bb every single hand with the weaker parts of our range just by making our entire range more defensible!  In a game like HU where this type of river situation can easily happen once out of every 10 hands, this type of adjustment could have a 10bb/100 impact on our winrate.  We've also managed to take a line that does at least as well with Qh8h as betting the turn likely would.
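To see how much the single added combo is worth in total, we can weight the EVs from the text by how often each part of the range occurs (assuming, as in the example, that all four combos are equally likely, so Qh8h is 25% of the hero's range):

```python
# EV numbers from the text (bb, out of a 14bb pot)
ev_bluff_catcher = 9.16   # hero's EV holding Qh8h in scenario 2
ev_other_hands   = 4.19   # average EV of 7c5c, Ah9h, QsJh in scenario 2
ev_scenario_1    = 3.33   # average EV of those same hands in scenario 1

# Qh8h is 1 of 4 equally likely combos, i.e. 25% of the hero's range.
total_ev = 0.25 * ev_bluff_catcher + 0.75 * ev_other_hands
print(round(total_ev, 2))                        # ~5.43bb, up from 3.33bb
print(round(ev_other_hands - ev_scenario_1, 2))  # ~0.86bb gained with the OTHER hands
```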

You can browse scenario 2 in the solution browser below.  As you can see, the villain is forced to play significantly more passively when 25% of the hero's range is a medium strength bluff catcher.

Furthermore, the hero can now play more aggressively.  Because the villain has to check back a lot of his air in scenario 2, the hero can turn 7c5c into a bluff and take a bet-check-bet line more often (almost twice as often, 53.3% vs 30% according to the solutions). Shout out to NoahVanderoff from reddit for pointing this out.



This example is designed to highlight the key concept at the heart of building optimal ranges.  Ranges that lack the proper diversity can make it extremely difficult for us to defend our equity against a competent opponent, even when we play optimally.  We cannot just try to "out think" or "out play" an opponent when we have a poorly constructed range unless we can rely on him making a lot of major mistakes.  The much better response is to adjust our hand range by adding synergistic hands that perform well on their own while also improving the performance of all the other hands in our range.

Submit your own scenarios for analysis

If you have a leak or river scenario that you'd like help analyzing post it in the comments and if I have time, I'll analyze it in depth and make it the topic of a future blog post.

Thursday, March 6, 2014

Strategies, Nash Equilibria and GTO Poker

To really wrap your head around GTO poker it is important to have a good grasp of what defines a strategy in NLHE and what makes a set of strategies a Nash Equilibrium or GTO.  This post is designed to give you a formal theoretical definition of these terms as well as some practical examples and intuition about how they relate to day to day poker.

What is a poker strategy?

This is actually a more complicated question than you might think.  If this seems a bit confusing at first just keep reading until you reach the practical examples as they should clear things up.  The sections highlighted in grey have more mathematical details and are fine to skip if you just want a high level overview.

Generally speaking, a strategy for poker says what to do in every possible situation with every possible hand.

 Mathematically, a strategy is a function f(E, H) that for every possible sequence of past events, E, and for every possible hand we might be dealt, H, returns a legal poker action A (like bet a certain amount/call/fold). 

The simplest example of a poker strategy is a push fold chart like many players use at the end of SnGs, such as this one.  It states, for every possible stack size, every possible hand, and whether or not an opponent went all in before you, what to do: either go all in or fold (if an opponent does anything other than shoving or folding, treat it as a shove and use the 'caller' chart).

More complex strategies that involve intricate postflop play, reacting to tendencies that our opponents exhibited in past hands, etc., are practically impossible to write down in a simple chart, as they need to describe what to do with every possible hand in response to every possible set of opponent actions on every possible board.  In fact, it is completely impossible even for a computer to store the information for a complete strategy: with hundreds of possible bet sizes, a complete strategy wouldn't fit on all the hard drives on earth combined.

Despite that complexity, people obviously play solid poker, and with modern software like GTORangeBuilder we can compute highly accurate approximations of postflop GTO play, so how does this work?  Without a clear definition of a strategy we cannot analytically and numerically compare the performance of strategies, nor can we solve for approximate or exact Nash Equilibria.  So our first step must be to come up with a formal definition of a strategy and make it applicable to actually playing poker at the tables.

It's actually quite simple: we just have to limit ourselves to considering strategies that either ignore lots of information, or make a very limited set of choices, or both.  Again, let's go back to the push fold charts that I mentioned above.  They completely ignore information: they disregard all history from past hands and any information we might infer from our opponents' prior actions.  They also limit the actions they might take to shoving and folding.  While it might seem like there is no way that ignoring that much information could possibly lead to a strategy that is useful in actual poker games, it turns out that when stacks are small and blinds are big, push fold equilibria are extremely profitable tools.  SnG end game push fold strategies have been used by elite players worldwide to win millions of dollars as part of a very short stacked strategy.

Going back to our mathematical definition, ignoring information comes down to limiting what we consider in terms of past events E in our strategy f(E, H).  The push fold charts limit E to a binary value: did someone else go all in before us or not?  They also limit the set of actions A that the function f can return to two options: go all in or fold.
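As a concrete (and purely illustrative) sketch, here is what such a function f(E, H) might look like in code. The ranges below are hypothetical placeholders, not an actual equilibrium chart:

```python
# Hypothetical ranges for illustration only -- not a real push/fold chart.
SHOVE_RANGE = {"AA", "KK", "QQ", "JJ", "TT", "99", "AKs", "AKo", "AQs"}
CALL_RANGE = {"AA", "KK", "QQ", "AKs", "AKo"}

def f(facing_shove, hand):
    """A push/fold strategy: E is reduced to a single bit (did someone
    shove before us?) and the action set A is {"shove", "fold"}."""
    if facing_shove:
        return "shove" if hand in CALL_RANGE else "fold"
    return "shove" if hand in SHOVE_RANGE else "fold"

print(f(False, "AQs"))  # shove -- open shove when the pot is unopened
print(f(True, "AQs"))   # fold  -- too weak to call an all-in
```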

Given that even the simplest mathematical analysis of push fold equilibria has been useful at high levels of play, it seems natural to ask what expanding either E, A, or both can do to bolster our understanding of poker.  Luckily there are some logical ways to think about considering slightly more general strategies and which additional elements make the most sense to add to E and A.  I'll get into that more after defining a Nash Equilibrium.

What is a Nash Equilibrium?

A Nash Equilibrium is a group of strategies for every player in a game where:

  1. Each player knows every other player's strategy exactly.
  2. Each player's strategy maximizes their expected value against their opponents' strategies.


The easiest way to understand this is through simple examples.  A common one is Rock Paper Scissors.  Let's consider strategies that ignore history and just play Rock, Paper, or Scissors with specific probabilities.  We can represent these as three numbers that sum to 1, where the first number is the chance of playing rock, the second the chance of playing paper, and the third the chance of playing scissors.

Consider two simple strategies, good old rock (1, 0, 0), which always plays rock, and the diplomat (0, 1, 0), which always plays paper.

This pair of strategies is not a Nash Equilibrium because if the player using good old rock knows in advance that he is playing against the diplomat (against which he always loses) he could increase his expected value by playing scissors more and rock less.

If we instead consider two strategies that play Rock, Paper and Scissors equally, (1/3, 1/3, 1/3) and (1/3, 1/3, 1/3) then we can see that those strategies do constitute a Nash Equilibrium.

In this situation both players win and lose half the time.  Increasing the frequency with which they play any particular option won't change that; they will still win and lose half the time.  So even knowing their opponent's strategy exactly, they have no way to alter their own strategy to increase their expected value.
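We can verify this numerically: against the uniform mix, every pure strategy (and hence every mixture of them) earns exactly zero, so no unilateral deviation gains anything.

```python
# Row player's payoff matrix: rows/cols are Rock=0, Paper=1, Scissors=2.
PAYOFF = [
    [0, -1, 1],    # Rock loses to Paper, beats Scissors
    [1, 0, -1],    # Paper beats Rock, loses to Scissors
    [-1, 1, 0],    # Scissors loses to Rock, beats Paper
]

uniform = [1/3, 1/3, 1/3]

def ev_of_pure(action, opponent_mix):
    """Expected payoff of a pure action vs an opponent's mixed strategy."""
    return sum(p * PAYOFF[action][j] for j, p in enumerate(opponent_mix))

for action in range(3):
    print(ev_of_pure(action, uniform))   # 0.0 for every action

# By contrast, vs "the diplomat" (always paper), scissors profits:
print(ev_of_pure(2, [0, 1, 0]))   # 1
```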

An important thing to note that is relevant to poker: in a repeated game (eg where you play multiple rounds of Rock Paper Scissors, or multiple hands of poker), playing the single-round Nash Equilibrium repeatedly is also a Nash Equilibrium of the repeated game.  However, repeated games can also have additional, more complex equilibria that adjust based on the actions taken in previous rounds.

What makes Nash Equilibria Powerful

The big reason that Nash Equilibrium strategies are powerful is that they give you a guaranteed minimum EV.  The way they are defined, they assume your opponent knows your strategy and that his strategy is the absolute best possible counter to what you are doing.  Thus if you play against any other opponent who is not perfectly countering your strategy, your EV can only go up.

In two player situations, Nash Equilibria are strategies that provide the best possible guaranteed minimum EV.

The other power of Nash Equilibria is that they are simple.  Nash Equilibrium strategies by definition don't make specific plays against specific types of opponents based on past history.  They assume that your opponent will correctly adjust to whatever you do and thus they treat all opponents the same.

This has the advantage that it drastically reduces the scope of strategies we need to consider and allows us to focus more on playing 100% solid poker ourselves rather than constantly trying to get inside our opponents' heads.  However, it also has the weakness that against opponents who are making a lot of mistakes, a Nash strategy will not earn as high an EV as a strategy that is perfectly designed to counter the specific mistakes of our opponents.

While this weakness is certainly relevant, Nash Equilibrium strategies in poker do quite well, even against opponents who make a lot of mistakes. Furthermore, any adjustments you make to exploit an opponent will open up leaks in your own game that your opponent can use against you such that you are likely to perform worse than the Nash strategy. Even players that seem very fishy adjust to their opponents play more than you might expect.

Also, by using a concept known as minimally exploitative play, which is a variant of a GTO strategy, we can still exploit our opponents' mistakes while minimizing our own exploitability.  These types of strategies can also be calculated with GTORangeBuilder and I will get into exactly how that works in future posts.


Nash Equilibria in Poker

In poker, to understand Nash Equilibria you have to think at the strategy level.  If we take the push/fold game example again, the idea is that both players know each other's strategies and have no way to change their own strategy to increase their EV.  This does not mean that they know their opponent's specific hand in a situation, just his strategy.  So in a push fold equilibrium, when your opponent pushes all in at you, you would know the exact range of hands he might do that with, but you would have no idea which specific hand he actually held.

To check whether a pair of push fold strategies is an equilibrium in a HU game, we'd need to consider the following:
  1. Take the Small Blind's strategy.  Given the hand range that the Big Blind is calling our shoves with, can we increase our EV by folding any of the hands we are shoving?  Or shoving any of the hands we are folding?
  2. Now take the Big Blind strategy.  Given the hand range that the Small Blind is shoving with, can we increase our EV by calling with any hands we are folding?  Or by folding any of the hands that we are calling with?
If after checking 1 and 2 above we conclude that neither strategy can be changed to increase its EV, then we have a Nash Equilibrium.  A simple example of a push/fold Nash Equilibrium in a HU game with 10BBs is here.
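The EV comparison behind step 1 can be sketched as follows for a HU 10bb game with 0.5/1 blinds. The call frequency and equity numbers here are hypothetical placeholders; a real check would compute the hand's actual equity against the Big Blind's calling range:

```python
STACK = 10.0       # effective stack in bb
SB, BB = 0.5, 1.0  # blinds

def shove_ev(call_freq, equity_when_called):
    """SB's net EV in bb (relative to his start-of-hand stack) for shoving."""
    steal = (1 - call_freq) * BB                       # BB folds: we win his blind
    showdown = call_freq * (equity_when_called * 2 * STACK - STACK)
    return steal + showdown

FOLD_EV = -SB  # folding simply surrenders the small blind

# Hypothetical hand: 38% equity when called vs a range the BB calls 40% of the time.
ev = shove_ev(call_freq=0.4, equity_when_called=0.38)
print(ev > FOLD_EV)   # True: shoving beats folding despite sub-50% equity
```

The fold-equity term is why shoving can be profitable even with a hand that is a clear equity underdog when called.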

Note that, if we are being precise, the push/fold equilibrium is not actually an equilibrium of a regular poker game because it doesn't consider what would happen if the small blind raised to say 2x rather than shoving.  In push / fold equilibrium we limit the set of strategies we consider to only strategies that push or fold.  In practice, if your opponent might actually, say, min-raise and then fold, and you as the big blind respond by shoving over your opponents min-raise then you are not actually playing a Nash Equilibrium strategy and are not guaranteed to achieve a specific EV.

At a high level Nash Equilibrium strategies can never rely on "tricking" our opponent.  For example, if there is some river bluff shove that we would actually only ever make with a busted draw and never with the nuts, then an opponent who knew our strategy would never fold to that bluff.  However, Nash Equilibrium strategies still make plays that are very difficult for our opponent to react to, not by being sneaky, but by instead balancing the hand ranges that we take specific actions with such that it is impossible for our opponent, even if they know our strategy, to read our specific hand and take advantage of us.  For example, an equilibrium strategy might make the same river bluff shove with a busted draw, so long as it also occasionally shoved the river for value in the same situation with the nuts.

Nash Equilibrium strategies also make fundamentally solid decisions regarding relative hand strengths, equities, pot odds, and probability 100% of the time.  For example, a Nash Equilibrium strategy will never fold the nuts, it will never call a bet with a hand that doesn't have enough equity against an optimal opponent's range to make the call +EV, it will never call with a weak hand in the same situation that it would fold a strong hand, it will never use an inefficient bet size, etc.  All these small optimizations mean that even when a Nash Equilibrium strategy is not directly exploiting our opponents it will still tend to crush weak players, just by being fundamentally solid and error free.

Obviously push / fold strategies barely scratch the surface of poker strategy so naturally we'll want to consider Nash Equilibria that model more complex poker decisions with multiple betting rounds, varying bet amounts, etc.

GTORangeBuilder was the first program to go beyond push/fold strategies and to make postflop Nash Equilibria accessible and easy to compute and analyze.  Currently, GTORangeBuilder focuses on heads up river scenarios and it simplifies the space of strategies by only considering specific bet sizes, and by condensing all the information from actions taken prior to the river into a few key components:
  1. The size of the pot and the effective stack sizes when the flop was dealt
  2. The starting hand ranges of both players when the flop was dealt
  3. A list of bet sizes for each player to use on each street.
It then generates complete strategies for each player that form a Nash Equilibrium.  Specifically, this means it lists, for each player, what they should do with every hand in their range in every possible scenario that might occur with the given list of bet sizes.  The best way to understand this is to see it in action, so check out this post for a practical example of looking at Nash Equilibrium play with GTORangeBuilder and how to use it to improve your every day decisions at the tables.

Saturday, March 1, 2014

Poker is evolving

Over the past few years, the level of play in No Limit Hold’em, particularly online, has increased dramatically.  As Phil Galfond put it, “Five years ago, I was one of the top HUNL players in the world.  You can take any excellent $5/$10nl regular from today, put him in a time machine, and he’d have beaten 2009 Phil”.  

Many of the weaknesses of the players of the 2000s were a result of only taking certain actions with a very specific set of hands, making it very easy for a competent opponent to dissect and destroy their strategy.  As a result, players began to work game theory concepts into their play, and now it is common to find even micro stakes players discussing concepts like balancing ranges or playing with and against capped or polarized ranges, when even just a few years ago these ideas were rarely discussed and poorly understood.

However, from a game theory perspective the discussion around trying to “balance your range” and play “GTO poker” is still in its infancy and game theory concepts are often being applied to poker non-rigorously or incorrectly.  

The phrase GTO is never actually used in formal game theory; it was pulled out of the ether of poker forums, and people (and companies) get away with using it inaccurately in all sorts of situations.  In most cases, no one actually knows what the “GTO” (game theory optimal) play in a given situation is, so people throw the term around without bothering to prove that the strategy they are advocating is actually GTO, which leads to a lot of false claims and misinformation.

What does GTO mean?

In a mathematical sense, a set of GTO strategies is a Nash Equilibrium, or as Wikipedia defines it, a set of strategies “in which each player is assumed to know the equilibrium strategies of the other players, and no player has anything to gain by changing only their own strategy.”  In general, finding Nash Equilibrium strategies in complex games is extremely difficult, but verifying them can actually be quite easy (I’ll get into the details of that in another post).  Currently, the only poker situation where Nash Equilibrium strategies are easily found and widely used is in constructing linear shove/fold ranges late in SnGs and tournaments when the players’ stacks are very short.

In a practical every-day sense, in poker, creating GTO strategies involves considering how you play a group of hands rather than an individual hand, and looking for synergies between specific hands that make it as difficult as possible for your opponent to react properly to your decisions, even if they know exactly what strategy you are employing.

The simplest example of this type of hand synergy comes up in “nuts or air” situations with a polarized range (when the only hands you might have are extremely strong hands or extremely weak hands).  If you were to only ever bet with the strongest hands hoping to be called, a clever opponent would always fold, while if you only ever bet with your weakest hands as a bluff, a clever opponent would always call.  

However if you instead bet with a properly weighted group of strong hands and weak hands you put your opponent in a difficult position.  By folding they risk losing a big pot when you are bluffing with a weak hand, and by calling they risk paying you off when you have a strong hand.  By betting your weak hands you can increase the expected value of betting your strong hands, and vice versa.  This is what building GTO ranges is all about.  This simple example is discussed in depth via the “toy” AKQ game in The Mathematics of Poker, which is a must read.
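The standard indifference calculation from this type of toy game makes "properly weighted" precise: you bluff just often enough that your opponent's bluff catchers are indifferent between calling and folding. A sketch:

```python
def gto_bluff_fraction(bet, pot):
    """Fraction of a polarized betting range that should be bluffs.
    The caller risks `bet` to win `pot + bet`, so indifference requires
    bluff_frac * (pot + bet) == (1 - bluff_frac) * bet."""
    return bet / (pot + 2 * bet)

print(gto_bluff_fraction(bet=1.0, pot=1.0))   # pot-sized bet: 1/3 of bets are bluffs
print(gto_bluff_fraction(bet=0.5, pot=1.0))   # half-pot bet: 1/4 of bets are bluffs
```

Note how bigger bets support proportionally more bluffs, which is part of why bet sizing and range construction can't be separated.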

Why GTORangeBuilder?

There are millions of other ways to build ranges that synergize different levels of hand strength to increase your EV, less intuitive than the simple polarized range of the AKQ game or the linear shove fold ranges that SnG players use, but until now, no one has really had the tools to effectively build and analyze them.

GTORangeBuilder is designed to solve this problem by making it possible for anyone to compute Nash Equilibrium strategies for almost any river situation so that we can transform the discussion around GTO poker from hand waving and “toy” examples to something concrete, exact, and verifiable that can be applied at the tables to increase your win-rate on a daily basis.

GTORangeBuilder lets you define a river scenario by entering hero and opponent hand ranges, stack sizes, and some bet sizing assumptions.  GTORangeBuilder will then compute equilibrium strategies for both players for every possible decision in the hand.  These strategies are game theory optimal and are presented in a way that makes them mathematically verifiable. 

Right now, it's up to you to do your own hand reading and range balancing up to the river, but from there GTORangeBuilder can determine optimal play that requires no hand reading or psychological guessing games, and if you play GTORangeBuilder strategies you are guaranteed a given expected value against any opponent on earth.