Sunday, November 16, 2014

GTO is so much more than unexploitable

One of the most common misconceptions that people tend to have regarding GTO poker play comes from the idea that somehow the key element of a GTO strategy is its "unexploitability" or "balance" and the belief that any unexploitable strategy is inherently GTO.

The conditions required for a strategy to be GTO are much stronger than simple unexploitability (although of course any GTO strategy must be unexploitable), and in a practical sense, the elements of GTO play that are generally going to be the most valuable in real world poker games are the ones that have nothing to do with unexploitability.  By focusing on unexploitability, people minimize and miss what is actually a huge part of the value of understanding GTO play.

Today I'm going to look at why people so often confuse unexploitability with GTO, and go through the core definitions and a simple example that illustrates the key difference between a GTO strategy and one that is only unexploitable.  This will also serve as a nice lead-in to my next post, which will present a practical example of how to analyze and improve your 6-max OOP turn play in raised pots by better understanding GTO.

Toy Games and GTO


GTO play often gets confused with unexploitable play because in the very simplest toy games the two are equivalent.  People learn the solution to the toy games without fully understanding the definition of GTO, and assume that they now know what GTO means.

Games like Rock Paper Scissors, or the Clairvoyance game from The Mathematics of Poker, have only a single "reasonable" unexploitable strategy, which happens to also be fully GTO, so people who are new to game theory are prone to mistakenly assume that GTO and unexploitable are equivalent.

Furthermore, in these toy games, you can solve for that GTO solution using only indifference conditions (which can only be used to identify unexploitability), and thus the mechanism for finding the solution reinforces the idea that unexploitability is all there is to GTO.  In fact, most arguments I've heard against GTO play stem almost entirely from generalizing from the Clairvoyance game to all of poker, without any thought to the idea that a game that is trillions of times bigger might be fundamentally different.


This happens because when solving toy games we usually automatically discard strategies that might be unexploitable but intuitively are obviously dumb.  However, this completely breaks down in extremely tough games because the "obviously dumb" decisions are no longer at all obvious, and in fact identifying and avoiding these "obviously dumb" leaks in large games is where a huge amount of the value of studying GTO play comes from.

Definitions


Let's go back to the core definition of a GTO strategy, which is a strategy for a player that is part of a Nash equilibrium strategy set.  In a 2 player game, a strategy pair is a Nash equilibrium if "no player can do better by unilaterally changing his or her strategy" (source: Wikipedia).

How does this actually tie into the concept of exploitability and, in a technical sense, what does exploitability actually mean?  The idea of exploitability is relatively intuitive: if your strategy is exploitable, it means that if your opponent knew your strategy they would be able to use that information to alter their own strategy in a way that would increase their EV against you.  Formalizing this gives us an accurate definition of exploitability, but we need to define one more concept first.

A "best response" (sometimes called a "maximally exploitative strategy" or "counter strategy") to a given opponent strategy is a strategy that maximizes our EV against that opponent strategy, assuming that his strategy is completely fixed.

Exploitability can now be defined (and measured) as follows.  Call G our GTO strategy, and S our opponent's strategy.  Call B our best response to S.  S is exploitable if our EV when we play B against S is higher than our EV when we play G against S, and that EV difference is the magnitude of the exploitability.

Intuitively, this should make perfect sense: our opponent's strategy is only exploitable if we can alter our own strategy to exploit him and increase our EV, and the amount of EV we can gain when we maximally exploit him is an accurate measure of the magnitude of his exploitability.
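
To make the definition concrete, here is a minimal sketch that measures exploitability in Rock Paper Scissors exactly as defined above: the EV of our best response B against S, minus the EV of our GTO strategy G against S.  The function names and the sample opponent strategy (one that throws rock half the time) are my own illustrative assumptions, not part of any library.

```python
from fractions import Fraction as F

# Payoff to us: rows are our throw, columns are the opponent's throw.
# Order is rock, paper, scissors; +1 is a win, -1 a loss, 0 a tie.
PAYOFF = [
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
]

def ev(ours, theirs):
    """Our EV when both players use the given mixed strategies."""
    return sum(ours[i] * theirs[j] * PAYOFF[i][j]
               for i in range(3) for j in range(3))

def best_response_ev(theirs):
    """EV of B: against a fixed strategy some pure throw is always a best response."""
    return max(sum(theirs[j] * PAYOFF[i][j] for j in range(3))
               for i in range(3))

def exploitability(theirs, gto):
    """EV(B vs S) minus EV(G vs S), following the definition in the text."""
    return best_response_ev(theirs) - ev(gto, theirs)

G = [F(1, 3), F(1, 3), F(1, 3)]   # GTO: throw each option a third of the time
S = [F(1, 2), F(1, 4), F(1, 4)]   # hypothetical opponent who throws too much rock

print(exploitability(S, G))  # 1/4: always throwing paper wins a quarter of a unit per game
print(exploitability(G, G))  # 0: the GTO strategy itself cannot be exploited
```

Rock Paper Scissors is one of the toy games where this measurement tells the whole story, which is exactly why it can be misleading as a model for poker.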

Any GTO strategy must be unexploitable to satisfy the definition of a Nash equilibrium, but in complex games there are usually infinitely many inferior unexploitable strategies that are not GTO.

A GTO strategy is a strategy that uses every possible strategic option and every synergistic interaction between the various hands in our range to maximize our EV while still being unexploitable.  In most real world cases, understanding which of our strategic options are strong and how to correctly leverage that strength against our opponent in ways they cannot prevent is what makes GTO play powerful.

An Example -- GTO Brainteaser #6


Armed with our definitions we can now look at a very simple example of an unexploitable, non-GTO strategy.  Keep in mind that even this example is a relatively simple toy game and that the real game of poker generally has infinitely many unexploitable strategies that pass up EV and are not GTO for far more complex reasons.

We're going to revisit the model game from GTO Brainteaser #6.  The setup is as follows:

There are 2 players on the turn with 150 chip effective stacks, and the pot has 100 chips.  The Hero has a range that contains 50% nuts and 50% air and he is out of position.  The Villain has a range that contains 100% medium strength hands that beat the Hero's air hands and lose to his nut hands.

For simplicity, assume that the river card will never improve either player's hand.

The hero has 2 options: he can shove the turn or he can bet 50 chips.  If the hero bets 50 chips on the turn he can then follow up on the river with a 100 chip shove.

As I show here it turns out that GTO play for the hero is to bet 50 chips on the turn and then 100 chips on the river with precisely constructed ranges that contain the right relative frequency of nuts and air, and to check/fold with the rest of his air.  Following this strategy gives the villain an EV of 11.11 chips.

GTO play for the villain is to call a 50 chip turn bet or a 100 chip river bet 2/3rds of the time and to call a turn shove 40% of the time.
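
The numbers above can be reproduced with a few lines of indifference arithmetic.  What follows is only a sketch under the model's assumptions (the hero always bets the nuts on both streets, and the villain wins the pot whenever the hero gives up); all names are illustrative, and every range share is expressed as a fraction of the hero's entire range.

```python
from fractions import Fraction as F

POT, TURN_BET, RIVER_BET, SHOVE = 100, 50, 100, 150
NUTS = F(1, 2)   # share of the hero's whole range that is the nuts
AIR = F(1, 2)    # share of the hero's whole range that is air

def bluff_share(pot, bet):
    """Bluff share of a final betting range that makes a bluff-catcher indifferent."""
    return F(bet, pot + 2 * bet)

def call_freq(pot, bet):
    """Calling frequency that makes a pure bluff break even."""
    return F(pot, pot + bet)

# River: the pot is 200 once the turn bet is called, and the hero shoves 100.
river_pot = POT + 2 * TURN_BET
river_share = bluff_share(river_pot, RIVER_BET)          # 1/4 of river bets are bluffs
river_bluffs = NUTS * river_share / (1 - river_share)    # 1/6 of the whole range

# Turn: calling must break even for the villain. He nets POT + TURN_BET
# when the hero gives up on the river and loses TURN_BET when the hero
# bets again (he is indifferent there, so treat it as a fold).
give_ups = (NUTS + river_bluffs) * TURN_BET / (POT + TURN_BET)   # 2/9
turn_bluffs = river_bluffs + give_ups                            # 7/18
check_folds = AIR - turn_bluffs                                  # 1/9

# The villain is indifferent against every bet, so all of his EV comes
# from the pots he picks up when the hero check/folds the turn.
villain_ev = check_folds * POT                                   # 100/9

print(round(float(villain_ev), 2))       # 11.11
print(call_freq(POT, TURN_BET))          # 2/3 versus the 50 chip turn bet
print(call_freq(river_pot, RIVER_BET))   # 2/3 versus the 100 chip river bet
print(call_freq(POT, SHOVE))             # 2/5, i.e. 40% versus a turn shove
```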

Now consider the non-optimal strategy S where the hero shoves the turn 100% of the time with the nuts and 60% of the time with his air, and check/folds the rest of his air.  It is easy to check that the EV of this strategy is 20 chips for the villain.  So we are giving our opponent almost double the EV by playing this weaker strategy S where we jam the turn.

Clearly S is not GTO.  We could unilaterally increase our EV by switching to the GTO strategy, because S is fundamentally not wielding our range and our stack to optimally prevent our opponent from realizing his equity.

However, S is completely unexploitable.  Because our turn shoving range is "balanced" to be 3/8ths bluffs, our opponent is exactly indifferent between calling and folding to a shove, so he cannot increase his EV by switching from his GTO strategy to a maximally exploitative strategy.
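
Both claims about S are easy to verify numerically.  The sketch below (names are illustrative, and range shares are again fractions of the hero's whole range) computes the bluff fraction of the shoving range and the villain's EV for several different calling frequencies.

```python
from fractions import Fraction as F

POT, SHOVE = 100, 150
NUTS, AIR = F(1, 2), F(1, 2)

shove_bluffs = F(60, 100) * AIR      # 60% of the air shoves: 3/10 of the range
check_folds = AIR - shove_bluffs     # the remaining air: 1/5 of the range
bluff_fraction = shove_bluffs / (NUTS + shove_bluffs)

def villain_ev(call_freq):
    """Villain's EV against S when he calls a shove with the given frequency."""
    # Calling wins the pot plus the hero's shove against a bluff
    # and loses the villain's own 150 chips against the nuts.
    ev_of_calling = shove_bluffs * (POT + SHOVE) - NUTS * SHOVE
    return check_folds * POT + call_freq * ev_of_calling

print(bluff_fraction)          # 3/8
print(villain_ev(F(0)))        # 20 if he always folds
print(villain_ev(F(2, 5)))     # 20 if he calls 40% of the time
print(villain_ev(F(1)))        # 20 if he always calls
```

No calling frequency moves the villain's EV off 20 chips, which is what makes S unexploitable even though it gives up nearly 9 chips compared to the GTO line.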

Someone looking for an "unexploitable" strategy might be happy with the strategy S, but in this case, S misses the entire practically valuable lesson that we can learn from the model, which is that by betting half pot twice we actually utilize our polarized range much more effectively than we do by shoving it, to the extent that we cut our opponent's EV approximately in half.  The entire point of the example and its power is completely lost if we focus on unexploitability rather than on EV maximization and the true definition of GTO.

In fact, if we play an exploitable strategy where we bet 50 chips and then jam the river for 100, but with slightly incorrect value bet to bluff ratios, our EV is much higher than it is when we play the unexploitable strategy S, even if our opponent perfectly exploits us.
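
As a rough illustration of this point, the sketch below computes the villain's best response EV against any version of the bet 50 then jam 100 line.  It assumes the hero always bets the nuts on both streets and that the villain picks the best of his three pure plans (fold the turn, call the turn and fold the river, or call both streets).  The "slightly incorrect" ratios in the second call are numbers I picked purely for illustration; they are not from the original brainteaser.

```python
from fractions import Fraction as F

POT, TURN_BET, RIVER_BET = 100, 50, 100
NUTS, AIR = F(1, 2), F(1, 2)

def villain_best_response_ev(turn_bluffs, river_bluffs):
    """Villain's maximum EV; bluff amounts are shares of the hero's whole range."""
    check_folds = AIR - turn_bluffs          # air that never bets
    give_ups = turn_bluffs - river_bluffs    # air that bets the turn, then gives up
    base = check_folds * POT                 # pots won when the hero check/folds

    fold_turn = base
    call_then_fold = (base + give_ups * (POT + TURN_BET)
                      - (NUTS + river_bluffs) * TURN_BET)
    call_both = (base + give_ups * (POT + TURN_BET)
                 + river_bluffs * (POT + TURN_BET + RIVER_BET)
                 - NUTS * (TURN_BET + RIVER_BET))
    return max(fold_turn, call_then_fold, call_both)

# GTO ratios: every plan earns the villain the same 100/9 chips.
print(villain_best_response_ev(F(7, 18), F(1, 6)))   # 100/9

# Hypothetical line that bluffs slightly too often on both streets.
print(villain_best_response_ev(F(2, 5), F(9, 50)))   # 13
```

Even when the villain perfectly exploits the over-bluffing version he makes 13 chips, still well short of the 20 chips that the unexploitable strategy S hands him.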

Similarly, when analyzing complex real world situations, focusing on unexploitable play is generally going to completely miss out on valuable lessons that a thorough study of GTO play has to offer.

Special thanks


This post was actually largely inspired by an email from a GTORangeBuilder user who was confused that using CardRunners EV's "unexploitable shove" option didn't give him a GTO strategy like GTORangeBuilder does.  Of course the feature does exactly what it says: it gives a range that is the best range for us to shove if our opponent perfectly counters our shove, which in no way suggests that the shoving range is actually GTO.
