Tuesday, September 2, 2014

GTO Poker and Multiple Equilibria Part 3

In part 1 and part 2 of this post I described some of the properties of multiple equilibria in zero-sum games and explained how different equilibrium solutions can perform differently against various types of sub-optimal players. The basic idea is that by playing exploitatively against lines that optimal players don't actually use, we can remain completely unexploitable while still targeting and attacking leaks in our opponents' play. We do this by shifting which GTO strategy we are playing at any given time, based on our opponents' tendencies.

Today I'm going to conclude that discussion by walking through an example of a simplified poker scenario with two equilibria, each of which performs quite differently against different types of fish. I described this example in detail at the end of part 2, so I'll only briefly restate the situation here.
  1. The board is: 2sTs9c5h3s
  2. The IP range is: 22, 87, T9, QJ, Ks8s+, As2s-AsJs, 7s6s, 9s8s, Qs9s
  3. The OOP range is: QQ+
  4. There are 100 chips in the pot and 150 left to bet
As we saw last time, it is never GTO for the OOP player to lead for 50% pot here with any part of his range. However, we're going to consider the performance of two GTO strategies against two types of fish, both of whom randomly lead for half pot with their entire range 10% of the time. Fish 1 is thinking "when he shoves here he's never bluffing"; he is feeling you out with his bet and plans to fold to a shove 100% of the time. Fish 2 is thinking "OMG I haz overpair" and plans to call a shove 100% of the time.

Our goal was to find two strategies that are GTO (which requires that there is no profitable deviation that would let a GTO opponent increase his EV by leading for 50% pot with some hand in his range) but that also extract as much extra value as possible from each type of fish who decides to lead for 50% pot.

This means that however we react to a 50% pot lead, our opponent's EV for leading against that reaction must be lower than his EV from playing the GTO strategy, for every hand in his range. In this situation the GTO strategy for the OOP player is to always check, so I've included his checking EVs below.

Hand     % of Range     Check EV
QQ       5.56           14.02
QQ       5.56           14.02
QQ       5.56           16.69
QQ       5.56           14.02
QQ       5.56           16.69
QQ       5.56           16.69
KK       5.56           25.87
KK       5.56           25.87
KK       5.56           30.57
KK       5.56           25.87
KK       5.56           30.57
KK       5.56           30.57
AA       5.56           25.87
AA       5.56           25.87
AA       5.56           41.15
AA       5.56           25.87
AA       5.56           41.15
AA       5.56           41.15
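The "% of Range" column is plain combinatorics. No queen, king, or ace is on the board, so QQ, KK, and AA each keep all 6 combos for 18 in total, and each combo is 1/18 ≈ 5.56% of the range. Three of the six QQ combos contain the Q of spades, matching the three QQ rows at 16.69 versus three at 14.02. A minimal Python sketch; the two-character card encoding (e.g. "Qs") is just an assumption for illustration:

```python
from itertools import combinations

SUITS = "shdc"
BOARD = {"2s", "Ts", "9c", "5h", "3s"}  # the 2sTs9c5h3s river

def pair_combos(rank):
    """Every two-card combo of a pocket pair that avoids the board cards."""
    cards = [rank + suit for suit in SUITS if rank + suit not in BOARD]
    return list(combinations(cards, 2))

oop_range = [combo for rank in "QKA" for combo in pair_combos(rank)]  # QQ+
share = 100 / len(oop_range)  # each combo is equally likely
qq_with_qs = [combo for combo in pair_combos("Q") if "Qs" in combo]

print(len(oop_range), round(share, 2), len(qq_with_qs))  # 18 5.56 3
```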

Now if we were just being maximally exploitative against these fish, we would always shove against the fish who leads intending to fold, and against the fish who leads intending to call a shove we would shove only our two pair or better (and fold everything else). However, doing so might open up an opportunity for an exploitative opponent to attack us.
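These maximally exploitative responses come from comparing chip EVs at the moment the fish leads. A minimal sketch, assuming the 100-chip pot, a 50-chip lead, and 150-chip effective stacks, and using the fact that on this board each of our hands either beats all of QQ+ (equity 1) or loses to all of it (equity 0). Folding is worth 0 from that point, flat-calling is ignored as in the text, and the fish labels are hypothetical names:

```python
POT, LEAD, STACK = 100, 50, 150  # starting pot, fish's half-pot lead, effective stack

def jam_ev(fish, equity):
    """Our net chips from jamming over the lead (folding is worth 0)."""
    if fish == "bet_fold":
        return POT + LEAD  # the fish always folds: we take the 150-chip pot
    # "bet_call": the fish always calls, the final pot is 400 and we risk 150
    return equity * (POT + 2 * STACK) - STACK

def best_response(fish, equity):
    """Jam whenever it beats folding; otherwise fold."""
    return "jam" if jam_ev(fish, equity) > 0 else "fold"

for fish in ("bet_fold", "bet_call"):
    for label, equity in (("two pair or better", 1.0), ("air such as 87 or QJ", 0.0)):
        print(fish, label, best_response(fish, equity), jam_ev(fish, equity))
```

Against the bet-folder every hand jams; against the bet-caller only the hands that beat QQ+ do.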

For example, if we were to always jam over a lead, an exploitative opponent could bet-call QQ and win 45.9% of the time, a massively profitable deviation. The EV of a bet-call would be .459 * 250 - .541 * 150 ≈ 33.6 chips, versus the 15.36 average EV of checking QQ when both players play GTO. In general, we won't be able to take maximally exploitative lines; instead we'll need to find strategies that are moderately exploitative while maintaining enough balance to stay GTO.
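The bet-call figure is the usual all-in calculation: after leading 50, getting jammed on, and calling, OOP has 150 chips in a 400-chip pot, so he nets +250 when he wins and -150 when he loses. A short sketch, assuming the 45.9% is QQ's equity against an always-jam range; plugging in the quoted numbers gives about 33.6 chips, still far above QQ's checking EV:

```python
def bet_call_ev(equity, pot=100, stack=150):
    """OOP's net chips from leading and then calling a jam for his whole stack."""
    win = pot + stack   # the 100-chip pot plus IP's 150 chips
    lose = stack        # OOP's own 150 chips
    return equity * win - (1 - equity) * lose

ev = bet_call_ev(0.459)
print(round(ev, 2))  # 33.6
```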

Doing this, at least approximately, is actually relatively straightforward. With these ranges, the hand our opponent would most like to deviate with is QQ without the Q of spades, since it is his lowest-EV hand for checking and all his hands have similar equity against our range.

We want to shove as wide a range as possible while keeping our opponent's EV for bet-calling QQ below 14.02. All of this is quite simple to do in CREV, but first, as a reminder, here is the GTORB equilibrium solution, which we will use as a starting point.

Some quick tinkering in CREV will show that you can take the GTORB equilibrium strategy, which folds about 34.3% of the time to a lead, and make it more aggressive by shoving all 87s and folding only 32% of our 87o. This shifted strategy is still GTO (which can be verified by using CREV's max-exploit button against the shifted strategy and checking that the BB EV is still 25.87), but it will jam and pick up 150 chips instead of folding an additional 6.1% of the time, for an EV gain of about 10 chips against the fish who bet-folds. That's an additional 10% of the pot in the cases where our opponent donks! We'll call this strat GTOShove.

Similarly, against the fish who is going to bet-call, we can shift the GTORB strategy in the opposite direction by folding more of our range while staying GTO. It turns out we can fold all of the hands our opponent beats except for 4 combos of 87o and still be GTO, because our range is polarized while our opponent's range is condensed, so betting is just a weak play for him. We'll call this strat GTOFold. Note that in this case GTOFold is very close to maximally exploitative in how it responds to a river lead (the 4 bluff-raise combos are the only difference in strategy)! You can see GTOFold vs the B/C fish here: http://gtorangebuilder.com/#share_scenarioHash=baf4b0e262af2f74264162cd34a82b5a/root_v=38.1. Note that I am rounding to a whole number of bluff combos rather than considering strategies with, e.g., 3.7 bluff combos, so there may be a very slightly better GTOFold with fractional bluff combos.
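Both shifts are bounded by the same constraint: bet-calling QQ must stay below its 14.02 checking EV. OOP's EV for that line is a weighted average over our responses. When we fold he picks up the 100-chip pot, when we jam he nets 400e - 150 with equity e against our jamming hands, and when we flat-call he nets 200e - 50. Jamming too many hands that QQ beats raises e, while folding too much hands him the pot. A minimal sketch, assuming IP's response to the lead is some mix of fold, jam, and flat-call; the function names and the second example's frequencies and equity are hypothetical, not values taken from CREV:

```python
POT, LEAD, STACK = 100, 50, 150  # starting pot, half-pot lead, effective stack (chips)

def oop_lead_call_ev(fold_freq, jam_freq, jam_equity, call_equity=0.0):
    """OOP's net chips from leading 50 and calling any jam.

    fold_freq and jam_freq are how often IP folds or jams over the lead; IP
    flat-calls the remainder. jam_equity and call_equity are OOP's equity
    against the hands IP jams and flat-calls with.
    """
    call_freq = 1.0 - fold_freq - jam_freq
    ev_fold = POT                                    # OOP picks up the 100-chip pot
    ev_jam = jam_equity * (POT + 2 * STACK) - STACK  # 400-chip pot, OOP risks 150
    ev_call = call_equity * (POT + 2 * LEAD) - LEAD  # 200-chip pot, OOP risks 50
    return fold_freq * ev_fold + jam_freq * ev_jam + call_freq * ev_call

def lead_is_unprofitable(check_ev, **response):
    """True if leading and calling earns less than the hand's GTO checking EV."""
    return oop_lead_call_ev(**response) < check_ev

QQ_NO_SPADE_CHECK_EV = 14.02
# Always jamming, with QQ at 45.9% equity as in the text: about 33.6 chips.
print(lead_is_unprofitable(QQ_NO_SPADE_CHECK_EV, fold_freq=0.0, jam_freq=1.0, jam_equity=0.459))  # False
# Hypothetical mix: fold 30%, jam 70% with a range QQ has 10% equity against.
print(lead_is_unprofitable(QQ_NO_SPADE_CHECK_EV, fold_freq=0.3, jam_freq=0.7, jam_equity=0.1))  # True
```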

I've put together a chart of the overall strategy-vs-strategy EVs. The % exploit is the percentage of the maximum exploitative gain against our opponent's leak that our strategy actually extracts. Specifically, it is:

(EV[our strat vs opponent strat] - EV[gto vs gto]) / (EV[max exploit vs opp strat] - EV[gto vs gto])

Of course, all of our strategies are unexploitable, so I didn't include our exploitability in the chart. When calculating the maximally exploitative strategies, I only considered exploiting our opponents in response to their river lead; I did not consider altering our strategy at all when responding to a check.


Our Strategy     Opponent Strategy     Our EV (chips)     % Exploit
GTORB            GTORB                 74.1               N/A
GTOFold          GTORB                 74.1               N/A
GTOShove         GTORB                 74.1               N/A

GTORB            B/C Fish              75.4               27.3%
GTOFold          B/C Fish              77.7               80%

GTORB            B/F Fish              77.4               43.1%
GTOShove         B/F Fish              78.0               51.3%
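The % exploit formula is easy to compute, and inverting it recovers the maximally exploitative EV implied by any row of the chart. A short sketch; the helper names are hypothetical. Backing out the max-exploit EV from the GTORB vs B/F row gives about 81.76 chips, and GTOShove's 78.0 then scores about 50.9%, close to the chart's 51.3%; the small gap is presumably because the EVs are rounded to one decimal:

```python
def pct_exploit(ev_ours, ev_gto, ev_max):
    """Share of the maximum exploitative gain that our strategy captures."""
    return (ev_ours - ev_gto) / (ev_max - ev_gto)

def implied_max_exploit_ev(ev_ours, ev_gto, pct):
    """Invert pct_exploit: the max-exploit EV implied by one known row."""
    return ev_gto + (ev_ours - ev_gto) / pct

EV_GTO = 74.1
ev_max_bf = implied_max_exploit_ev(77.4, EV_GTO, 0.431)  # GTORB vs B/F row
print(round(ev_max_bf, 2))                               # 81.76
print(round(pct_exploit(78.0, EV_GTO, ev_max_bf), 3))    # 0.509 for GTOShove vs B/F
```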


This is one of the simplest examples of how GTO theory can be combined with exploitative play to create strategies that are impossible for our opponents to counter, but that still let us adapt our strategy to target specific weaknesses we identify in our opponents. People often think of GTO play as a purely passive style where you never adjust to how your opponents play, but in reality there are a variety of ways to adapt and attack your opponents while remaining completely unexploitable (as shown here). GTO concepts can also be used to find the least exploitable strategy that achieves a specific win rate against a specific opponent, or to make sure we are only exploitable in ways that we don't think our opponent will capitalize on. I'll be discussing all of these in more depth over the coming months.