Wednesday, April 2, 2014

Unexploitably Exploitative: How to (Out)Think Like a Fish

When poker players discuss game theory, two fundamental styles of play are generally considered: exploitative play and GTO play.  When playing exploitatively, you identify a known leak in your opponent's game and then make decisions that extract the maximum possible value from your opponent given that leak.  When playing GTO, you ignore your opponent's strategy completely and make perfectly balanced decisions that prevent a perfect opponent from exploiting you.

In practice, both of these approaches have significant drawbacks.  Even fish are rarely bad enough that they won't eventually catch on and adapt to maximally exploitative play, and determining the maximally exploitative strategy requires knowing every detail of your opponent's strategy, which is practically impossible in real life.  While GTO play holds up well against fish, as I show in this post it will generally leave money on the table by ignoring opportunities to attack our opponent's specific, known weaknesses.

In this post I'm going to discuss the concept of Unexploitably Exploitative (UE) strategies as an in-between option that retains most of the desirable properties of GTO play, while still allowing us to systematically attack specific leaks in our opponent's play.  I'm in the process of adding the ability to calculate UE strategies to GTORangeBuilder, as the techniques for computing these strategies are identical to those for solving for GTO play.

Game Theory and Imperfect Play


When playing an actual poker session, most people have a generally solid strategy that they start with (ideally this would be GTO).  Then, as they identify weaknesses in their opponents, they make relatively small adjustments that take advantage of those weaknesses, rather than transforming their entire strategy into a maximally exploitative line.

The challenge we face is: how can we mathematically identify the best ways to make these small adjustments and also know how big the adjustments should be?

People often seem to think that game theory has nothing to say about games against imperfect opponents, that is, games where some or all decision makers sometimes make suboptimal choices.  In actuality, entire fields such as experimental economics and behavioral economics focus on exactly these problems, and there are also well-known mathematical techniques for applying traditional equilibrium concepts to imperfect opponents.

Today I'm going to focus on the latter approach and talk about a concept known as "tilting", in the Game Theory sense, not in the Poker sense :)  We'll apply the traditional Nash Equilibrium solution concepts to an adjusted ("tilted") version of the game of poker that incorporates our opponent's weaknesses into the game by altering its payoffs.  This will allow us to find strategies that our weak opponents can only adjust to by changing the fundamental way in which they are weak.  I call these unexploitably exploitative strategies.  The strategy is unexploitable in the adjusted game because it is a Nash Equilibrium, but is exploitative in the real game and thus will extract EV from our opponent's weaknesses.

The only way to really understand this concept is to see it in practice, so let's start with a very simple example of the bluffing game.

The Bluffing Game


Consider the following extremely simple example of a poker situation.  The game starts on the river with two players and a pot of 100 chips.  Player 1 must act first and can either bet 100 chips or check.  If Player 1 bets, Player 2 can call the 100 chip bet or fold.  If Player 1 checks, Player 2 can bet 100 chips or check.

The board is 2c2s2h3c3s

Player 1's range contains AcAh and QcQh each with equal weight.
Player 2's range contains only KcKh.

Both players' ranges have 50% equity, but player 1's range is nuts or air, whereas player 2's range is a single hand right in the middle.

The Nash Equilibrium of this game is very simple.  Player 1 bets with AA always and with QQ half of the time.  Player 2 calls with KK half of the time, and checks back when Player 1 checks.

Note that the game significantly favors the player with the nuts or air range.  Player 1's EV in the game is 75 chips while player 2's is 25 chips, when both players play perfectly.
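
For readers who want to check these numbers, here is a minimal Python sketch (my own illustration, not GTORangeBuilder output) that computes Player 1's EV for arbitrary strategies in this game.  It assumes, as the text does, that Player 2 simply checks back whenever Player 1 checks.  Plugging in the equilibrium frequencies reproduces the 75/25 split.

```python
# Payoffs count the 100-chip pot as going to the showdown/fold winner,
# plus any chips won or lost on the 100-chip river bet.
POT, BET = 100, 100

def ev_player1(p_bet_AA, p_bet_QQ, p_call_KK):
    """Player 1's EV when AA and QQ are each dealt half of the time."""
    # With AA we win every showdown and every called bet.
    ev_AA = p_bet_AA * (p_call_KK * (POT + BET) + (1 - p_call_KK) * POT) \
            + (1 - p_bet_AA) * POT
    # With QQ we lose every showdown, so checking is worth 0.
    ev_QQ = p_bet_QQ * (p_call_KK * -BET + (1 - p_call_KK) * POT)
    return 0.5 * ev_AA + 0.5 * ev_QQ

ev1 = ev_player1(p_bet_AA=1.0, p_bet_QQ=0.5, p_call_KK=0.5)
print(ev1, POT - ev1)   # -> 75.0 25.0
```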

Now let's suppose you were Player 1 playing this game 50 times in a row against a slightly fishy Player 2. You've watched him play the same game against a previous opponent and noticed that he calls with KK 55% of the time, and any time he folds he grumbles about how you probably had a Q.  When he calls and you do have a Q, he complains about you trying to bluff at him.  The fishy player hates feeling like he is getting bluffed off the best hand and thus he can't help himself from calling a bit more often than is optimal.

GTO play does not earn any additional profit against this fishy opponent.  With our aces we get 100 * 0.45 + 200 * 0.55 = 155.  With our queens we get 0.5 * 0 + 0.5 * (0.55 * -100 + 0.45 * 100) = -5.  Averaging 155 and -5 gives 75, the same as the equilibrium EV of the game.

Maximally exploitative play says we should bluff 0% of the time versus our opponent, and thus only bet with AA.  This will increase our EV from 75 chips to 77.5 chips per round.
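
As a quick sanity check of both numbers, here is a small Python sketch (again just illustrative) of Player 1's EV against a 55% caller as a function of how often we bluff our Queens.

```python
# EV per round for Player 1 versus an opponent who calls a river bet 55%
# of the time, as a function of our bluffing frequency with QQ.
def ev_vs_caller(p_bluff, p_call=0.55):
    ev_AA = p_call * 200 + (1 - p_call) * 100              # we always bet AA
    ev_QQ = p_bluff * (p_call * -100 + (1 - p_call) * 100)
    return 0.5 * (ev_AA + ev_QQ)

print(ev_vs_caller(0.5))   # GTO bluffing frequency             -> 75.0
print(ev_vs_caller(0.0))   # maximally exploitative (no bluffs) -> 77.5
```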

So say we decide to play maximally exploitatively.  From our opponent's perspective, rather than seeing us bet 75% of the time he will see us betting 50% of the time.  Remember, we're not assuming our opponent is a complete idiot; we're just assuming that he has a specific ego-driven leak where he hates to feel like he's being bluffed out of pots.  As soon as he suspects that we are hardly ever bluffing, he is likely to adapt to our play.

Imagine that he is very slow to catch on, but after 45 of the 50 rounds he has become quite sure that we are very rarely bluffing.  In reality, 45 rounds is a big enough sample that it would be nearly impossible for our opponent not to notice that we were betting much less than 75% of the time, given that we had actually only been betting our Aces.

Over those 45 rounds of exploiting our opponent, our EV gain relative to GTO was 2.5 chips per round, for a total of 112.5 chips.  Now say our opponent switches to an always-fold strategy for just the final 5 rounds.  Statistically it would be very difficult for us to detect this and adjust, because the odds of randomly folding 4 times in a row while playing a fold-50% strategy are 1/16.  That is, over a 50-hand sample we'd expect to see 4 folds in a row quite often.

In those last 5 rounds our opponent folds to our AA bets and wins the showdown against our QQ every time, upping his EV from 25 chips to 50 chips per round.  That means he gains 25 chips per round relative to GTO, or 125 chips of EV over the 5 rounds of counter-exploiting us.

Our total EV over the 45 rounds of exploiting our opponent and the 5 rounds of him counter-exploiting us is actually 112.5 - 125 = -12.5 chips relative to the GTO vs GTO baseline.  That's worse than playing GTO the whole game, even though we correctly identified our opponent's leak, played maximally exploitatively against it, and faced an opponent who was very slow to react to our exploitative play.
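
The bookkeeping for this scenario, as a tiny Python sketch (all numbers are chips relative to the 75-chip-per-round GTO baseline):

```python
# 45 rounds of exploiting the 55% caller, then 5 rounds of being counter-exploited.
exploit_gain = 45 * (77.5 - 75.0)   # max-exploit EV minus GTO EV per round
counter_loss = 5 * (50.0 - 25.0)    # opponent's gain per round once he always folds
print(exploit_gain, counter_loss, exploit_gain - counter_loss)   # 112.5 125.0 -12.5
```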

Unexploitable Exploitation


The basic problem with the maximally exploitative play above is that it doesn't target a reasonable underlying cause of our opponent's leak.  We don't necessarily need to know the exact underlying cause, but we can assume that the fish is not an idiot or completely insane.  Thus we can't expect him to completely ignore our betting frequency when we are obviously playing a maximally exploitative, completely unbalanced strategy.  That both underestimates our opponent and poorly identifies the root cause of his leak.

Rather than assuming our opponent is completely oblivious to our play, we'll assume that he is an otherwise generally solid player who fundamentally hates getting bluffed out of pots.  It makes him feel weak, and wondering if we were bluffing is stressful, so he hits the call button whenever it seems close, even if he has a feeling it might be slightly -EV.

To attack this opponent intelligently we need to build his weakness into the structure of the game itself, which turns out to be very easy to do.

In game theory and economics, payoffs are not measured in money but in utility.  Utility is the overall measure of happiness that a person gets from specific outcomes.  A perfect poker player would win the maximum number of dollars by having a utility function exactly equal to the expected value of the dollars he won or lost with every decision.

However, in real life almost no poker player actually has that type of utility function.  Some of us love or hate variance; some of us hate looking stupid and will avoid a call that might make us look dumb if we feel the call is very close to 0 EV.  Others love running a big bluff, or love feeling smart when they make a very tricky thin value bet and get called.  Other leaks can be subtler: some people experience a bigger emotional difference between a -$50 session and a +$50 session than they do between a +$500 session and a +$600 session, or they feel better about themselves when their non-showdown winnings are positive.

To the extent that these feelings impact our decision making, they are all leaks that decrease our dollar EV, but there is absolutely nothing preventing us from incorporating these concepts into game theory and solving for the equilibrium of that new game.

In the regular bluffing game, the EVs for each player are as follows.  I ignore Player 1 checking Aces and Player 2 betting when checked to, as both are clearly weak plays.

For Player 1

EV[Bet Queens] = pC * -100 + (1 - pC) * 100
EV[Check Queens] = 0

For Player 2

EV[Call with Kings] = pQ * 200 - pA * 100
EV[Fold with Kings] = 0

Where pC is the probability that Player 2 calls a bet with a King, and pQ and pA are the probabilities that Player 1 bets his Queens and his Aces respectively.

At equilibrium, Player 2 must be indifferent between calling and folding with his Kings, and clearly pA must be 1, so it is easy to see that pQ must be 1/2.  Similarly, since Player 1 must be indifferent between betting and checking his Queens, pC must be 1/2.
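
If you'd rather let a computer do the algebra, here is a short sympy sketch (purely illustrative, not part of GTORangeBuilder) that solves the two indifference conditions above with pA fixed at 1:

```python
import sympy as sp

pC, pQ = sp.symbols('pC pQ')
pA = 1

bet_Q  = -100 * pC + (1 - pC) * 100   # Player 1's EV for betting Queens
call_K = 200 * pQ - 100 * pA          # Player 2's EV for calling with Kings

# Indifference: betting Queens = checking (0), calling Kings = folding (0)
print(sp.solve([sp.Eq(bet_Q, 0), sp.Eq(call_K, 0)], [pC, pQ]))
# -> {pC: 1/2, pQ: 1/2}
```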

Let's now imagine getting inside the fish's head and changing the actual game into the game he imagines he is playing, given his utility function.  We can think of it as if, in his mind, folding when you bet with Queens is worse than 0 EV.  Symmetrically, in his mind, when you, Player 1, make him fold to your Queens, your payoff is better than 100 EV by that same margin.  Let's call that margin X.

Note that we don't need to be exactly correct about his internal motivations and psychology for this technique to work.  As long as he is playing as if his utility function is adjusted by X as described, the actual psychology behind it is irrelevant.  Psychological insights are only needed to the extent that they help us identify our opponent's utility function, and these insights can always be made statistically, based on past observations, rather than psychologically if need be.

If we insert X into our equations we get:

For Player 1

EV[Bet Q] = pC * -100 + (1 - pC) * (100 + X)
EV[Check Q] = 0

For Player 2

EV[Call with K] = pQ * 200 - pA * 100
EV[Fold with K] = -pQ * X

If we again apply the indifference conditions we get:

pC = (100 + X) / (200 + X)
pQ = 100 / (200 + X)

Now, we observed that this player calls 55%, so we can impute a value for X of about 22.2.  If X is 22.2 then the optimal pQ is 100 / 222.2, or approximately 45%.  These strategies represent optimal play in the game that the fish's sub-optimal preferences create.  The fish cannot improve his EV by changing his strategy without fundamentally changing his nature, that is, without having a different value of X.
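
Here is the same imputation as a short Python sketch, using the two formulas above:

```python
# Impute the tilt X from the observed 55% calling frequency,
#   pC = (100 + X) / (200 + X)   =>   X = (200 * pC - 100) / (1 - pC),
# then compute our bluffing frequency in the tilted equilibrium.
observed_pC = 0.55
X  = (200 * observed_pC - 100) / (1 - observed_pC)
pQ = 100 / (200 + X)
print(round(X, 1), round(pQ, 3))   # -> 22.2 0.45
```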

Let's now step away from the adjusted game with the fish's utility function and step back into the actual game with dollar payoffs.  If we were to play these optimal strategies in the real game (where X is 0) what would our EV be?

For Player 1, the EV with Aces is the same as before, 155.  With Queens we get 0.55 * 0 + 0.45 * (0.55 * -100 + 0.45 * 100) = -4.5, for an overall EV of (155 - 4.5) / 2 = 75.25, a net 0.25 chips per hand above GTO (or 25bb/100 if we treat one chip as one big blind).
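
And here is the corresponding real-money check as a Python sketch, playing the tilted-equilibrium bluffing frequency against the fish who still calls 55%:

```python
# Real-game (X = 0) EV for Player 1 when we bluff QQ 45% of the time
# and the fish keeps calling 55% of the time.
p_call, p_bluff = 0.55, 0.45
ev_AA = p_call * 200 + (1 - p_call) * 100               # 155
ev_QQ = p_bluff * (p_call * -100 + (1 - p_call) * 100)  # -4.5
print(0.5 * (ev_AA + ev_QQ))   # -> 75.25, i.e. 0.25 chips per hand above GTO
```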

While that isn't nearly as good a win-rate as the maximally exploitative strategy, it is a win-rate that our opponent cannot possibly prevent us from achieving unless he actually changes his entire way of thinking by reducing his internal value of X.

Thus our strategy is unexploitably exploitative.  It is exploiting our fishy opponent in the actual game, based on the nature of his specific weakness, but in a way that is completely unexploitable (because it is a Nash Equilibrium) in the version of the game where the fish is paid based on his true utility function.

By building his fishy nature into the payoffs of the game, we don't have to assume that he is an idiot, or that he is paying zero attention to our play and will never adapt.  Instead, we can assume his decision making is generally sound but that his valuations of specific outcomes are not solely based on their dollar payoff.  This allows us to attack his sub-optimal valuations in a way that he cannot counter.

Advantages of Unexploitable Exploitation (UE)


Besides the obvious advantage that our opponent will have a very difficult time counter exploiting us relative to maximally exploitative play, there are a number of other reasons to prefer the approach outlined above.

First, unlike with maximally exploitative play, your responses to leaks scale smoothly with the size of the leak.  When playing this game maximally exploitatively, if your opponent calls 50.00000001% of the time you must bluff your Queens 0% of the time, and if he calls 49.999999% of the time you must bluff them 100% of the time.

This discontinuity of strategies has a number of weaknesses.  It means that very minor errors in your estimate of the nature of your opponent's strategy can cause you to select the absolute wrong counter strategy. It also makes your exploitation extremely obvious to your opponent which makes them likely to counter it.  The combination of these problems means that if you misjudge your opponent's leak, or if they adapt to your exploitative play, the amount they can win off of you in a very short time can be far, far greater than the amount you stand to win by maximally exploiting them.

Second, UE strategies are much more practical to figure out and execute.  Computing a maximally exploitative strategy requires a function that takes in your opponent's entire strategy and returns a counter-strategy.  To actually compute this you need to know exactly what your opponent does with every possible hand in every possible situation.

Compare that to UE strategies, which can be computed just by altering payoffs.  You don't need to know exactly what the person does in every possible situation with every possible hand; you just need to know how they value certain outcomes, which is enormously simpler.

I envision this functionality being built into GTORangeBuilder in the future, with sliders for various types of common leaks that would allow you to quickly profile players and calculate UE strategies against them in various situations.

Conclusions


UE offers a middle ground between GTO and maximally exploitative play that is still based on a fundamentally rigorous mathematical approach, using the same underlying notion of a Nash Equilibrium that GTO poker is built upon.  It is more flexible, less exploitable, and easier to calculate than maximally exploitative play, and it is able to profit from opponents' weaknesses in ways that pure GTO strategies cannot.

In the future, I'll do a part 2 to this post where I analyze some specific poker scenarios and consider UE play in an actual hand, but hopefully the bluffing game example is enough to give you the basic idea of the concept.  As always feel free to post questions in the comments.
