Tuesday, April 22, 2014

GTO Brainteaser #3 -- Blind Man's Bluff Solution

In The Mathematics of Poker (TMP), the authors walk through solving what they call the AKQ game, which is also known as Kuhn Poker.  Kuhn poker was invented decades ago by Harold Kuhn, a Princeton mathematician and friend of John Nash (inventor of the Nash Equilibrium and the basis for the movie A Beautiful Mind).  It is one of the simplest games I am aware of that directly teaches fundamental poker concepts that most experienced players use on a daily basis.  A three-player version of the game with a 4-card deck is going to be one of the events in this year's ACPC (Annual Computer Poker Competition), which I am planning to enter.  I'll discuss that more in a future post.

Mapping to the AKQ Game


The AKQ game uses a 3 card deck with an Ace, a King, and a Queen.  Both players must ante, and then each player is dealt a single card.  There is one round of betting, and the high card wins.  Many variations of this game are solved in TMP, and the specific version that is relevant to this week's brainteaser is solved in the video below.


 


To solve the three card Blind Man's Bluff game (3CBMB) we just need to notice that you can imagine your opponent's card as your own, flip its rank, and map it directly to the AKQ game.  For example, seeing that you have an A in the AKQ game is equivalent to seeing that your opponent has a Q in 3CBMB, and if you behave the same in both situations, you will achieve the same results.

To prove this we need to show that two things are true when this mapping is applied:
  1. For each possible deal of hands, the outcome when those two hands go to showdown is the same in both games.
  2. For each player and for each card that he might be dealt, the information he has about his opponent's hand is the same.
For #1, see the chart below:



For #2, note that in both games each player knows which card one of the players has, and can only infer that the other player must have one of the two other cards with 50% probability each.  So if someone playing 3CBMB sees that his opponent has an A, he can only infer that he himself must have a K or a Q with even odds.  Under our mapping this is the same as your opponent in the AKQ game having either a K or an A with equal probability, which is what a player seeing that he himself has a Q (the equivalent of a 3CBMB A) would infer.  By symmetry this applies to all 3 cards in both games.
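If you would rather verify property #1 in code than by eyeballing the chart, here is a minimal Python sketch (the rank encoding, flip table, and helper name are all my own) that enumerates all six possible deals:

    from itertools import permutations

    FLIP = {3: 1, 2: 2, 1: 3}   # A <-> Q, K <-> K (ranks: A=3, K=2, Q=1)

    def winner(c1, c2):
        # The showdown winner given the players' actual cards, in either game.
        return 1 if c1 > c2 else 2

    for c1, c2 in permutations([3, 2, 1], 2):
        # The AKQ deal (c1, c2) maps to the 3CBMB deal in which each player's
        # AKQ card becomes his opponent's card with its rank flipped.
        assert winner(c1, c2) == winner(FLIP[c2], FLIP[c1])

    print("Showdown outcomes match for all six deals.")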

The Solution


Applying the AKQ game solution to 3CBMB is simple now that we know the games are equivalent.  Player 1 should always check first.  Then if Player 2 bets, he should always call when his opponent is showing a Q, call with probability 1/7 when his opponent is showing a K, and always fold when his opponent is showing an A (he can never win that showdown).  Player 2, when checked to, always bets when his opponent is showing a Q, bluffs with probability 3/7 when his opponent is showing an A, and checks back when his opponent is showing a K.
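As a sanity check on those frequencies, a few lines of Python (my own encoding of the $50 antes and $75 bet, not taken from TMP) confirm that each player's mixing makes the other player indifferent:

    POT, BET = 100, 75   # the two $50 antes and the $75 bet

    # P2 bluffing into a visible A: his own unseen card is a K or a Q with
    # equal odds.  Holding a K he gets called 1/7 of the time; holding a Q he
    # always gets called, and a called bluff always loses to the A.
    ev_bluff = 0.5 * ((6/7) * POT - (1/7) * BET) + 0.5 * (-BET)
    print(ev_bluff)   # 0.0, so P2 is indifferent between bluffing and giving up

    # P1 calling while seeing P2's K: P1 holds an A or a Q with equal prior
    # odds, but P2 bets 3/7 of the time into an A and always into a Q.
    p_ahead = (0.5 * 3/7) / (0.5 * 3/7 + 0.5 * 1.0)   # = 3/10
    ev_call = p_ahead * (POT + BET) - (1 - p_ahead) * BET
    print(ev_call)    # 0.0, so P1 is indifferent between calling and folding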

The Bar Game


I got multiple requests for how to turn this into a bar game so that we can use our game theory to win free drinks.  This is of course the ultimate goal of all of game theory, so I put a fair amount of thought into it.  The issue with the game as I stated it is that the GTO strategy doesn't profit against weak players unless they do something really dumb (e.g. fold when you are showing a Q).  The in-position player will win on average, but always demanding to be in position sounds sketchy.

I think I have a pretty reasonable idea to slightly tweak the game so that we can at least break even when OOP and so that we can profit more when our opponent plays sub-optimally.  I'll do a little bit more work on it and will make it a topic of a future post :)

Friday, April 18, 2014

GTO Brainteaser #3 -- Blind Man's Bluff

This week's brainteaser involves solving a simplified, three card version of Blind Man's Bluff poker for two players. If you've never played Blind Man's Bluff before, you can play a full version of the 52 card game against our Arnold GTO bot below (note that his edge against sub-optimal play is very small).

The simplified version of the game works as follows:

  1. There is a 3 card deck with an Ace, a King, and a Queen.  
  2. Both players must ante $50.
  3. Each player is dealt a card from the deck, face down, and is not allowed to look at their own card.  
  4. Each player then gets to see the other player's card (you can imagine them putting their cards on their foreheads).
  5. Player 1 may then either bet $75 or check.  
  6. Player 2 can call or fold if Player 1 bets, and can check or bet $75 if Player 1 checks (in which case Player 1 can then call or fold).
  7. If the hand goes to showdown (a bet is called or both players check) then the player with the higher card wins the pot.
What is the optimal strategy in this game?

As usual, I'll post a solution next week, but I'm going to shift to posting the solutions on Tuesdays so that I'm not scrambling to write them up over the weekend as often :)  If you haven't already, be sure to check out last week's solution to the Red or Black game.



Monday, April 14, 2014

GTO Brainteaser #2 -- Red or Black Solution

Today we'll take a look at the solution to our second brainteaser, Red or Black.  The solution uses the concept of backwards induction, which is a core part of game theory.  The proof of our answer also uses a related concept, mathematical induction.

While the solution to the problem on its own is very simple, without a thorough understanding of the two induction concepts above it can seem like magic, so I'm going to start by explaining backwards induction and mathematical induction with simple examples and then I'll show the solution at the end.  If you just want to know the answer you can of course scroll down :)


Mathematical Induction



Mathematical induction is a very useful technique for solving problems that are relatively simple when the numbers involved are small, but that get very complex as one of those numbers grows.  Let's call the number that is going to grow N.  Induction is a 3-step process:

  1. Show that some statement or formula is true for some base case N = y where y is small.
  2. Assume that the statement is true for an arbitrary number N = x.
  3. Based on that assumption, show that it must be true for N = x + 1.
You then conclude that your statement or formula is true for any natural number greater than or equal to y.

The basic idea is that if you know it's true for a base case, N = y, then assumption #2 is correct when x = y, so by #3 the statement is true for y + 1.  But by applying #2 and #3 above again, if it is true for y + 1, then it is true for y + 2, and if it is true for y + 2 then it is true for y + 3, and so on.  Thus it is true for any N greater than y.

This technique can often make problems that seem extremely difficult to prove very simple.  Let's look at a simple example.  Suppose I want to prove that:
11^n - 4^n is divisible by 7
for any natural number n greater than or equal to 1.  At first glance, that is not at all obvious.  I can check it for a few values of n easily and see that it seems true, but to actually prove it for all n seems hard.  With induction it's easy.

  1. Base case: N = 1.  11^1 - 4^1 = 7, which is divisible by 7.
  2. Now assume that for N = x, 11^x - 4^x is divisible by 7.
  3. For N = x + 1, 11^(x+1) - 4^(x+1) = 11 * (11^x - 4^x) + 7 * 4^x.  If 11^x - 4^x is divisible by 7, then 11 times it plus a multiple of 7 is also divisible by 7.
  4. So the statement is true for all natural numbers n greater than or equal to 1.
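Before trusting the induction, it is easy to spot-check the claim numerically; a tiny Python loop (mine, and of course a check rather than a proof) does it:

    # Spot-check the divisibility claim for the first 50 values of n.
    for n in range(1, 51):
        assert (11**n - 4**n) % 7 == 0
    print("11^n - 4^n is divisible by 7 for n = 1..50")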


Backwards Induction




In game theory, backwards induction involves imagining yourself at each possible final decision node and working backwards.  Once you know the optimal action for each final node, you can use that information to determine the second-to-last action, and so forth, eventually arriving at a complete solution for every possible decision node.  The process works as follows:


  1. For each final decision node figure out the optimal decision for any players involved and the payoffs for all players assuming that they all make optimal decisions.
  2. Remove those final decision nodes from the decision tree and replace them with the payoffs you get by assuming optimal play from that point forward.
  3. Step 2 will make a whole new set of nodes final decision nodes.  Repeat the process for those nodes until there are no remaining decision nodes.


That sounds a bit complicated but it is actually quite simple and it is something you do every day at the poker table.  Let's look at a simple example from a widely studied game called the centipede game, so named because its decision tree looks like a centipede.

The basic idea of the game is simple.  There are two players, a single prize, and 10 rounds.  In round 1, the prize is $1 and Player 1 can choose to keep that $1 prize (giving his opponent nothing), which ends the game, or to pass, which then doubles the prize.  If Player 1 passes, Player 2 is then faced with the same decision for the $2 prize in round 2.  If the prize goes unclaimed in round 10, neither player gets anything.  The question is: what is the optimal strategy?  What if instead of 10 rounds there were 100?

This problem is easy to solve with backwards induction.

We first need to figure out the optimal play and expected values at the final decision node.  In round 10, the optimal strategy for Player 2 is to claim the prize and get the maximum payout of $512, so he will always do so, and Player 1 will get $0.



Now we can remove round 10 from the decision tree, which makes round 9 a final decision node with direct payoffs.  In round 9 Player 1 now has a choice between taking $256 (giving Player 2 $0) or passing and getting $0 himself, so he of course should take the $256.  Repeating this process all the way back shows that the optimal strategy is for Player 1 to take the prize and end the game in round 1.  You can see that this logic holds no matter how many rounds the game has or how fast the prize grows.
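Here is that backwards induction carried out in a short Python sketch (the prize schedule and variable names are my own reading of the game described above):

    def solve_centipede(rounds=10):
        # Work backwards.  If the final round passes, the prize is unclaimed,
        # so the player about to act and his opponent would both get $0.
        acting, other = 0, 0
        for r in range(rounds, 0, -1):
            take = 2 ** (r - 1)   # the prize in round r: $1 doubled each pass
            pass_ev = other       # passing makes me the non-actor next round
            if take >= pass_ev:
                acting, other = take, 0
            else:
                acting, other = pass_ev, acting
        # Viewed from round 1, "acting" is Player 1 and "other" is Player 2.
        return acting, other

    print(solve_centipede())      # (1, 0): take the $1 immediately
    print(solve_centipede(100))   # the same conclusion with 100 rounds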


Red or Black Solution



Armed with these tools let's take a crack at the actual Red or Black problem.

Let's start with backwards induction.  If we found ourselves in a state where there are only 2 cards remaining, what is the best we can do?

  1. If there are 0 red cards left, then we are guaranteed to lose.  
  2. If there are 2 red cards left we are guaranteed to win.  
  3. If there is 1 red card left and 1 black card left we have 2 options.  We can either bet now, and win 50% of the time, or we can wait until the last card.  If we wait until the last card we will be forced to bet, and it will be red 50% of the time and black 50% of the time.

Looking at the above we can see that with 2 cards, the probability of us winning is always the number of red cards, R, divided by the total number of cards, N.  This doesn't prove anything about the 52 card game yet, but it gives us a base case from which to induct.

As in our mathematical induction example above, our next step is to assume that if we are in a state with N total cards left, R of which are red, then our optimal strategy wins with probability R / N.  If, by assuming this, we can prove that with N + 1 cards the optimal win probability is R / (N + 1) for any R, that would then let us backwards induct through the game tree and collapse the payoff for waiting at any decision node to a direct payoff of R / N.

So let's assume that for all possible numbers of red cards R, with a total of N cards, the probability of winning with optimal play is R / N.

So now imagine we are in a state with N + 1 total cards, and some specific number of red cards, R = r.  Let's determine the win probability of our optimal strategy, using our assumption above.  If we bet immediately our win probability is r / (N + 1).

If we wait and see a card, we will end up in one of two states:

  1. A red card will be dealt, leaving us with r - 1 red cards and N total cards.  The probability of a red card being dealt is r / (N + 1).
  2. A black card will be dealt, leaving us with r red cards and N total cards.  The probability of a black card being dealt is (N + 1 - r) / (N + 1).

In either case we are in a state with N total cards.  Applying our assumption that the probability of winning with optimal play in a state with N total cards is R / N, where R is the number of red cards, we can conclude that the probability of winning with optimal play in state 1 is (r - 1) / N and in state 2 is r / N.

To compute the overall probability of winning if we wait and see a card, we just need to weight these two probabilities by the frequencies with which the two states occur:

(r - 1) / N * (r / (N + 1)) + (r / N) * ((N + 1 - r) / (N + 1)) =
(r * (r - 1) + r * (N + 1 - r)) / (N * (N + 1)) =
rN / (N * (N + 1)) =
r / (N + 1)


So by induction our odds of winning with optimal play in a deck with any number of cards is just the number of red cards divided by the total number of cards.

Thus with a standard deck, our odds of winning with optimal play are 26 / 52 = 1/2.  And if we win $100 when we win and lose $100 when we lose, then the EV of the game is 0.  We can easily see that just betting on the first card gives us an EV of $0, so that strategy is co-optimal.
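If you would like to see the whole backwards induction carried out mechanically, here is a small memoized Python sketch (the state encoding is mine) confirming that the optimal win probability is R / N in every state:

    from functools import lru_cache

    @lru_cache(maxsize=None)
    def win_prob(r, n):
        # Optimal probability of winning with n cards left, r of them red.
        if n == 1:
            return float(r)   # forced to bet the last card: we win iff it is red
        bet_now = r / n
        wait = 0.0
        if r > 0:
            wait += (r / n) * win_prob(r - 1, n - 1)      # a red card is dealt
        if n - r > 0:
            wait += ((n - r) / n) * win_prob(r, n - 1)    # a black card is dealt
        return max(bet_now, wait)

    assert abs(win_prob(26, 52) - 0.5) < 1e-9
    assert all(abs(win_prob(r, n) - r / n) < 1e-9
               for n in range(1, 53) for r in range(n + 1))
    print("Optimal win probability is r / n in every state.")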


An Alternate Solution


The goal of this problem was to give a nice introduction to the key game theory concepts of induction and backwards induction, but I definitely want to mention that reddit user Leet_Noob posted a very elegant solution to the problem that I was unaware of, and that requires neither.

In fact, it requires no mathematical calculations whatsoever, and you can read it here. However, the logic he uses is very specific to this particular problem and won't teach you techniques that are widely applicable to other game theory problems, so I'm not going to focus on it here.  Furthermore, the type of argument it uses is extremely easy to misapply if you are not very experienced with probability, so it actually requires some careful thought to verify that his solution is correct.

Friday, April 11, 2014

GTO Brainteaser #2 -- Red or Black

This week's brainteaser is about a very simple card game that works as follows:

You have a standard 52 card deck that is properly shuffled and stacked with all cards face down.  The dealer is going to flip over one card at a time.  Before the dealer flips each card, you can place a $100 bet. If the next card is red (a heart or a diamond) you win $100.  If the next card is black (a club or a spade), you lose $100.  Win or lose, the game ends at that point, and the deck is reset and reshuffled.  If you don't bet before the second to last card is flipped over, then you are required to bet on the last card.

What is the optimal strategy for this game?  How much will you win on average if you play optimally?

This game doesn't relate directly to poker, but to solve it you need to understand backwards induction which is a key game theory concept that is needed to solve almost any real world poker situation.

Full Disclosure: I did not invent this brainteaser but despite a fair amount of googling I can't track down who did and it got passed on to me via word of mouth years ago.  If anyone knows the creator of this puzzle please let me know as I'd love to cite him or her.

This problem is being discussed on reddit here if you want to join the discussion.  And if you missed out on last week's brainteaser be sure to check it out here.  Update: it's Monday and the solution has been posted here.

You can play this game (for pretend money of course) below if you are using an up to date browser:


Monday, April 7, 2014

GTO Brainteaser #1 Solution

Today I'll be going over the solution to the Rock Paper Scissors brain teaser that I posted here: http://blog.gtorangebuilder.com/2014/04/gto-brain-teaser-1-exploitation-and.html and talking a bit about how it relates to poker.

The main problem is solved in the video below, which also has a nice introduction to basic game theory concepts.


 


How is this relevant to poker?



At its essence, this brainteaser is about how you deal with cases where your opponent is in a bad situation, but they also know that they are in a bad situation, and they know that you know, and so on. These types of subgames come up all the time in poker, for many reasons.  Some examples: you 3-bet pre-flop and a flop hits that is clearly very bad for your pre-flop range, or you are late in a tournament and have a stack size that severely limits your options, etc.

A common mistake is to try to make too much of your situational advantage in a way that is easily countered, rather than taking a moderately exploitative line that leverages your inherent situational advantage to profit in a way that cannot be countered.

A simple example: say you are on the river versus someone who has taken a line that they would never take with a super strong hand or with air.  They have basically turned their cards face up as a strong, one-pair hand. Meanwhile, you've been representing either two pair plus or a flush draw, and on the river the flush draw misses. However, you are sure that your opponent cannot possibly have a big enough hand to feel great about calling off a shove. Let's say your range is about half nuts and half air, and there is a pot-sized bet left in your stack.

As I show in this post, you are at an inherent advantage in this situation and should be able to win about 75% of the pot on average even though your range only has about 50% equity when both players play perfectly. However, the one thing you can do to throw away that advantage is to think, "he never has a strong enough hand to call here, I should always bluff to punish him for building such a big pot with a medium strength hand on the flop and turn."

This is equivalent to trying to play 100% paper to take advantage of your opponent in the rock paper scissors game, and it is just as easily countered by your opponent always calling.  By over-reacting to your opponent's situational disadvantage, you can put him back on even footing.  If you instead take a measured response and bet your nuts always and your air 50% of the time (such that you have the nuts 2/3 of the time and air 1/3 of the time when your opponent calls), then you are guaranteed a nice solid profit no matter what your opponent does, simply due to the good situation you have gotten yourself into.
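To make that concrete, here is the arithmetic in a few lines of Python (the pot is normalized to 1 with a pot-sized bet behind; the setup is my own simplification of the spot described above):

    P = B = 1.0            # the pot, and a pot-sized bet left in the stacks
    nuts, air = 0.5, 0.5   # our river range

    # Bet all of the nuts plus half of the air: 75% of our range bets, and a
    # caller faces the nuts 2/3 of the time.
    bet_freq = nuts + 0.5 * air

    ev_if_he_folds = bet_freq * P                      # we take the pot when we bet
    ev_if_he_calls = nuts * (P + B) - (0.5 * air) * B  # nuts get paid, bluffs lose

    print(ev_if_he_folds, ev_if_he_calls)   # 0.75 and 0.75

Since both of his pure responses give us 0.75 of the pot, any mixture of them does too: the measured line locks in the profit no matter what he does.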


Population Dynamics



While the above is a nice story, it is not what made me think of the problem.  What actually got me thinking about this issue was something much more specific, playing heads up on sites that use anonymous player names.

Often fishy players sit down at heads up and play super aggro on the first hand (how true this is probably depends on what stakes you play).  They're looking for action and they came into the game telling themselves how they're going to be aggressive today and punish their nit opponents.

As a result, when a fishy player 4-bets you on the first hand, especially if they sit down with some weird amount that looks like their entire bankroll, you end up in a situation where you generally want to stack off (go all-in) much lighter (with a worse hand) than you would normally.

However, with anonymous player names, the person who's sitting at the table waiting for an opponent has no way to know if a new player who just sits down is actually a maniac fish or a strong regular.  The person joining the table, however, can much more safely assume that anyone sitting at an open table waiting for action is more likely a reg than a fish, particularly if there are a number of open tables each with one player seated.

Thus if you join a table against another reg, and get a big hand on the first hand, you can mimic the play of a fish (be super aggro and use slightly odd bet-sizing) and often induce a light stack off from your opponent.

This actually maps exactly to our Rock Paper Scissors example.

Take the rock paper scissors game above (the main problem, not the bonus), but rather than thinking about it from the perspective of Player 2 as a single person, think about it as follows.  Player 1 is a GTO player waiting for an opponent.  Player 2 will be chosen randomly: 50% of the time he will be a fish who always plays rock (call him P2F), and the other 50% of the time he will be a smart opponent (call him P2S). Player 1 can't tell who he is playing against, and both Player 1 and P2S are aware of the population dynamics.

While Player 1 wins money overall, he actually loses money to P2S, who is playing 2/3 scissors, 1/3 paper against Player 1's 2/3 paper, 1/3 rock.  P2S actually averages $33.33 per round in this game.  Player 1 basically takes money from the fish, but gives half of it back to P2S.
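These numbers are easy to verify with a small Python sketch (the strategy dictionaries and names are mine):

    BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}

    def ev(mine, theirs, stake=100):
        # Expected value per round of one mixed strategy against another.
        total = 0.0
        for a, pa in mine.items():
            for b, pb in theirs.items():
                if BEATS[a] == b:
                    total += pa * pb * stake
                elif BEATS[b] == a:
                    total -= pa * pb * stake
        return total

    p1  = {"paper": 2/3, "rock": 1/3}       # Player 1's adjusted strategy
    p2f = {"rock": 1.0}                     # the fish
    p2s = {"scissors": 2/3, "paper": 1/3}   # the smart player

    print(ev(p1, p2f))                            # +66.67 against the fish
    print(ev(p1, p2s))                            # -33.33 against P2S
    print(0.5 * ev(p1, p2f) + 0.5 * ev(p1, p2s))  # +16.67 overall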

Furthermore, the existence of P2S is what protects P2F from being exploited as badly as he should.

Population effects are common in Game Theory.  They come up whenever you introduce a subset of weak players into a pool of GTO players and then have the GTO players adjust their strategies based on the assumption that they are randomly being paired against a member of the population.  This results in a shift of the GTO players' strategies, from the regular equilibrium strategy to something that is moderately exploitative (like our 2/3 paper, 1/3 rock strategy from the brainteaser).

This is an interesting aspect of game theory that definitely has a number of implications in everyday poker. Next week's brainteaser will investigate it in more depth as we look at a problem that is related to this year's ACPC (Annual Computer Poker Competition).

Bonus Solution


I'm going to be using a lot of the terminology and techniques that I introduced in the solution video above, so if you haven't watched it already please check it out here before continuing.

To solve the bonus problem we have to start at the end of the game tree and work our way back to the start via Backwards Induction.

The basic idea of backwards induction is simple: we solve the 2nd round, and then solve the 1st round by assuming that the payoff for going into the 2nd round in various game states is the equilibrium payoff of the 2nd round in isolation.  An important and related concept is the notion of a subgame perfect equilibrium.

In this case the 2nd round is very easy to solve.  If our opponent did not play rock in the 1st round then they must play it in the 2nd round, and this is known to both players, so they are guaranteed to lose that round.  If they did play rock in the 1st round, then the 2nd round is just a regular round of rock paper scissors, and the equilibrium payoff in the round is 0 for both players (in the Nash equilibrium, both players play 1/3 rock, 1/3 paper, 1/3 scissors).

We'll also assume that Player 1 should be able to profit from the game.

We can rule out any pure strategy play for Player 1 by noting that if Player 1 plays a pure strategy in round 1, then his opponents best response will guarantee that he loses in round 1, which means at best he will break even overall.

We can also say that any pure strategy for Player 2 would at best earn -$50 in total, which is what he gets for always playing rock in round 1 (Player 1's best response, paper, wins that round) and then playing the Nash equilibrium of the regular rock paper scissors subgame in round 2.

So assuming player 2 can average better than -$50 by mixing, both players should play mixed strategies.

We can write out the expected values of our options quite simply.

EV[P1] = 50 * (r2 - s2) + 50 * (1 - r2)
EV[S1] = 50 * (p2 - r2) + 50 * (1 - r2)
EV[R1] = 50 * (s2 - p2) + 50 * (1 - r2)

Here r2, p2, and s2 are the probabilities with which Player 2 plays rock, paper, and scissors respectively in round 1, and EV[P1] is the expected value for Player 1 of playing paper in round 1.

We can note immediately that if we subtract the second-round EV, which is represented by the 50 * (1 - r2) term, from each equation, the equations become cyclically symmetric in r2, p2, and s2.  Thus the solution to the indifference equations must be symmetric.

If we note that r2 + p2 +  s2  = 1, the only symmetric solution is that they are all 1/3.

So Player 2 plays rock, paper, and scissors each 1/3 of the time.  If we plug those values into Player 1's EV equations we see that he profits by $33.33 on average over the two-round game.

For Player 2, if we note that playing anything other than rock in round 1 loses us $50 in round 2, our EV equations are:

EV[P2] = 50 * (r1 - s1) - 50
EV[S2] = 50 * (p1 - r1) - 50
EV[R2] = 50 * (s1 - p1) - 0

We know that Player 2 is playing all 3 options, so the indifference conditions require that the EVs are all the same.  Furthermore, we know that each of those EVs = -$33.33, since the game is zero-sum.

You can solve these by hand or use something like Wolfram Alpha, but it's easy to check that r1 = 1/3, p1 = 2/3, s1 = 0 solves all 3 equations properly (it makes them all equal to -$33.33).
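If you would rather not do the algebra by hand, a few lines of sympy (my own transcription of the three EV equations plus the normalization constraint) recover the same mix:

    import sympy as sp

    r1, p1, s1 = sp.symbols("r1 p1 s1")

    ev_p2 = 50 * (r1 - s1) - 50   # Player 2 plays paper in round 1
    ev_s2 = 50 * (p1 - r1) - 50   # Player 2 plays scissors in round 1
    ev_r2 = 50 * (s1 - p1)        # Player 2 plays rock in round 1

    # Player 2 mixes all three options, so all three EVs must be equal, and
    # Player 1's round-1 frequencies must sum to 1.
    sol = sp.solve([sp.Eq(ev_p2, ev_s2), sp.Eq(ev_s2, ev_r2),
                    sp.Eq(r1 + p1 + s1, 1)], [r1, p1, s1], dict=True)
    print(sol)   # [{p1: 2/3, r1: 1/3, s1: 0}]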

This means that Player 1 plays the same strategy in both the main problem and the bonus problem, but that Player 2's disadvantage is twice as large in the bonus.

Edit:

Note that because P1 is never actually playing scissors, the indifference condition on EV[S1] need not actually hold.  Because of this there can be multiple equilibrium solutions to this game, so long as EV[P1] = EV[R1], as h7r so helpfully noted in the comments.

The EV for both players is the same regardless of which equilibrium is played because that is true for all zero sum two player games.








Friday, April 4, 2014

GTO Brain Teaser #1: Exploitation and Counter-Exploitation in Rock Paper Scissors



This week's brainteaser involves a modified version of Rock Paper Scissors (RPS) that has some interesting implications for poker strategy, and for the general concept of exploiting opponents by taking advantage of their weaknesses.

This game is a "toy game" in that it is a simplified model game that we can study to gain some intuition into the way that bigger games (like poker) work.

Modified RPS


Consider the standard game of Rock Paper Scissors with the following twist.  At the start of each round an independent judge flips a fair coin and tells your opponent the result but does not tell you.

If the coin came up heads your opponent must play rock.  Otherwise he can play whatever he wants.  You can always play whatever you want and standard RPS rules apply (paper beats rock, which beats scissors, which beats paper).  Your opponent is a smart thinking player and will adapt perfectly to whatever strategy you play.

We're going to look at the following two questions.

  1. If the loser of the game must pay the winner $100, what is the most you should be willing to pay to play it?  In the event of a tie no money is exchanged.
  2. What is the GTO strategy for both players?

Maximally exploitative play doesn't work


A naive approach would be to note that whatever our opponent's strategy, he will be playing rock at least half the time.  The maximally exploitative strategy against anyone in RPS is to always play the thing that beats whatever our opponent plays the most.

So in this case we would always play paper.  However, a clever opponent might expect this and would always play scissors when he was not forced to play rock by the coin flip.

This would result in us winning when the coin was heads with paper vs rock, and losing when the coin was tails with paper versus scissors for an average of $0.  Despite our opponent's handicap this strategy fails to profit at all.

Is there any way to actually profit on average against a smart opponent in this game?  It should be easy, he's at a huge disadvantage!

Bonus


This problem is a bit harder, but also of interest.

You are going to play two rounds of rock paper scissors against the same opponent.  This time, there is no coin flip, but instead, the rule is that your opponent must play rock in at least one of the two rounds.  The loser of each round must pay the winner $50.  What is the most you should be willing to pay to play the game and what is the optimal strategy for both players?

Solution


You can see an in-depth solution to both this problem and the bonus here: http://blog.gtorangebuilder.com/2014/04/gto-brainteaser-1-solution.html

Wednesday, April 2, 2014

Unexploitably Exploitative: How to (Out)Think Like a Fish

When poker players discuss game theory, two fundamental styles of play are generally considered: exploitative play and GTO play.  When playing exploitatively, you identify a known leak of your opponent and then make decisions that extract the maximum possible value from your opponent given that leak.  When playing GTO you ignore your opponent's strategy completely and make perfectly balanced decisions that prevent a perfect opponent from exploiting you.

In practice, both of these approaches have significant drawbacks.  Even fish are rarely bad enough not to catch on and adapt to maximally exploitative play, and determining maximally exploitative play requires knowing every detail of your opponent's strategy, which is practically impossible in real life.  While GTO play performs well against fish, as I show in this post, it will generally leave money on the table by ignoring opportunities to attack our opponent's specific known weaknesses.

In this post I'm going to discuss the concept of Unexploitably Exploitative (UE) strategies as an in-between option that retains most of the desirable properties of GTO play, while still allowing us to systematically attack specific leaks in our opponent's play.  I'm in the process of adding the ability to calculate UE strategies to GTORangeBuilder as the techniques for actually computing these strategies are identical to those for solving for GTO play.

Game Theory and Imperfect Play


When playing an actual poker session, most people have a generally solid strategy that they start with (ideally this would be GTO).  Then as they identify weaknesses in their opponents, they make relatively small adjustments that take advantage of those weaknesses, rather than transforming their entire strategy into a maximally exploitative line.

The challenge we face is: how can we mathematically identify the best ways to make these small adjustments and also know how big the adjustments should be?

People often seem to think that game theory has nothing to say about games against imperfect opponents, that is, games where some or all decision makers sometimes make suboptimal choices.  But in actuality, there are entire fields, such as experimental economics and behavioral economics, that focus on exactly these problems.  There are also known mathematical techniques for applying traditional equilibrium concepts to imperfect opponents.

Today I'm going to focus on the latter approach and talk about a concept known as "tilting", in the Game Theory sense, not in the Poker sense :)  We'll apply the traditional Nash Equilibrium solution concepts to an adjusted ("tilted") version of the game of poker that incorporates our opponent's weaknesses into the game by altering its payoffs.  This will allow us to find strategies that our weak opponents can only adjust to by changing the fundamental way in which they are weak.  I call these unexploitably exploitative strategies.  The strategy is unexploitable in the adjusted game because it is a Nash Equilibrium, but is exploitative in the real game and thus will extract EV from our opponent's weaknesses.

The only way to really understand this concept is to see it in practice, so let's start with a very simple example of the bluffing game.

The Bluffing Game


Consider the following extremely simple example of a poker situation.  The game starts on the river with two players and a pot of 100 chips.  Player 1 must act first and can either bet 100 chips or check.  If Player 1 bets, Player 2 can call the 100 chip bet or fold.  If Player 1 checks, Player 2 can bet 100 chips or check.

The board is 2c2s2h3c3s

Player 1's range contains AcAh and QcQh each with equal weight.
Player 2's range contains only KcKh.

Both players' ranges have 50% equity, but player 1's range is nuts or air, whereas player 2's range is a single hand right in the middle.

The Nash Equilibrium of this game is very simple.  Player 1 bets with AA always and with QQ half of the time.  Player 2 calls with KK half of the time, and checks back when Player 1 checks.

You can see the solution here:


Note that the game significantly favors the player with the nuts or air range.  Player 1's EV in the game is 75 chips while player 2's is 25 chips, when both players play perfectly.

Now let's suppose you were Player 1 playing this game 50 times in a row against a slightly fishy Player 2. You've watched him play the same game against a previous opponent and noticed that he calls with KK 55% of the time, and any time he folds he grumbles about how you probably had a Q.  When he calls and you do have a Q, he complains about you trying to bluff at him.  The fishy player hates feeling like he is getting bluffed off the best hand and thus he can't help himself from calling a bit more often than is optimal.

GTO play does not earn any additional profit against this fishy opponent.  With our aces we get 100 * .45 + 200 * .55 = 155.  With our queens we get 0 * .5 + .5 * (.55 * -100 + .45 * 100) = -5.  So if we average 155 and -5 we get 75, the same as the equilibrium EV of the game.

Maximally exploitative play says we should bluff 0% of the time versus our opponent, and thus only bet with AA.  This will increase our EV from 75 chips to 77.5 chips per round.

So say we decide to play maximally exploitatively.  From our opponent's perspective, rather than seeing us bet 75% of the time he will see us betting 50% of the time.  Remember, we're not assuming our opponent is a complete idiot, we're just assuming that he has a specific ego-driven leak where he hates to feel like he's being bluffed out of pots.  As soon as he suspects that we are hardly ever bluffing, he is likely to adapt to our play.

Imagine that he is very slow to catch on but after 45 of the 50 rounds, he's become quite sure that we are very rarely bluffing.  In reality 45 is a big enough sample that it would be nearly impossible for our opponent not to be certain that we were betting much less than 75% of the time, given that we had actually only been betting our Aces.

Over those 45 rounds of exploiting our opponent, our EV gain relative to GTO was 2.5 chips per round, for a total of 112.5.  Say our opponent switches to an always fold strategy just for the final 5 rounds. Statistically it would be very difficult for us to detect this and adjust, because the odds of randomly folding the first 4 times in a row while playing a fold 50% strategy are 1/16.  That is, over a 50 hand sample we'd expect to see 4 folds in a row quite frequently.

In those last 5 rounds our opponent folds to our AA and wins showdowns against our QQ every time, upping his EV from 25 chips to 50 chips per round.  This means our opponent nets an extra 125 chips of EV by counter-exploiting us for 5 rounds.

Our total EV over 45 rounds of us exploiting our opponent and 5 rounds of him counter-exploiting us is actually -12.5 chips relative to the GTO vs GTO EVs.  That's worse than playing GTO the whole game, even though we correctly identified our opponent's leak, played maximally exploitatively against it, and had an opponent who was very slow to react to our exploitative play.
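To make the bookkeeping explicit, here is the whole 45 + 5 round scenario in a few lines of Python (the chip values come from the text above; the variable names are mine):

    GTO_EV, MAX_EXPLOIT_EV = 75, 77.5         # our EV per round vs the 55% caller
    gain_per_round = MAX_EXPLOIT_EV - GTO_EV  # +2.5 while he keeps calling

    OPP_GTO_EV, OPP_COUNTER_EV = 25, 50       # his EV once he just always folds
    loss_per_round = OPP_COUNTER_EV - OPP_GTO_EV  # every chip he gains is ours

    net_vs_gto = 45 * gain_per_round - 5 * loss_per_round
    print(net_vs_gto)   # -12.5: worse than just playing GTO for all 50 rounds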

Unexploitable Exploitation


The basic problem with the maximally exploitative play above is that it doesn't try to exploit a reasonable underlying cause of our opponent's leak.  We don't necessarily need to know the exact underlying cause, but we can assume that the fish is not an idiot or completely insane.  Thus we can't just expect him to completely ignore our betting frequency when we are obviously playing a maximally exploitative, completely unbalanced strategy.  That is both underestimating our opponent and poorly identifying the root cause of his leak.

Rather than assuming our opponent is completely oblivious to our play, we'll assume that he is an otherwise generally solid player who fundamentally hates getting bluffed out of pots.  It makes him feel weak, and wondering if we were bluffing is stressful, so he hits the call button whenever it seems close, even if he has a feeling it might be slightly -EV.

To attack this opponent intelligently we need to build his weakness into the structure of the game itself, which is actually very easy to do.

In game theory and economics, rather than considering people's payoffs as money, payoffs are described as utility.  Utility is the overall measure of happiness that one gets from specific outcomes.  A perfect poker player would be able to win the maximum number of dollars by having his utility function exactly equal the expected number of dollars that he won or lost with every decision.

However, in real life, almost no poker player actually has that type of utility function.  Some of us love or hate variance; some of us hate looking stupid and will avoid a call that might make us look dumb if we feel the call is very close to 0 EV.  Others love running a big bluff, or love feeling smart when they make a very tricky thin value bet and get called.  Other leaks can be more subtle: some people experience a bigger emotional difference between a -$50 session and a +$50 session than they do between a +$500 session and a +$600 session, or they feel better about themselves when their non-showdown winnings are positive.

To the extent that these feelings impact our decision making, they are all leaks that decrease our dollar EV, but there is absolutely nothing that prevents us from incorporating these concepts into game theory and solving for equilibrium of that new game.

In the regular bluffing game, the EVs for each player are as follows.  I ignore Player 1 checking Aces and Player 2 betting when checked to, as both are clearly weak plays.

For Player 1

EV[Bet Queens] = pC * -100 + (1 - pC) * 100
EV[Check Queens] = 0

For Player 2

EV[Call with Kings] = pQ * 200 - pA * 100
EV[Fold with Kings] = 0

Where pC is the probability that Player 2 calls a bet with a King, and pQ and pA are the probabilities that Player 1 bets his Queens and Aces respectively.

At equilibrium, Player 2 must be indifferent between calling and folding with his Kings; noting that pA is clearly 1, it is easy to see that pQ must be 1/2.  Similarly, since Player 1 must be indifferent between betting and checking his Queens, pC must be 1/2.  Let's now imagine getting inside the fish's head and changing the actual game into the game that he imagines he is playing given his utility function.

We can think of it as if, in his mind, folding when you bet with Queens is worse than 0 EV.  Symmetrically, in his mind, he believes that when you, Player 1, make someone else fold to your Queens, you feel better than 100 EV by the same margin.  Let's call that margin X.

Note that we don't need to be exactly correct about his internal motivations and psychology for this technique to work.  As long as he is playing as if his utility function is adjusted by X as described, the actual psychology behind it is irrelevant.  Psychological insights are only needed to the extent that they can help us identify our opponent's utility function, and these insights can always be made statistically, based on past observations, rather than based on psychology if need be.

If we insert X into our equations we get:

For Player 1

EV[Bet Q] = pC * -100 + (1 - pC) * (100 + X)
EV[Check Q] = 0

For Player 2

EV[Call with K] = pQ * 200 - pA * 100
EV[Fold with K] = -pQ * X

If we again apply the indifference conditions we get that

pC = (100 + X) / (200 + X)
pQ = 100 / (200 + X)

Now, we observed that this player calls 55% of the time, so setting pC = 0.55 and solving for X gives X of approximately 22.2.  If X is 22.2 then the optimal pQ is 100 / 222.2, or 45%.  These strategies represent optimal play in the game that the fish's sub-optimal preferences create.  The fish cannot improve his EV by changing his strategy without fundamentally changing his nature by having a different value of X.
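In code, imputing X from the observed call frequency and reading off the tilted-game bluffing frequency takes only a couple of lines (a sketch of the algebra above; the helper name is mine):

    def impute_x(pc_observed):
        # Invert pC = (100 + X) / (200 + X) to solve for X.
        return (200 * pc_observed - 100) / (1 - pc_observed)

    X = impute_x(0.55)
    pQ = 100 / (200 + X)
    print(round(X, 2), round(pQ, 3))   # 22.22 0.45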

Let's now step away from the adjusted game with the fish's utility function and step back into the actual game with dollar payoffs.  If we were to play these optimal strategies in the real game (where X is 0) what would our EV be?

For Player 1, it would be the same when we have Aces, 155.  Then when we have Queens we get 0 * .55 + .45 * (.55 * -100 + .45 * 100) = -4.5.  Averaging 155 and -4.5 gives 75.25, a net gain of 0.25 chips per hand over the GTO EV of 75.

While that isn't nearly as good a win-rate as the maximally exploitative strategy, it is a win-rate that our opponent cannot possibly prevent us from achieving unless he actually changes his entire way of thinking by reducing his internal value of X.

Thus our strategy is unexploitably exploitative.  It is exploiting our fishy opponent in the actual game, based on the nature of his specific weakness, but in a way that is completely unexploitable (because it is a Nash Equilibrium) in the version of the game where the fish is paid based on his true utility function.

By building his fishy nature into the payoffs of the game, we don't have to assume that he is an idiot, or that he is paying zero attention to our play and will never adapt.  Instead, we can assume his decision making is generally sound but that his valuations of specific outcomes are not solely based on their dollar payoff.  This allows us to attack his sub-optimal valuations in a way that he cannot counter.

Advantages of Unexploitable Exploitation (UE)


Besides the obvious advantage that our opponent will have a very difficult time counter exploiting us relative to maximally exploitative play, there are a number of other reasons to prefer the approach outlined above.

First, unlike with maximally exploitative play, your responses to leaks are smooth in the size of the leak.  When playing this game maximally exploitatively, if your opponent calls 50.00000001% of the time you must bet your Qs 0% of the time, and if he calls 49.999999% you must bet your Qs 100% of the time.

This discontinuity of strategies has a number of weaknesses.  It means that very minor errors in your estimate of the nature of your opponent's strategy can cause you to select the absolute wrong counter strategy. It also makes your exploitation extremely obvious to your opponent which makes them likely to counter it.  The combination of these problems means that if you misjudge your opponent's leak, or if they adapt to your exploitative play, the amount they can win off of you in a very short time can be far, far greater than the amount you stand to win by maximally exploiting them.

Second, UE strategies are much more practical to figure out and execute.  Computing a maximally exploitative strategy requires a function that takes in your opponents entire strategy and returns a counter strategy. To actually compute this you need to know exactly what your opponent does with every possible hand in every possible situation.

Compare that to UE strategies, which can be computed just by altering payoffs.  You don't need to know exactly what the person does in every possible situation with every possible hand; you just need to know how they value certain outcomes, which is enormously simpler.

I envision this functionality being built into GTORangeBuilder in the future, with sliders for various types of common leaks that would allow you to quickly profile players and calculate UE strategies against them in various situations.

Conclusions


UE offers a middle ground between GTO and maximally exploitative play that is still based on a fundamentally rigorous mathematical approach, using the same underlying notion of a Nash Equilibrium that GTO poker is based upon.  It is more flexible, less exploitable, and easier to calculate than maximally exploitative play, and it is able to profit from an opponent's weaknesses in ways that pure GTO strategies cannot.

In the future, I'll do a part 2 to this post where I analyze some specific poker scenarios and consider UE play in an actual hand, but hopefully the bluffing game example is enough to give you the basic idea of the concept.  As always feel free to post questions in the comments.