GTORangeBuilder Blog: 2014

Saturday, December 13, 2014

GTO Brainteaser #9 Solution -- Multistreet Theory: Range Building with Draws

I'm going to present the solution to GTO Brainteaser #9 without restating the problem, so if you are not familiar with the problem statement please read the brainteaser before continuing on to the solution. This problem, with 160 chip stacks, is solved in depth in my latest CardRunners video so for a detailed solution definitely check out the video. The 160 chip stack version is a bit more interesting for those of you who are curious.

This game turns out to have concrete testable predictions about how we should optimize our turn play in real world scenarios which I go through in detail in the upcoming part 3 of my CardRunners series on multi-street theory.

Computational Solution

Of course GTORangeBuilder can just solve this brainteaser numerically for us. I will go through a pencil and paper solution as well to illustrate the concepts and mathematics required to solve these types of problems, but since the pencil and paper solution is reasonably complex I'll give a solution summary and a browseable GTORB solution before I get into the nitty gritty of the analytic solution.

I'm only going to provide an analytic solution of the game where we do not pay the villain $3 to check so that we can see that it is actually optimal for him to lead for $50 against us. Calculating GTO strategies for the variations of the game (where our opponent is forced to check) can be done using the exact same techniques.

Solution Summary

It is +EV to pay your opponent $3 to check. Optimal play for the villain is to always lead the turn for half pot, and this significantly increases his EV.
The villain's EV in this game is more than double his EV in the game where the hero's range is half nuts, half air, even though his turn equity is only a few % higher.
The hero's EV is higher here when he holds 1/3rd straight draws and 1/3rd flush draws than when he holds 2/3rds flush draws and no straight draws, even though flush draws are stronger hands.
It is optimal for us to jam our strongest draws some of the time and call the rest. 150 chips is the exact break-even point where calling and folding a flush draw are equal EV. If stacks are any deeper it is +EV to shove or call a flush draw and we should mix between the two.

At first, the idea that the villain should bet his merged range into his opponent's polarized range might be quite surprising. The reason it is correct is that while the villain is facing a somewhat polarized range on the turn, he is facing an even more polarized range on the river, and by leading the turn he takes control of the pot size rather than letting his opponent build the pot with a perfectly balanced range. It turns out that even if the villain were to check the turn, check jamming is higher EV than check calling.

In general, it often is GTO to get all the money in on the turn in situations where the river will put you at such an informational disadvantage that you won't be able to realize your equity.

The basic issue for the villain is that his informational disadvantage greatly increases on the river. On the turn the hero's straight draws and flush draws are roughly equivalent hands, so the informational advantage that the hero has by knowing his own hand, rather than just his range, is minor. On flush or straight runouts, however, these two hands become polarized, greatly increasing the value of knowing which specific hand of the two the hero holds.

Mathematical Solution

To solve multistreet games like this we will need to use backwards induction techniques like the ones we used in the solution to GTO Brainteaser #6.

In this case we will start by considering each of the OOP villain's 3 options for his first action and assessing their EVs when both players play GTO from that point forward.

The villain's simplest option is to shove. In this case optimal play for the hero is to call with the nuts and fold the rest of his range.

Since the hero has the nuts 1/3rd of the time and the villain's equity vs the nuts is 0, from the villain's perspective the EV of this option is 1/3 * -150 + 2/3 * 100 = 50/3.
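
As a quick sanity check, the shove EV can be reproduced with exact fractions. This is only a sketch of the arithmetic above; the variable names are illustrative and not part of GTORangeBuilder.

```python
from fractions import Fraction

# Villain shoves 150 into a 100 pot; the hero calls only with the nuts.
p_nuts = Fraction(1, 3)   # share of the hero's range that is the nuts
stack = 150               # villain loses his whole stack when called (0 equity vs the nuts)
pot = 100                 # villain wins the pot when the hero folds

ev_shove = p_nuts * -stack + (1 - p_nuts) * pot
print(ev_shove)  # 50/3
```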

Next let's consider what happens if the villain bets $50. The hero can react by calling, folding or shoving. How can he optimally split his range between these three options? Let's start by examining what the hero should do with the nuts as that drives the rest of the strategy.

Clearly folding the nuts is a bad plan so the question is, is our EV higher with a shove or with a call?

Against a shove, the villain must call often enough to make us indifferent between shoving our potential bluffs and taking the higher EV of calling or folding with them.

Since a flush draw simply has more outs than a straight draw, it should be the higher EV hand both as a shove and as a call and thus we can conclude the villain must call a shove such that we are indifferent between jamming our FD and calling it or folding it (whichever of those two is higher EV).

At this point we're going to guess that jamming the nuts is optimal, which is easily checked as a final step. Jamming the nuts seems like a good guess because intuitively the villain should have to call a turn jam more often than a river $100 bet, as our bluffs have equity on the turn so we can jam more air.

So given that we are jamming the nuts over a $50 lead, we just need to figure out how to optimally allocate our flush and straight draws between calling, folding and jamming.

Let's consider what kind of range it might make sense to call with. First, intuitively, we can note that if we are calling with some flush draws, we should always call with some straight draws also. Why?

Suppose we call with x% of our flush draws. If we then call with y% of our straight draws and y = x/3, we can jam our entire range whenever a flush completes and our opponent's EV will be 0, because he will be calling 100 to win 300 so we want 1 bluff for every 3 value bets. As long as our range doesn't have too high a frequency of straight draws, our straight draw effectively wins whenever a flush or a straight completes, because we can bluff shove it whenever the flush completes.

Similarly, by having both flush draws and straight draws in our calling range we can bluff some of our flush draws whenever a straight completes, further increasing our EV. Having both of these draws, which hit on different runouts, increases our EV with both drawing hands by ensuring that our range is nicely polarized on a variety of river cards. They have a lot of synergy.

So what are the optimal relative frequencies for calling with our flush draws and straight draws? If y > x/3, we are no longer able to usefully turn our straight draws into bluffs so their value goes way down. Optimal play requires that y = x/3 exactly.

What is the EV of calling the turn bet with a range where y = x/3, that is, a range that is exactly 3/4 flush draws and 1/4 straight draws?

Given our opponent's hand there are 46 possible river cards.

7 complete our flush
6 complete our straight
2 complete both
31 are blank

Of those 31 blanks, our range blocks 2 of them 3/4 of the time and 2 of them 1/4th of the time, and 27 of them never, so effectively given the blockers from our range, there are 27 + 2 * 1/4 + 2 * 3/4 = 29 blanks

When our flush completes or both draws complete, our EV is the entire pot (minus the $50 we contributed to it) since we will bet and our opponent will just fold, so 150.

When the straight completes, we will bet our straight draws plus enough of our missed flush draws to make our nuts to air ratio 3 to 1, which means we will bet 1/3rd of the time and our EV when we bet will be the pot. So our EV in this case is 200 * 1/3 - 50.

When a blank comes our EV is just -50 since we always have to check fold.

As it turns out

(9 * 150 + 6 * (200 * 1/3 - 50) + 29 * -50) / 44 = 0

Calling with a balanced range of draws is exactly EV 0, which means we will be indifferent between calling, folding and shoving. Note that this is specifically due to the stack sizes. If the effective stacks were $151, the EV of calling with this range would be greater than 0 and we would never want to fold a flush draw, and if stacks were $149 we'd never want to call with a flush draw.
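
The river card count and the break-even result above can be checked with exact fractions. This is a minimal sketch that simply encodes the numbers from the text; the variable names are illustrative.

```python
from fractions import Fraction

# Calling range: 3/4 flush draws, 1/4 straight draws (y = x/3).
p_fd, p_sd = Fraction(3, 4), Fraction(1, 4)

flush_or_both = 7 + 2      # river cards where we bet and the villain folds
straight_only = 6
unblocked_blanks = 27
# 2 blanks are blocked 3/4 of the time (live 1/4 of the time),
# 2 blanks are blocked 1/4 of the time (live 3/4 of the time).
blanks = unblocked_blanks + 2 * p_sd + 2 * p_fd

pot_after_call = 200
call_cost = 50
ev_flush = pot_after_call - call_cost                       # we win the whole pot
ev_straight = pot_after_call * Fraction(1, 3) - call_cost   # we bet 1/3rd of the time
ev_blank = -call_cost                                       # we check/fold

total = flush_or_both + straight_only + blanks
ev_call = (flush_or_both * ev_flush
           + straight_only * ev_straight
           + blanks * ev_blank) / total
print(blanks, total, ev_call)  # 29 44 0
```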

Furthermore, if we look at the hand specific EVs, they are both 0. With a flush draw, we hit our draw more often, but we can't bluff it as effectively when it misses and the other draw completes. Of course as soon as stacks get any deeper our EV for calling with a flush draw will become positive.

Now that we know the EV of calling with a FD and a straight draw is 0 so long as y = x/3, we can note that shoving a flush draw is always going to be better than shoving a straight draw, due to the slightly higher equity when called.

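We should shove our flush draws such that our opponent's EV for calling against our range of nuts + flush draws is equal to the EV of folding, -50. Call f how often we shove a flush draw, which has equity e when called. Since we hold the nuts and flush draws equally often and always shove the nuts, our shoving range is 1 part nuts to f parts flush draws. In this case e = 0.205.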

EV[call shove] = -150 * 1 / (1 + f) + f / ( 1 + f ) * (-150 * e + 250 * (1-e))
EV[fold to shove] = -50

These are equal when f = 0.4587. (wolfram alpha)
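
The same indifference equation can be solved in closed form without Wolfram Alpha. The sketch below multiplies both sides by (1 + f) and rearranges; the function and variable names are illustrative.

```python
e = 0.205  # flush draw equity when the shove is called, from the text

def ev_call_shove(f, e=e):
    """Villain's EV of calling the shove against 1 part nuts to f parts flush draws."""
    ev_vs_fd = -150 * e + 250 * (1 - e)
    return -150 * 1 / (1 + f) + f / (1 + f) * ev_vs_fd

# -150 + f * ev_vs_fd = -50 * (1 + f)  =>  f = 100 / (ev_vs_fd + 50)
ev_vs_fd = -150 * e + 250 * (1 - e)
f = 100 / (ev_vs_fd + 50)
print(round(f, 4))  # 0.4587
```

Plugging f back into ev_call_shove returns -50 (up to floating point error), the EV of folding.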

Now in this special case where the stacks are exactly $150 what we do with the rest of our range doesn't affect our EV. However, note that we assumed shoving the nuts was our highest EV option. Were we to try something else (say calling) with the nuts, we'd want to have enough bluffs on the river to maximize our EV with the nuts, so I will consider the case where we call with all of the flush draws that we don't shove, although as we will see the strategies that fold some or all of the flush draws that aren't shoved are all also GTO.

It is now easy to directly calculate the EV of calling with the nuts, and it turns out that it is significantly lower than the EV of jamming. I won't show that calculation here so as to keep this post short(ish).

Now that we've worked out the GTO response to a villain lead of 50 chips, we need to actually determine the EV of leading for the villain. Because the hero's EV when he calls or folds is 0 (so the villain wins the 100 chip pot), and the villain's EV when the hero shoves is -50, the EV is just

-50 * ((1 + f) / 3) + 100 * ((2 - f) / 3) = 27.08.
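
The leading EV can be recomputed from the flush draw's equity. This is a sketch with illustrative names. With the rounded e = 0.205 it comes out at about 27.06; using e = 9/44 (9 flush outs among 44 unseen cards, which is an assumption about where the 0.205 comes from rather than a figure stated in the post) reproduces 27.08.

```python
def villain_ev_of_leading(e):
    """Villain's EV of the $50 lead, given the flush draw's equity e when called."""
    f = 100 / (300 - 400 * e)   # shove frequency from the indifference equation
    p_shove = (1 + f) / 3       # hero shoves the nuts (1/3) plus f of his flush draws (1/3)
    p_no_shove = (2 - f) / 3    # hero calls or folds (EV 0 for him), villain nets the pot
    return -50 * p_shove + 100 * p_no_shove

print(round(villain_ev_of_leading(0.205), 2))   # 27.06
print(round(villain_ev_of_leading(9 / 44), 2))  # 27.08
```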

Computing the EV of checking for the villain can be done in the exact same fashion, but as we can see from the GTORB solution it is much lower EV for the villain to check.

I encourage you guys to experiment with the mathematical techniques above and to work out the checking EV on your own, but as it turns out leading is optimal for the villain and thus his EV in this game is 27.08.

Friday, December 5, 2014

GTORB Flop Sneak Preview -- 3 Street Barreling Game

As most of you know I've been working on getting GTORangeBuilder ready to solve flop scenarios. I still have a ways to go before the flop solver is ready for public release but I thought I'd share a sneak peek of a very simple solution to a 3 street version of the bluffing game that we looked at in GTO Brainteaser #8.

As it turns out we'll need to look at a smaller bet size than we did in brainteaser #8, as with three half pot bets left against a 50% nuts 50% air range the villain actually cannot profitably ever call and the hero can take the entire pot, so it's not a very interesting example. Instead we'll look at a smaller bet size and stack size.

You are on the flop and the board is AsAhKs
The hero has a hand range of AcAd and 3c2c
The villain has a hand range of KcKh
The pot is 100 chips and stacks are 154.8 chips (exactly enough to 3-barrel 30% pot)
The hero can either bet 30% of the pot or shove on every street
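
The 154.8 chip stack size follows directly from three 30% pot bets that each get called. A short sketch of that arithmetic (variable names are illustrative):

```python
pot = 100.0
bet_fraction = 0.30
total_bet = 0.0

for street in ("flop", "turn", "river"):
    bet = bet_fraction * pot
    total_bet += bet
    pot += 2 * bet   # the bet plus the villain's call go into the pot

print(round(total_bet, 1))  # 154.8
```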

Thus as the hero, half of our range is the pure nuts and the other half is almost pure air (we split the pot on runner runner aces). A browseable GTORangeBuilder solution is at the bottom of the post so if the math doesn't interest you, just scroll down. The main takeaways will come as no surprise if you understand the two street game. Each additional street just makes it more and more difficult for the villain to get to showdown, driving his EV down.

Mathematical Solution

I'll go through the mathematical solution to this a bit quickly. In practice it is important to check for pure strategy solutions, rather than just assuming indifference but to keep this example simple I will skip that step.

Assume the villain plays a mixed strategy and calls some of the time on all streets except that

When the turn or river is an Ace he shoves because he knows the hero has air
When the turn or river is 3c or 2c he folds because he knows his opponent holds the nuts

We'll also assume that it is optimal for our hero to play a mixed strategy between betting and check/folding with his air on all streets, except that when an Ace comes he will check/fold because his hand is now face up.

Finally we will assume that the hero always bets the nuts.

If the hero is mixing between checking and betting air and never checks the nuts then his EV when he checks air is 0. So his EV when he bets air must also be zero, and his EV with air on all streets must be 0 minus however much he put into the pot on prior streets.

This means that on all streets the villain needs to call often enough to make the hero indifferent between betting and check/folding his air, so if we call c the probability that the villain calls a 30% pot bet

(1-c) * 1 - .3 * c = 0

So c = 1/1.3 = .769.

That is the villain's calling frequency on all streets against a 30% pot bet. Of course as we saw in the solution to brainteaser 8 the hero should never jam, as he is then not able to compound the nuts over multiple streets and maximize fold equity.
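
The indifference condition generalizes to any bet size. A small sketch, with a hypothetical helper name, working in units of the current pot:

```python
def indifference_call_freq(bet, pot=1.0):
    """Calling frequency c that makes a pure bluff break even:
    (1 - c) * pot - c * bet = 0  =>  c = pot / (pot + bet)."""
    return pot / (pot + bet)

c = indifference_call_freq(0.3)
print(round(c, 3))  # 0.769
```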

What about the hero? To make the villain indifferent between calling and folding to a bet on a blank river, after the flop and turn went bet/call, he should bet a range with an air to nuts ratio of .3/1.3 (that is, 1.3/1.6 = .8125 nuts) according to The Mathematics of Poker.

On a blank turn what is the villain's EV for calling, assuming he plays GTO on the river? We'll split it into two cases, the case where his opponent holds air and the case where his opponent holds the nuts. Everything is written in terms of turn pots.

Against the nuts he always loses the .3 pot turn bet he calls, and then on blank rivers (which come 42/44) he loses another .3 of the 1.6 river pot when he calls a river bet (which he does 1/1.3).

EV[call vs nuts] = -.3 - (42/44 * .3 * 1.6 * 1 / 1.3)

Against air his opponent's EV is always 0, so he wins the pot plus the .3 bet

EV[call vs air] = 1.3

Call x the percent of the hero's blank turn betting range that is the nuts. The villain is indifferent between calling and folding when

x * EV[call vs nuts] + (1-x) * EV[call vs air] = 0

Plugging this into wolfram alpha tells us that x should be .665.
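
The turn indifference equation is linear in x, so it can be solved directly. The sketch below uses illustrative names and comes out near 0.666, which matches the .665 quoted above to within rounding.

```python
b = 0.3                  # bet size as a fraction of the pot
c = 1 / (1 + b)          # villain's river calling frequency
p_blank_river = 42 / 44

# Villain's EV of calling the turn bet, in turn pots
ev_call_vs_nuts = -b - p_blank_river * b * (1 + 2 * b) * c
ev_call_vs_air = 1 + b

# x * ev_call_vs_nuts + (1 - x) * ev_call_vs_air = 0
x = ev_call_vs_air / (ev_call_vs_air - ev_call_vs_nuts)
print(round(x, 4))
```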

Finally we can reproduce the exact same method of indifference calculation to determine the optimal flop betting frequency. Rewriting the call EV equations from above on the flop is easy, they are almost identical, except that when both the turn and the river come blank the villain will call a 3rd value bet 1/1.3. The EVs below are written in terms of flop pots.

EV[call vs nuts] = -.3 - (43/45 * 1/1.3 * (.3 * 1.6 + 42/44 * .3 * 2.56 * 1/1.3))
EV[call vs air] = 1.3

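For comparison, the same two equations for a half pot bet size (the sizing we ruled out at the start) are

EV[call vs nuts] = -.5 - (43/45 * 1/1.5 * (.5 * 2 + 42/44 * .5 * 4 * 1/1.5))
EV[call vs air] = 1.5

Solving x * EV[call vs nuts] + (1-x) * EV[call vs air] = 0 with the 30% pot equations gives us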

x = .549 according to wolfram alpha.
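
The flop version of the calculation can be sketched the same way, nesting the river call inside the turn call. Names are illustrative and everything is in flop pots.

```python
b = 0.3
c = 1 / (1 + b)                       # villain's calling frequency on later streets
turn_pot = 1 + 2 * b                  # 1.6 flop pots
river_pot = turn_pot * (1 + 2 * b)    # 2.56 flop pots
p_blank_turn, p_blank_river = 43 / 45, 42 / 44

ev_call_vs_nuts = -b - p_blank_turn * c * (
    b * turn_pot + p_blank_river * b * river_pot * c
)
ev_call_vs_air = 1 + b

x = ev_call_vs_air / (ev_call_vs_air - ev_call_vs_nuts)
print(round(x, 3))  # 0.549
```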

So how often should the hero bluff his air on the flop? Since he is always betting his nuts and his range should be .549 nuts, he should bet his air with frequency a such that 1/(1+a) = .549, or a = .821.

Then on the turn his starting range is .549 nuts and he wants to bet a range that is .665 nuts. Thus he should bet air with a frequency that solves .549 / (.549 + a * (1-.549)) = .665 or a = .6132.

Then on the river the hero's betting range should be 1.3/1.6 = .8125 nuts, so the hero should bet his air with a frequency that solves .665 / (.665 + a * (1-.665)) = .8125, or a = 0.458.
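
All three bluffing frequencies come from the same rearrangement of nuts / (nuts + a * air) = target. A minimal sketch, using the rounded values quoted in the text as inputs and a hypothetical helper name:

```python
def air_bet_freq(nuts_before, nuts_target):
    """Frequency a of betting air so that
    nuts_before / (nuts_before + a * air_before) == nuts_target."""
    air_before = 1 - nuts_before
    return nuts_before * (1 - nuts_target) / (nuts_target * air_before)

flop_a = air_bet_freq(0.5, 0.549)
turn_a = air_bet_freq(0.549, 0.665)
river_a = air_bet_freq(0.665, 1.3 / 1.6)
print(round(flop_a, 3), round(turn_a, 3), round(river_a, 3))  # 0.821 0.613 0.458
```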

Finally what is the EV for the game? Since the villain's EV for calling a flop bet is zero, he only gets EV when the hero checks the flop in which case he wins the whole pot since the hero always has air.

The hero checks the flop with air 1 - .821 = .179 of the time and the hero holds air half the time, so the villain's EV will be .179 / 2 * 100 = 8.95 chips, while the hero's EV will be 91.05.
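
The final EV split is a one-line calculation once the flop bluffing frequency is known. A sketch with illustrative names:

```python
pot = 100
p_air = 0.5
flop_bluff_freq = 0.821   # how often the hero bets air on the flop

# The villain only profits when the hero checks, which he only does with air.
villain_ev = p_air * (1 - flop_bluff_freq) * pot
hero_ev = pot - villain_ev
print(round(villain_ev, 2), round(hero_ev, 2))  # 8.95 91.05
```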

That's our mathematical solution. You can see how it matches GTORangeBuilder's computational solution below.

GTORangeBuilder Computational Solution

Wednesday, November 26, 2014

Epsilon Equilibrium: How You Can Separate GTO Fact from Fiction

It turns out that any reasonably motivated player armed with CREV can measure how close to GTO a strategy is, even in spots where actual GTO play is not known, by employing a game theory concept called an Epsilon Equilibrium. Epsilon equilibria let us measure exactly how near to GTO strategies are, and they are a standard game theoretic technique for comparing the quality of various strategies.

See my latest videos below to learn more about how to compute epsilon equilibria and why they are important. This will also be a major topic in my upcoming CardRunners video on Multistreet Theory and Practice.

Monday, November 17, 2014

Improving your turn play with GTO -- Check raising vs leading on the turn

As GTO strategy and computational GTO solutions have begun to take over mid/high stakes poker strategy, more and more players are studying and analyzing GTO strategies in an attempt to improve their play. However, many players find it difficult to bridge the gap between viewing and analyzing a specific GTO strategy solution and actually understanding how to use GTO analysis to directly improve their game play. I've gotten a number of requests for a more practical post that directly shows how to use computational GTO solutions to improve your play at the tables.

Today I'm going to give a simple example of how to use GTO poker and GTORangeBuilder to directly improve your play in a common real world situation that many players struggle with and show the general analytic process that I use in my strategy packs. For the actual analysis I decided a video demo would be the most instructive, see below. You can browse the solution that is discussed in the video here:

However, before getting into the video I also wanted to take a second to answer three common questions that I often get from people who are new to computational GTO analysis.

Why should I study GTO, doesn't GTO play just break even against fishy players? Don't I need to focus on exploiting the fish I play against?

GTO play most definitely does not break even against fishy players, in general it crushes them. The idea that GTO is a purely defensive or "break even" strategy is a misconception that comes from people often learning about it in very simple "toy game" situations like rock paper scissors or the clairvoyance game. In real world poker situations GTO play extracts significant EV from both regs and fish, see here and here for more details. GTO theory also lets us target specific leaks using a concept called minimally exploitative play and GTORB lets you lock in specific opponent strategies that you wish to minimally exploit.

Isn't GTO play just about understanding 1-alpha bluffing and calling frequencies? Why do I need computational solutions?

Before true computational GTO solvers like GTORB emerged, many players tried to "estimate" GTO play using the 1-alpha value derived from the clairvoyance game in The Mathematics of Poker, which assumes one player has a purely nuts or air range and the other player can only possibly hold bluff catchers. Once we had software that would let us actually calculate true GTO play in specific situations, it became clear that these estimations were far from correct and missed many of the key strategic intricacies that exist in poker. In the flop c-bet defense strategy pack I showed that defending vs flop c-bets at a 1-alpha frequency is usually a major mistake, and in my CardRunners series I show that even in very simple river situations some of the best 1-alpha based ranges, like those from Matt Janda's books, are many times less accurate than computational solutions and can misplay key hands.

GTO strategies are so complex that I could never hope to correctly play them at the tables, how can I actually learn anything that is useful to my everyday play from GTO solutions?

As I show in the video below, the goal of studying GTO play is not to try to directly copy the exact frequencies in your own play at the tables, but instead to gain a deeper understanding of the fundamental elements of strong poker strategy in specific types of situations. In the video below I demonstrate how you can scientifically and precisely measure the EV importance of strategic options like donk-betting the turn, using different bet-sizings, etc. to gain a deep understanding of a complex real world poker situation.

Sunday, November 16, 2014

GTO is so much more than unexploitable

One of the most common misconceptions that people tend to have regarding GTO poker play comes from the idea that somehow the key element of a GTO strategy is its "unexploitability" or "balance" and the belief that any unexploitable strategy is inherently GTO.

The conditions required for a strategy to be GTO are much stronger than simple unexploitability (although of course any GTO strategy must be unexploitable), and in a practical sense, the elements of GTO play that are generally going to be the most valuable to try and use in real world poker games are the elements that have nothing to do with unexploitability. By focusing on unexploitability people minimize and miss what is actually a huge part of the value of understanding GTO play.

Today I'm going to take a look at why people have come to often confuse the idea of unexploitability with GTO and go through the core definitions and a simple example that illustrates the key difference between a GTO strategy and one that is only unexploitable. This will also serve as a nice lead in to my next post which will present a practical example of how to analyze and improve your 6-max OOP turn play in raised pots by better understanding GTO.

Toy Games and GTO

GTO play often gets confused with unexploitable play due to the fact that in the very simplest toy games the two are equivalent. People learn the solution to the toy games, without fully understanding the definition of GTO and assume that they now know what GTO means.

Games like Rock Paper Scissors, or the Clairvoyance game from The Mathematics of Poker, only have a single "reasonable" unexploitable strategy, which happens to also be fully GTO, which means that people who are new to game theory are prone to mistakenly assume that GTO and unexploitable are equivalent.

Furthermore, in these toy games, you can solve for that GTO solution using only indifference conditions (which only can be used to identify unexploitability) and thus the mechanism for finding the solution reinforces the idea that unexploitability is all there is to GTO. In fact, most arguments I've heard against GTO play stem almost entirely from generalizing from the Clairvoyance game to all of poker without any thought to the idea that a game that is trillions of times bigger might be fundamentally different.

This happens because when solving toy games we usually automatically discard strategies that might be unexploitable but intuitively are obviously dumb. However, this completely breaks down in extremely tough games because the "obviously dumb" decisions are no longer at all obvious, and in fact identifying and avoiding these "obviously dumb" leaks in large games is where a huge amount of the value of studying GTO play comes from.

Definitions

Let's go back to the core definition of a GTO strategy, which is a strategy for a player that is part of a Nash equilibrium strategy set. In a 2 player game a strategy pair is a Nash equilibrium if "no player can do better by unilaterally changing his or her strategy" (source: Wikipedia).

How does this actually tie into the concept of exploitability and, in a technical sense, what does exploitability actually mean? The idea of exploitability is relatively intuitive: if your strategy is exploitable, it means that if your opponent knew your strategy they would be able to use that information to alter their own strategy in a way that would increase their EV against you. Formalizing the above gives us an accurate definition of exploitability, but we need to define one more concept first.

A "best response" (sometimes called a "maximally exploitative strategy" or "counter strategy") to a given opponent strategy is a strategy that maximizes our EV against that opponent strategy, assuming that his strategy is completely fixed.

Exploitability can now be defined (and measured) as follows. Call G our GTO strategy, and S our opponent's strategy. Call B the best response to S. S is exploitable if our EV when we play B against S is higher than our EV when we play G against S, and that EV difference is the magnitude of the exploitability.

Intuitively, this should make perfect sense. Our opponent's strategy is only exploitable if we can alter our own strategy to exploit him and increase our EV, and the amount of EV we can gain when we maximally exploit him is an accurate measure of the magnitude of his exploitability.
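
To make the definition concrete, here is a minimal sketch that measures exploitability in Rock Paper Scissors, used here only as a convenient toy game. The payoff matrix, function names and the example strategy S are all illustrative choices, not anything taken from GTORangeBuilder or CREV.

```python
# Payoff to us (rows) against the opponent (columns): rock, paper, scissors.
PAYOFF = [
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
]

def ev(our_strategy, opp_strategy):
    """Our EV when both players use the given mixed strategies."""
    return sum(
        p * q * PAYOFF[i][j]
        for i, p in enumerate(our_strategy)
        for j, q in enumerate(opp_strategy)
    )

def best_response_ev(opp_strategy):
    """EV of B: against a fixed strategy some pure strategy is always a best response."""
    pure = [[1 if i == k else 0 for i in range(3)] for k in range(3)]
    return max(ev(s, opp_strategy) for s in pure)

def exploitability(opp_strategy, gto_strategy):
    """EV of B against S minus EV of G against S."""
    return best_response_ev(opp_strategy) - ev(gto_strategy, opp_strategy)

G = [1 / 3, 1 / 3, 1 / 3]
S = [0.5, 0.25, 0.25]   # an opponent who throws rock too often

print(exploitability(S, G))   # about 0.25: always throwing paper exploits S
print(exploitability(G, G))   # about 0: the GTO strategy cannot be exploited
```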

Any GTO strategy must be unexploitable to satisfy the definition of a Nash equilibrium, but in complex games there are usually infinitely many inferior unexploitable strategies that are not GTO.

A GTO strategy is a strategy that uses every possible strategic option and every synergistic interaction between various hands in our range to maximize our EV while still being unexploitable. In most real world cases, understanding which of our strategic options are strong and how to correctly leverage that strength against our opponent in ways they cannot prevent is what makes GTO play powerful.

An Example -- GTO Brainteaser #6

Armed with our definitions we can now look at a very simple example of an unexploitable, non-GTO strategy. Keep in mind that even this example is a relatively simple toy game and that the real game of poker generally has infinitely many unexploitable strategies that pass up EV and are not GTO for far more complex reasons.

We're going to revisit the model game from GTO Brainteaser #6. The setup is as follows:

There are 2 players on the turn with 150 chip effective stacks, and the pot has 100 chips. The Hero has a range that contains 50% nuts and 50% air and he is out of position. The Villain has a range that contains 100% medium strength hands that beat the Hero's air hands and lose to his nut hands.

For simplicity, assume that the river card will never improve either player's hand.

The hero has 2 options, he can shove the turn or he can bet 50 chips. If the hero bets 50 chips on the turn he can then follow up on the river with a 100 chip shove.

As I show here it turns out that GTO play for the hero is to bet 50 chips on the turn and then 100 chips on the river with precisely constructed ranges that contain the right relative frequency of nuts and air, and to check/fold with the rest of his air. Following this strategy gives the villain an EV of 11.11 chips.

GTO play for the villain is to call a 50 chip turn bet or a 100 chip river bet 2/3rds of the time and to call a turn shove 40% of the time.
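
These numbers can be reproduced with the same indifference steps used in the 3 street barreling post. The sketch below is illustrative only: it assumes the hero always bets the nuts, that his air is worth 0 once he reaches the river, and the names are my own.

```python
from fractions import Fraction

pot, turn_bet, river_bet, stack = 100, 50, 100, 150

def call_freq(pot, bet):
    """(1 - c) * pot - c * bet = 0 makes a zero equity bluff break even."""
    return Fraction(pot, pot + bet)

c_turn = call_freq(pot, turn_bet)                     # vs the 50 chip turn bet
c_river = call_freq(pot + 2 * turn_bet, river_bet)    # vs the 100 chip river bet
c_shove = call_freq(pot, stack)                       # vs a turn shove

# Villain's EV of calling the turn bet, split by the hero's hand
ev_vs_nuts = -turn_bet - c_river * river_bet
ev_vs_air = pot + turn_bet
x = ev_vs_air / (ev_vs_air - ev_vs_nuts)   # nuts share of the hero's turn betting range
air_bet_freq = (1 - x) / x                 # hero holds nuts and air equally often

# The villain only profits when the hero check/folds air.
villain_ev = Fraction(1, 2) * (1 - air_bet_freq) * pot
print(c_turn, c_river, c_shove, x, air_bet_freq, villain_ev)  # 2/3 2/3 2/5 9/16 7/9 100/9
```

100/9 is the 11.11 chip EV quoted above.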

Now consider the non-optimal strategy S where the hero shoves the turn 100% of the time with the nuts and 60% of the time with his air, and check/folds the rest of his air. It is easy to check that the EV of this strategy is 20 chips for the villain. So we are giving our opponent almost double the EV by playing this weaker strategy S where we jam the turn.

Clearly S is not GTO, we could unilaterally increase our EV by switching to the GTO strategy, because S is fundamentally not wielding our range and our stack to optimally prevent our opponent from realizing his equity.

However, S is completely unexploitable. Because our turn shoving range is "balanced" to be 3/8ths bluffs our opponent is exactly indifferent between calling and folding to a shove so he cannot increase his EV by switching from his GTO strategy to a maximally exploitative strategy.
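
Both claims about S, that the villain is indifferent to the shove and that his EV is 20 chips, can be checked with exact fractions. A minimal sketch with illustrative names:

```python
from fractions import Fraction

pot, stack = 100, 150
p_nuts = p_air = Fraction(1, 2)
air_shove_freq = Fraction(3, 5)    # strategy S shoves 60% of its air

# Composition of the shoving range
shove_nuts = p_nuts
shove_air = p_air * air_shove_freq
bluff_fraction = shove_air / (shove_nuts + shove_air)

# Villain's EV of calling the shove, relative to folding (which is 0)
ev_call = bluff_fraction * (pot + stack) - (1 - bluff_fraction) * stack

# Villain wins the pot whenever the hero check/folds his air
villain_ev = p_air * (1 - air_shove_freq) * pot + (shove_nuts + shove_air) * ev_call
print(bluff_fraction, ev_call, villain_ev)  # 3/8 0 20
```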

Someone looking for an "unexploitable" strategy might be happy with the strategy S, but in this case, S misses the entire practically valuable lesson that we can learn from the model, which is that by betting half pot twice we actually utilize our polarized range much more effectively than we do by shoving it, to the extent that we cut our opponent's EV approximately in half. The entire point of the example and its power is completely lost if we focus on unexploitability rather than on EV maximization and the true definition of GTO.

In fact our EV if we play an exploitable strategy where we bet 50 chips and then jam the river for 100, but with slightly incorrect value bet to bluff ratios is much higher than it is when we play the unexploitable strategy S, even if our opponent perfectly exploits us.

Similarly, when analyzing complex real world situations, focusing on unexploitable play is generally going to completely miss out on valuable lessons that a thorough study of GTO play has to offer.

Special thanks

This post was actually largely inspired by an email from a GTORangeBuilder user who was confused that using CardRunners EV's "unexploitable shove" option didn't give him a GTO strategy like GTORangeBuilder does. Of course the feature does exactly what it says, it gives a range that is the best range for us to shove if our opponent perfectly counters our shove, which in no way suggests that the shoving range is actually GTO.

Wednesday, November 12, 2014

GTO Brainteaser #9 -- Multistreet Theory: Range Building with Draws

This week we are going to examine a multi-street game, starting from the turn, where one player's range consists of the nuts and draws while his opponent's range consists of a medium strength hand that never improves to the nuts. I'll be going over this game in depth and providing a solution in part of my next CardRunners video which should come out in a few weeks.

Note that this is almost identical to the game that we examined in GTO Brainteaser #6 except that in this case, the hero's turn bluffs have the potential to improve on the river. In both games the player with the merged range has about 50% equity (in this game he has 53.8%).

The setup is as follows:

There is a $100 pot, and we are on the turn with $150 effective stacks.
The board is AsTs9c2d
The hero is in position and his range is 1/3 nuts, 1/3 straight draws and 1/3 flush draws. Specifically, the hero holds one of 7s3s, 8h7h or AcAd.
The villain always holds a medium strength hand, KcKd
Both players are allowed to either check, bet 50% pot, or shove on both the turn and the river.
Your opponent plays GTO

The villain offers to always check the turn to you if you pay him $3. Should you accept his offer? What does GTO play look like in this game for both players? Is this game higher EV or lower EV for the villain than the nuts/air game from GTO Brainteaser #6?

Would the hero be better or worse off if his range was 1/3 nuts 2/3 flush draws? How would optimal play change?

Thursday, November 6, 2014

GTORB Turn Launched -- Multistreet theory video coming soon

The latest version of GTORB can now solve turn scenarios and is available for purchase! For a sneak peek of what a solution looks like see this post.

I've been putting most of my time towards the turn solving code lately so I haven't had time to do as much blogging / video creation, but now that the turn solver is released I'll be releasing a series of blog posts as well as a CardRunners video on multistreet GTO theory with example turn solutions over the coming month, stay tuned!

Monday, October 20, 2014

GTO Brainteaser #8 Bonus Solution -- Optimal Betsize Calculations

I got a request for a numeric answer to the bonus of GTO Brainteaser # 8 which involves solving for an optimal bet size in a two street bluffing game. The solution to the non-bonus question is here and is worth reading first. I haven't actually done a post on deriving optimal betsizing in multistreet play before so I thought it would be useful to demonstrate the mathematics involved. I'm not going to restate the game structure again here so please check out the original problem statement if you are not familiar with the original game.

The basic technique for calculating optimal bet sizing is as follows.

Rather than using a fixed betsize in your calculations, make the bet size a variable and solve for GTO strategies as a function of that variable
Compute the EV of the game when both players play the GTO strategies as a function of the bet size variable
Maximize the EV of the person making the bet with respect to the bet size variable. That is your optimal bet size.

While this technique is quite simple conceptually, the actual algebra involved can be hairy, so I usually just make Wolfram Alpha do it. So let's get started.

To calculate the optimal betsize we will make a few assumptions that are reasonably easy to verify and that I have shown in other posts / videos.

The hero should always bet the nuts on the turn. This allows him to "compound the nuts" over multiple streets which as I showed here and in more depth in my CardRunners videos is always +EV compared to betting on a single street with a polarized range.
On the river it will always be most profitable to shove with the nuts with our polarized range. This is quite simple to prove and I showed it in my first CardRunner's video.
The hero's EV with his air on the river will be 0 unless z is so large that it is optimal for the villain to always fold the river. However, clearly if the villain were always folding in a spot where the hero might hold air on the river, he would never call a turn bet, so any time a turn bet is called, the hero's EV with air must be 0 on the river

Combining observations 1 and 2 we can parameterize our bet sizing strategy with a single number x, the number of chips that we plan to bet on the turn.

If we bet x chips on the turn, then we know we will jam the river and bet the rest of our chips. Given a starting turn pot of 100 chips if we bet x chips on the turn, the river pot will be 100 + 2x when we are called.

Our river jam will thus be a bet of (150 - x) into a pot of 100 + 2x. This means that we will be making a

b = (150-x) / (100 + 2x) percentage pot bet on the river.

Now let's call the frequency of the villain calling the turn c. Observation 3 tells us that since the hero's river EV is 0 with air, his EV for bluffing the turn is very simple to calculate.

EV[turn bluff] = (1-c) * 100 - c * x

Clearly the villain always folding or always calling the turn is highly exploitable so we know his turn calling strategy is mixed and we can apply indifference conditions to see that

c = 100 / (100 + x)

What about the villain's calling frequency on non-3c/2d rivers? This will of course just be determined by indifference conditions that depend on the pot / bluff size. Call the villain's river calling frequency rc.

EV[river bluff] = (1 - rc) * (100 + 2x) - (150 - x) * rc

Indifference conditions imply that

rc = (2x + 100) / (x + 250)
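As a quick sanity check on the two indifference conditions above, here is a small sketch that evaluates them with exact fractions. The function names are my own and the 100 chip pot and 150 chip stack are taken from the problem statement.

```python
from fractions import Fraction

POT = 100    # turn pot
STACK = 150  # effective stack at the start of the turn

def turn_call_freq(x):
    """c solving (1 - c) * POT - c * x = 0."""
    return Fraction(POT, POT + x)

def river_call_freq(x):
    """rc solving (1 - rc) * (POT + 2x) - rc * (STACK - x) = 0."""
    river_pot = POT + 2 * x
    river_bet = STACK - x
    return Fraction(river_pot, river_pot + river_bet)

def river_bet_fraction(x):
    """River jam expressed as a fraction of the river pot."""
    return Fraction(STACK - x, POT + 2 * x)

print(turn_call_freq(50), river_call_freq(50), river_bet_fraction(50))
```

With x = 50 both calling frequencies come out to 2/3 and the river jam is a half pot bet, matching the fixed bet size version of the game.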

Now we can write the optimal turn bluffing frequency as a function of the turn bet size as well by looking at the villain's EV for calling. I calculated the villain's EV of calling when the bet size was 50 chips in my previous post, but I will duplicate the calculation here, assuming that the hero is bluffing with frequency a with his air and always betting the nuts. This means that (1-z) = a / (1 + a) of his betting range is air and z = 1 / (1 + a) of his betting range is the nuts.

Since the hero's EV with air on all rivers is 0, when he bluffs and we call we win the 100 chip pot plus his turn bet size in EV. When he holds the nuts, on 3c/2d runouts our EV is 0 on the river, and on other runouts our river EV is -rc * (150 - x), on top of the x chips we lose by calling the turn in both cases.

EV[call] = (100 + x) * (1-z) + z * (2/44 * -x + 42 / 44 * (-x - rc * (150 -x)))

If we apply indifference conditions to say that the EV of a call must be 0, this gives a relationship between z and x that we can solve for z.

Wolfram Alpha is much better at algebra than me so I just computed that relationship here.
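For anyone who would rather check the algebra in code, here is a rough sketch that solves EV[call] = 0 for z directly. It only rearranges the EV[call] expression above; the helper name `nuts_share` and its default arguments are assumptions for illustration.

```python
from fractions import Fraction

def nuts_share(x, pot=100, stack=150, rivers=44, blocked=2):
    """z such that EV[call] = 0 for a turn bet of x chips."""
    x = Fraction(x)
    rc = (pot + 2 * x) / (pot + stack + x)       # villain's river calling frequency
    live = Fraction(rivers - blocked, rivers)    # rivers that are not the 3c or 2d
    # Villain's average loss when he calls the turn and the hero holds the nuts
    loss_vs_nuts = x + live * rc * (stack - x)
    # (pot + x) * (1 - z) = z * loss_vs_nuts, rearranged for z
    return (pot + x) / (pot + x + loss_vs_nuts)

z = nuts_share(50)
a = 1 / z - 1   # turn bluffing frequency with air
print(z, a)
```

At x = 50 this gives z = 33/58 and a = 25/33, which lines up with the 75.76% turn bluffing frequency from the fixed bet size solution.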

Now the EV of the game for the villain is just how often the hero checks the turn, which is just (1-a) / 2, because by indifference conditions, when the hero bets the villain is indifferent between calling and folding and thus his EV is 0.

Since z = 1 / (1 + a), a = (1/z) - 1, so (1 - a)/2 = (2 - 1/z) / 2

So the EV of the game is (2 - 1/z) / 2 for the villain and we know z as a function of x. Thus the optimal bet size for the hero is the value of x that minimizes his opponent's EV, (2 - 1/z) / 2, where z is between 1/2 and 1 (because our betting range is at least 1/2 nuts and at most 100% nuts). Since this is clearly increasing in z, we just need to minimize z.

Again I calculated this using wolfram alpha here. The result is that the optimal bet size is 52.69 chips. This intuitively makes sense, as we would expect to bet slightly larger on the turn with some of the river runouts killing our action than we would without that risk.

The EV of this game is ~87.89 so the EV gain by changing betsize in this case is tiny, about 0.02 chips.
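The whole three step technique can also be sketched numerically. This is a brute force grid search rather than the closed form that Wolfram Alpha produces, and the function names and the 0.01 chip grid are my own choices; it assumes, as in the text, that the hero's EV is the pot minus the villain's EV.

```python
def river_call_freq(x, pot=100.0, stack=150.0):
    # rc = (2x + 100) / (x + 250)
    return (pot + 2 * x) / (pot + stack + x)

def turn_bluff_freq(x, pot=100.0, stack=150.0, rivers=44, blocked=2):
    # a such that the villain is indifferent to calling a turn bet of x
    live = (rivers - blocked) / rivers
    loss_vs_nuts = x + live * river_call_freq(x, pot, stack) * (stack - x)
    return loss_vs_nuts / (pot + x)

def hero_ev(x, pot=100.0):
    a = turn_bluff_freq(x, pot)
    villain_ev = pot * (1 - a) / 2   # villain only profits when the hero checks air
    return pot - villain_ev

# Step 3: maximize the bettor's EV over candidate turn bet sizes
candidates = [i / 100 for i in range(1, 15000)]
best_x = max(candidates, key=hero_ev)
print(best_x, hero_ev(best_x), hero_ev(50.0))
```

The search lands on a turn bet of roughly 52.69 chips, with an EV only a sliver above the 50 chip bet.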

Thursday, October 16, 2014

GTO Brainteaser #8 Solution -- Multistreet Theory vs Practice

In this post I'm going to discuss the solution to GTO Brainteaser 8, check it out here if you missed it. I'm also going to provide some introduction to multistreet theory and some simple examples of understanding the impact of runouts and the flow of information across streets. A browseable GTORB version of the turn solution is also presented near the bottom for those of you interested in a sneak peek at the GTORB turn solution interface (it still needs some polish).

Solution

The brainteaser involved studying the following game:

You are on the turn and the board is AsAhKsKh

The hero has a hand range of AcAd and 3c2d

The villain has a hand range of KcKd

The pot is 100 chips and stacks are 150 chips

The hero can either bet 50 chips on the turn and 100 on the river or he can shove for 150 on the turn.

The goal was to determine how and why this game was different from the nuts vs air multi-street game that we looked at in GTO Brainteaser #6 and from the multi-street polarized vs merged range theory in The Mathematics of Poker.

The basic answer to this question is quite simple: the real world game with an actual deck is worse for the hero than the model game from GTO Brainteaser 6, because when a 3c or 2d hits on the river it reveals to the villain that the hero must hold the nuts. The villain is able to convert this information into money by folding 100% on either of these rivers. Perfectly polarized ranges are always the strongest possible ranges, so for the hero, having his range depolarized on some river runouts decreases his EV.

Note that the villain also gains information when an Ac or Ad comes on the river, but this information is not valuable because our EV when we hold 3c2d is 0 anyways. A GTO opponent calls enough to make us indifferent between bluffing and folding, so if we are forced to always fold that doesn't actually decrease our EV. In this case the villain still gains information but has no way to convert it into money.

We can calculate the exact EV decrease quite simply. 1/2 of the time we hold AcAd and if we recall the solution to GTO Brainteaser #6, our EV with AcAd in this spot without river runouts giving away information is 16/9th of the pot. Now when we hold AcAd, since we know our opponent holds KcKd, there are 44 river cards that might come and 2 of them reduce our EV to 1.5 pots in the case where our opponent calls our bet.

Clearly, the villain must still make us indifferent between betting and checking our air on the turn, which means that he must call 2/3rds of the time to make the EV of betting 0. Thus the hero's EV with an Ace in the new game can be calculated by adding up:

1/3 * 100 -- we bet and they fold
2/3 * (42/44 * R) -- hero bets, villain calls and a non-3c/2d river comes where R is the hero EV on that river
2/3 * (2/44 * 150) -- hero bets, villain calls, and a 3c or 2d river comes and villain just folds

On the unblocked rivers, as in Brainteaser #6, the villain must call our bet of 100 chips 2/3 of the time so our ev with an A on the river is R = 150 * 1/3 + 250 * 2/3 = 650 /3.

Thus our EV with an A is 175.76 chips or 1.7576 pots. Our EV loss with Aces is 16/9 * 100 - 175.76 = 2.02 chips.

Since we hold the nuts half of the time, our overall EV loss is 1.01 chips. So the EV of the actual game for the hero is about 87.87. As you can see in the solution browser below, GTORB computes the EV as 87.83 which is within the given margin of error of 0.05 chips.
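The arithmetic above is easy to reproduce with exact fractions. This is just a sketch of the three term sum, with variable names of my own choosing.

```python
from fractions import Fraction

POT, TURN_BET, RIVER_BET = 100, 50, 100
CALL = Fraction(2, 3)       # villain's calling frequency on each street
BLOCKED = Fraction(2, 44)   # chance the river is the 3c or 2d when we hold AcAd

win_turn_fold = POT                           # villain folds the turn
win_river_fold = POT + TURN_BET               # villain calls turn, folds river
win_river_call = POT + TURN_BET + RIVER_BET   # villain calls both streets

# EV on a river that does not reveal our hand
R = (1 - CALL) * win_river_fold + CALL * win_river_call

# Real deck: on 3c/2d rivers the villain always folds
ev_nuts = (1 - CALL) * win_turn_fold + CALL * ((1 - BLOCKED) * R + BLOCKED * win_river_fold)

# Model game from Brainteaser #6: no river card gives anything away
ev_nuts_model = (1 - CALL) * win_turn_fold + CALL * R

loss_with_nuts = ev_nuts_model - ev_nuts
print(float(ev_nuts), float(loss_with_nuts), float(loss_with_nuts / 2))
```

The exact values are 5800/33 chips with the nuts and a loss of 200/99 chips relative to the model game, or about 1.01 chips once we account for holding the nuts only half the time.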

How does this EV loss affect optimal play? Intuitively this is actually pretty simple. Of course we still always should bet the nuts, and we should bet enough of our air that our opponent is indifferent between calling and folding to our turn bet. In this game, where our opponent's EV when we hold the nuts is higher, we need a higher nuts to air ratio to maintain his indifference which means that we must bluff the turn less frequently.

Mathematically figuring out the optimal bluffing frequency is a bit complex as we need to make sure to properly weight the probability of various river cards coming, using all of the villain's information about his opponent's range and his own hand.

The villain's EV for calling the turn and then playing GTO on the river is 150 when his opponent holds air (on average he wins the entire river pot of 200, but 50 of those chips were his own). When his opponent holds the nuts, on 3c or 2d runouts his EV is -50 because he called 50 on the turn and always folds the river. On all other runouts he calls the 50 on the turn plus an additional 100 on the river 2/3rds of the time for a total EV of -350/3 (-116.67).

If the hero is betting a fraction x of his air and all of his nuts, then when he bets he holds air x/(1+x) of the time and the nuts 1/(1+x) of the time, and given that he holds the nuts, 3c or 2d come 2/44ths of the time.

And the villain's EV for calling a turn bet of 50 is

150 * x / (1 + x) - 1/(1+x) * (50 * 2/44 + 350/3 * (1-2/44)).

Setting that equal to 0 and solving for x gives that the hero should bet 25/33 or 75.76% of his air on the turn and 43.1% of his turn betting range should be air which matches the GTORB solution precisely. The villain still calls 2/3rds of both the turn and the river bet as that is all that is required to make the hero indifferent between bluffing his air and checking it. This is our exact mathematical equilibrium solution, you can browse the approximate GTORB solution below.
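Solving that indifference condition takes only a few lines with exact fractions. The sketch below uses the villain's per-runout EVs described above (+150 against air, -50 on 3c/2d rivers against the nuts, -350/3 on all other rivers against the nuts); the variable names are mine.

```python
from fractions import Fraction

ev_vs_air = Fraction(150)
blocked = Fraction(2, 44)   # 3c or 2d on the river, given the hero holds AcAd

# Villain's EV for calling the turn when the hero holds the nuts
ev_vs_nuts = blocked * (-50) + (1 - blocked) * Fraction(-350, 3)

# EV[call] = ev_vs_air * x / (1 + x) + ev_vs_nuts / (1 + x) = 0
x = -ev_vs_nuts / ev_vs_air
air_share = x / (1 + x)

print(x, float(x), float(air_share))
```

This prints x = 25/33 and an air share of 25/58, roughly 43.1% of the turn betting range.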

I also solved the optimal bet sizing bonus question in a separate post here for those who are interested.

Takeaways

This example may seem trivial, but as it turns out, the existence of river runouts shifts the range distributions of players and transfers information. Being in a position to put as much money into the pot as possible when you have an informational edge over your opponent or when your equity distribution is polarized, and as little as possible when it is merged, is very powerful.

I'll demo a much more powerful example of this in the next brainteaser where we will see an example of how equity transitions and river information can make protecting your hand via turn bets that never fold out better hands and are never called by worse hands, still be GTO, even if you were required to pay your opponent his turn hand equity when he folds worse.

Note on epsilon equilibrium: One quick note on the GTORB solution which is a 0.05 chip (5/10,000ths of the pot) epsilon equilibrium. Due to the approximation techniques used, the GTORB strategy actually has the hero checking the nuts on the turn a tiny fraction of a % of the time. This has almost no impact on the game EV or solution accuracy but it does mean that if you examine a river after both players check you will see the hero bet with a very low frequency. This is because the approximate solution has him holding the nuts with a tiny probability. These rounding errors are why the solution has a Nash distance of 0.05 chips, which in this case means an opponent who played perfectly could exploit the approximate GTO strategy for 0.05 chips out of the 100 chip pot.

Thursday, October 9, 2014

GTO Brainteaser #8 -- Solving the Turn, Theory vs Practice

The internal alpha version of GTORB is now capable of solving turn scenarios for GTO turn and river play so this brainteaser is going to focus on multi-street theory. In the solution (probably a week from today) I'll post the first fully browse-able GTORB turn solution to the model game below for those of you who are excited to play around with a GTO turn strategy. Note that the version of GTORB that can solve the turn won't be released commercially for a month or two as there are some performance / scalability issues that I need to solve before it is ready for mass use. It will likely cost extra.

The problem

In GTO Brainteaser #6 I looked at a model scenario where the hero had a range of 50% nuts, 50% air while the villain had a range of 100% medium strength hands. There was a 100 chip pot, 150 chip stacks and two streets of betting. The hero could either bet 50 chips on the turn and then have the option to bet 100 chips on the river or he could shove the turn for 150 chips, and the question was which option is higher EV and what are GTO strategies for both players in this game.

The key simplification that made this scenario quite different from real world poker is that it was assumed that no river card was actually dealt, there were just two rounds of betting.

For those who are curious you can check out the full solution to brainteaser #6 here. It turns out that it is optimal for the hero to bet 50 chips on the turn with all of his nut hands and 7/9ths of his air and then to bet 100 chips on the river when he is called with all of his nut hands and 3/7ths of his air hands. The villain calls each of these bets 2/3rds of the time and folds 1/3rd. The hero wins 8/9ths of the 100 chip pot in EV in this game. Furthermore, it turns out that betting 50 chips on the turn and 100 chips on the river is the exact optimal bet sizing for the hero to maximize his EV, all other bet sizes are lower EV.
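The Brainteaser #6 numbers quoted above can be checked against the indifference conditions they are supposed to satisfy. The following is only a verification sketch under the model game's assumptions (no river card dealt); all variable names are illustrative.

```python
from fractions import Fraction

POT, TURN_BET, RIVER_BET = 100, 50, 100
turn_bluff = Fraction(7, 9)    # share of air that bets the turn
river_bluff = Fraction(3, 7)   # share of that air that also bets the river
call = Fraction(2, 3)          # villain's calling frequency on each street

# Hero's bluffs must break even against the villain's calling frequency
ev_river_bluff = (1 - call) * (POT + 2 * TURN_BET) - call * RIVER_BET
ev_turn_bluff = (1 - call) * POT - call * TURN_BET

# Villain's river call must break even against the hero's river betting range
river_air = turn_bluff * river_bluff
air_share_river = river_air / (1 + river_air)
ev_river_call = air_share_river * (POT + 2 * TURN_BET + RIVER_BET) - (1 - air_share_river) * RIVER_BET

# Hero's EV: half the time he holds the nuts, and air is worth 0
ev_nuts = (1 - call) * POT + call * (
    (1 - call) * (POT + TURN_BET) + call * (POT + TURN_BET + RIVER_BET)
)
hero_ev = ev_nuts / 2

print(ev_river_bluff, ev_turn_bluff, ev_river_call, hero_ev)
```

All three indifference checks come out to exactly 0, and the hero's EV is 800/9 chips, i.e. 8/9ths of the pot.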

Let's now look at a very similar game. Imagine the following (completely made up) scenario.

You are on the turn and the board is AsAhKsKh
The hero has a hand range of AcAd and 3c2d
The villain has a hand range of KcKd
The pot is 100 chips and stacks are 150 chips
The hero can either bet 50 chips on the turn and 100 on the river or he can shove for 150 on the turn.

Clearly no matter what river card comes, the relative strengths of the hands in both players ranges will not change so in that respect this game seems identical to the model game from brainteaser #6. AcAd will beat KcKd on every possible river and 3c2d will lose to KcKd on every possible river.

However, it turns out that GTO play in this game is different from GTO play in GTO Brainteaser #6. Why?

What is the EV of the game for the hero, is it higher or lower?
What are the optimal strategies in this game and what is the hero's EV when both players play optimally?

Bonus: Is betting half pot on the turn and the river still optimal or is there a higher EV bet size?

Tuesday, September 2, 2014

GTO Poker and Multiple Equilibria Part 3

In part 1 and part 2 of this post I described some of the properties of multiple equilibria in zero sum games and explained how different equilibrium solutions can perform differently against various types of sub-optimal players. The basic idea is that by playing exploitatively against lines that optimal players don't actually use we can remain completely unexploitable while still targeting and attacking leaks in our opponent's play. This is accomplished by just shifting which GTO strategy we are playing at any given time, based on our opponent's tendencies.

Today I'm going to conclude that discussion by going through an example of a simplified poker scenario with two equilibria, each of which performs quite differently against various types of fish. I described this example in detail at the end of part 2 so I'll just very briefly reiterate the situation here.

The board is: 2sTs9c5h3s
The IP range is: 22, 87, T9, QJ, Ks8s+, As2s-AsJs, 7s6s, 9s8s, Qs9s
The OOP range is: QQ+
There are 100 chips in the pot and 150 left to bet

As we saw last time, it is never GTO for the OOP player to lead for 50% pot here with any of his range. However, we're going to consider the performance of two GTO strategies against two types of fish, both of whom are going to randomly lead for half pot with their entire range 10% of the time. Fish 1 is thinking that "when he shoves here he's never bluffing" and is feeling you out with his bet and plans to fold to a shove 100%. Fish 2 is thinking "OMG I haz overpair" and is planning to call a shove 100% of the time.

Our goal was to find two strategies that are GTO (this requires that there is no profitable deviation that would allow a GTO opponent to increase his EV by leading for 50% pot with some hand in his range) but that also extract as much extra value as possible from each type of fish who decides to lead for 50% pot.

This means that we need to ensure that however we react to a 50% pot lead, the EV for our opponent against that reaction is lower than the EV of him playing the GTO strategy for every hand in his range. In this situation it was optimal to always check. I've included the checking EVs below.

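Hand % of Range Check EV
(missing) 5.56 14.02
(missing) 5.56 14.02
(missing) 5.56 16.69
(missing) 5.56 14.02
(missing) 5.56 16.69
(missing) 5.56 16.69
(missing) 5.56 25.87
(missing) 5.56 25.87
(missing) 5.56 30.57
(missing) 5.56 25.87
(missing) 5.56 30.57
(missing) 5.56 30.57
(missing) 5.56 25.87
(missing) 5.56 25.87
(missing) 5.56 41.15
(missing) 5.56 25.87
(missing) 5.56 41.15
(missing) 5.56 41.15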

Now if we were just being maximally exploitative against these fish, we would always shove against the fish who leads with the intention of folding, and we would only shove our 2-pair+ against the fish who leads with the intent of calling (otherwise we'd fold). However, doing so might open up the opportunity for an exploitative opponent to attack us.

For example, if we were to always jam over a lead, an exploitative opponent could bet call QQ and win 45.9% of the time for a massively profitable deviation. The EV of a bet call would be .459 * 250 - .541 * 150 = 33.6 chips vs the 15.36 EV of checking QQ when both players play GTO. In general, we won't be able to take maximally exploitative lines, instead we'll need to find strategies that are moderately exploitative while maintaining enough balance to stay GTO.

Doing this at least approximately is actually relatively straightforward. With these ranges, the hand our opponent would most like to deviate with is QQ without the Q of spades, as it is his lowest EV hand for checking and all his hands have similar equity vs our range.

We want to shove as wide a range as possible while keeping our opponent's EV for bet calling QQ below 14.02. All this is quite simple to do in CREV, but first as a reminder here is the GTORB equilibrium solution which we will use as a starting point.

Some quick tinkering in CREV will show that you can take the GTORB equilibrium strategy, which folds about 34.3% of the time to a lead, and make it more aggressive by shoving all 87s and only folding 32% of our 87o. This shifted strategy is still GTO (which can be verified by using CREV's max-exploit button against the shifted strategy and verifying that the BB EV is still 25.87), but it will jam and pick up 150 chips instead of folding an additional 6.1% of the time, for an EV gain of 10 chips against the fish who bet folds. That's an additional 10% of the pot in the cases where our opponent donks! We'll call this strat GTOShove.

Similarly, against the fish who is going to bet call, we can shift the GTORB strategy in the opposite direction by folding more of our range while staying GTO. It turns out we can fold all our hands that our opponent beats except for 4 combos of 87o while staying GTO, because our range is polarized while our opponent's range is condensed, so betting is just a weak play. We'll call this strat GTOFold. Note that in this case, GTOFold is very close to maximally exploitative in terms of how it responds to a river lead (the 4 bluff raise combos are the only difference in strategy)! You can see GTOFold vs the B/C fish here: http://gtorangebuilder.com/#share_scenarioHash=baf4b0e262af2f74264162cd34a82b5a/root_v=38.1. Note that I am rounding to a whole number of bluff combos, rather than considering strategies with e.g. 3.7 bluff combos; there may be a very slightly better GTOFold with fractional bluff combos.

I've put together a chart of the overall strategy vs strategy EVs. The % exploit is the percentage of the maximal exploitative leak that our strategy extracts from our opponent. Specifically it is:

(EV[our strat vs opponent strat] - EV[gto vs gto]) / (EV[max exploit vs opp strat] - EV[gto vs gto])

Of course all of our strategies are unexploitable so I didn't include our exploitability in the chart. When calculating the maximally exploitative strategies, I only considered exploiting our opponents in response to their river lead, I did not consider altering our strategy at all when responding to a check.

Our Strategy Opponent Strategy Our EV % Exploit
GTORB GTORB 74.1 N/A
GTOFold GTORB 74.1 N/A
GTOShove GTORB 74.1 N/A

GTORB B/C Fish 75.4 27.3%
GTOFold B/C Fish 77.7 80%

GTORB B/F Fish 77.4 43.1%
GTOShove B/F Fish 78.0 51.3%
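The % exploit column is just the formula above applied row by row. Here is a tiny sketch of it; the function name is mine and the example inputs are hypothetical round numbers, not values from the chart (the chart's EVs are rounded, so recomputing its percentages from them will not match exactly).

```python
def percent_exploit(ev_ours, ev_gto, ev_max_exploit):
    """Fraction of the maximum possible exploit that our strategy captures."""
    gap = ev_max_exploit - ev_gto
    if gap == 0:
        raise ValueError("opponent has no leak: max exploit EV equals GTO EV")
    return (ev_ours - ev_gto) / gap

# Hypothetical example: GTO vs GTO earns 74, the best possible exploit earns 78,
# and our still-unexploitable strategy earns 76 against this opponent.
print(percent_exploit(76.0, 74.0, 78.0))
```

A value of 0 means we gain nothing over plain GTO against this opponent, and 1 means we capture everything a maximally exploitative strategy would.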

This is one of the simplest examples of how GTO theory can be combined with exploitative play to create strategies that are impossible for our opponents to counter, but that still allow us to adapt our strategy to specifically target weaknesses that we identify in our opponents. People often consider GTO play as a purely passive style where you never adjust to how your opponents play in any way, but in reality there are a variety of ways to adapt and attack your opponents while remaining completely unexploitable (as shown here), or while using GTO concepts to find the absolute least exploitable strategy that can achieve a specific win-rate against a specific opponent, or while making it so that we are only exploitable in ways that we don't think our opponent will capitalize on. I'll be discussing all of these in more depth over the coming months.