Today I'm going to present some data I gathered comparing the EV of different flatting frequencies from the big blind against a small blind preflop raise (after the rest of the table has folded) in 6-max situations. This analysis uses some of the ranges from my Blind vs Blind Strategy pack, and I will be adding a few additional solutions to the library using the alternate ranges from this analysis. In general the postflop strategy changes due to the alternate range are minimal on most boards. For example, as you can see in this solution on QT2, we shift from a 45% c-bet to 44.8% and from a 52% call c-bet to 51.8% (on the A75 board there is a 2% c-bet shift, and the largest shift was 6% on TT6), so the main point of interest is the preflop EV comparison, not the postflop strategy shifts.
While the exact data and ranges are specific to 6-max, the technique of using postflop EV calcs to compare the quality of two different preflop ranges can be used in any game type, so I think learning and understanding the methodology will be valuable to players of all game types.
For all of these calculations I used the Simple Postflop desktop version, which GTORB readers can get for $70 off using this link: http://simplepostflop.com/en/?source=gtorangebuilder. At the moment the workflow for doing this analysis in Simple Postflop requires a lot of tedious manual work, but I am working with the creators of the program to get them to add tools to automate most of the process so that these types of comparisons can be done easily and quickly by anyone.
Because of the manual work required with the current interface I only used a small sample size in my analysis, which means that the results are not conclusively statistically significant. Once they have improved the interface enough to make the process easy, I will make a free video on my YouTube channel showing how to quickly set up these types of calculations, and I will increase the sample size to improve the statistical significance, so stay tuned.
The Problem
The problem I am interested in examining here is estimating at what frequency we can successfully defend against a static small blind opening range, if both players achieve the GTO EVs postflop and/or if we are able to gain some additional exploitative EV postflop. In particular, I am going to consider a fixed 3-betting range of about 12% and look at expanding our flatting range by comparing the EVs of two flatting ranges. The first range is a range from my BvB strategy pack that a student and I estimated, which we believe is reasonably representative of "standard" play in mid-stakes cash games on Stars. The second range is the same as the first but with about 6% more hands, so in this case Range 1 is a subset of Range 2.
Range 1 (R1 - 52.22%): [25]64o, 76s, 87s, 96o, 98s, JTs, K4o, KTs, Q6o, QTs, T9s[/25], [50]43s, 54, 65s, 75o, 88, A9s, AJo, K5o, KQo[/50], [75]62s, 64s, 72-73s, 75s, 77, 82-83s, 86s, 92-94s, 97s, A2-A4, A5s, ATo, J9s, K9s, KJo, Q9s, T2-T4s, T8s[/75], 22-66, 32s, 42s, 52-53s, 63s, 65o, 74s, 76o, 84-85s, 86o+, 95-96s, 97o+, A5o, A6-A8, A9o, J2-J6s, J7-J8, J9o+, K2-K5s, K6-K8, K9-KTo, Q2-Q6s, Q7-Q8, Q9o+, T5-T6s, T7, T8o+
Range 2 (R2 - 58.26%): [25]43o, 64o, 76s, 85o, 87s, 98s, J6o, JTs, K2-K3o, KTs, Q3-Q4o, QTs, T6o, T9s[/25], [50]43s, 54, 65s, 75o, 88, A9s, AJo, KQo[/50], [75]62s, 64s, 72-73s, 75s, 77, 82-83s, 86s, 92-93s, 97s, A2-A5s, ATo, J9s, K9s, KJo, Q9s, T2-T3s, T8s[/75], 22-66, 32s, 42s, 52-53s, 63s, 65o, 74s, 76o, 84-85s, 86o+, 94-95s, 96, 97o+, A2-A5o, A6-A8, A9o, J2-J6s, J7-J8, J9o+, K2-K3s, K4-K8, K9-KTo, Q2-Q4s, Q5-Q8, Q9o+, T4-T6s, T7, T8o+
How can we mathematically compare which range is higher EV against a fixed "standard" opponent opening range? In this case I am going to be assuming an SB opening range of 52% and a 3x open.
Note that in both cases we are 3-betting 11.4%, so with R1 we are folding (1 - .5222 - .114) = 36.38% vs (1 - .5826 - .114) = 30.34% with R2.
Note that these ranges give a total defense rate of about 65% vs 70%.
EV Formula
The EV formula for the comparison that we want to make is quite simple. Since we are 3-betting the same range regardless of our flatting range in this example, all we need to consider is the change in our EV that comes from folding less often with R2 and the change that comes from a reduction in postflop EV due to R2 containing generally weaker hands. To normalize the EV change into something that we can use for overall bb/100 calculations we need to multiply it by (1 - .114) = .886, since in the remaining 11.4% of cases we will be 3-betting.
Mathematically we want to look at
.886 * ([how often we call] * [average payoff of calling] + [how often we fold] * [payoff of folding])
for each range and compare the values. If we call P[R] the postflop EV of R (the number of chips from the pot that we win on average after flatting with range R) then this can be written as
.886 * (.5222 * (P[R1] - 3) + .3638 * -1) vs .886 * (.5826 * (P[R2] - 3) + .3034 * -1)
So far this is pretty simple, assuming that we actually know P[R1] and P[R2], so we just need to figure those out. This is where the GTO postflop calculations come in.
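To make this concrete, here is a minimal sketch in Python (the helper name and parameter names are mine, not part of the actual workflow) of the comparison we are setting up; P[R1] and P[R2] are the unknowns we still need to measure:

```python
def flat_fold_ev(call_freq, postflop_ev, threebet_freq=0.114, open_size=3):
    """Per-hand EV (in bb) of the call-or-fold portion of the BB strategy:
    call_freq * (P[R] - open_size) + fold_freq * (-1),
    where postflop_ev is P[R], the average share of the pot won after flatting."""
    fold_freq = 1 - call_freq - threebet_freq
    return call_freq * (postflop_ev - open_size) + fold_freq * (-1)

# The comparison from the formula above, once P[R1] and P[R2] are known:
#   0.886 * flat_fold_ev(0.5222, P_R1)   vs   0.886 * flat_fold_ev(0.5826, P_R2)
```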
Computing Average Postflop EVs
In general there are two approaches to computing the average postflop EV of a range vs a fixed opponent range.
Method 1 is to compute the postflop EV of the range on each of the 1755 possible strategically different flops and compute the weighted average of those 1755 EVs based on the frequency of each type of flop conditional on the blocker effects that the ranges impose on the board.
Method 2 is to randomly select a reasonable sample size of flops by simulating randomly dealing cards and to then do a statistical estimation of the true postflop EVs using standard statistical techniques.
In general Method 1 is superior for things like 3-bet pots where it is relatively feasible to compute 1755 postflop solutions, while Method 2 is more practical for situations with wider ranges and higher SPRs where computing all 1755 possible outcomes might take weeks.
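If you are curious where the 1755 number comes from, it is simply the count of flops that remain distinct after suit isomorphism. A brute-force sketch like the following (just a sanity check, not part of the actual workflow) reproduces it:

```python
from itertools import combinations, permutations

RANKS = "23456789TJQKA"
SUITS = "cdhs"
DECK = [r + s for r in RANKS for s in SUITS]

def canonical(flop):
    """Map a flop to a canonical representative under suit relabeling by trying
    all 24 suit permutations and keeping the lexicographically smallest form."""
    best = None
    for perm in permutations(SUITS):
        remap = dict(zip(SUITS, perm))
        mapped = tuple(sorted(card[0] + remap[card[1]] for card in flop))
        if best is None or mapped < best:
            best = mapped
    return best

# All C(52, 3) = 22,100 flops collapse to 1755 strategically different ones.
print(len({canonical(flop) for flop in combinations(DECK, 3)}))
```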
Because in this case we are looking at a single raised BvB scenario with wide ranges I chose Method 2 and because the ranges are so wide I did not incorporate the blocker effects of the ranges on the random sampling of the board. With very wide ranges these effects are minimal but with something like a 4-bet pot range they are crucial (because for example an A is much less likely to be dealt than a 2 when both players have extremely strong ranges).
I chose to use a sample size of 100 per range, which means that I had Simple Postflop run a total of 200 flop solutions to "low" accuracy, which took about a day (running 1755 x 2 = 3510 would have taken a few weeks). The low accuracy setting usually gets to within 0.5% to 0.75% of the pot in exploitability. Since we are averaging the EVs of the 100 solutions together, "low" accuracy is appropriate as the Nash distance errors are likely to average out over a large sample. Note that I used the same 100 random boards for both ranges.
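Here is a quick sketch of the sampling step (the seed and card notation are just illustrative assumptions); fixing the seed is what guarantees both ranges get evaluated on the identical 100 boards:

```python
import random

RANKS = "23456789TJQKA"
SUITS = "cdhs"
DECK = [r + s for r in RANKS for s in SUITS]

rng = random.Random(2015)  # any fixed seed; reusing it gives the same boards for both ranges

def random_flop():
    """Deal one uniformly random flop, ignoring the (small) blocker effects of the ranges."""
    return "".join(sorted(rng.sample(DECK, 3)))

boards = [random_flop() for _ in range(100)]
print(boards[:5])
```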
The only major challenge that comes with Method 2 is that we need to perform an accurate statistical analysis, something that, particularly in poker, is often done incorrectly and should always be done with a great deal of care.
The next section will be purely about statistics and will likely be a bit dry so if you aren't interested in the math feel free to just scroll down to the results below.
Statistical Analysis of Flop EVs
Once we have all of our EV data we have to come up with a measure of the statistical noise that will be expected due to our random sampling of flops. Suppose that with R1 we get an average EV of X and with R2 we get an average EV of Y. How much bigger must Y be than X before we can say with a high probability that R2 is actually a stronger range (rather than it being the case that the 100 randomly chosen boards were just better flops for R2 than for R1)?
To start with, let's talk about the central limit theorem. There are actually many central limit theorems, but in general they all state that if you have a reasonably "well behaved" random sample that is "large enough", then the average of that sample will be normally distributed, centered around the true expected value.
The key thing to determine is what "well behaved" means and what "large enough" means. In this case the conditions for being "well behaved" are easily satisfied as there are a fixed number of discrete flops that can be dealt. What a "large enough" sample means is more complex.
Let's start by considering a case where an incredibly large sample is required before the average is approximately normally distributed.
One of the simplest distributions is the binomial distribution, which can represent any probability distribution with only two outcomes. One of the most basic limit theorems is that the binomial distribution can be well approximated by the normal distribution for a large enough sample. However, how large a sample is required depends on the probability of each of the two outcomes and their relative payoffs.
For example, imagine two games. In game one you flip a coin: if it's heads you win a dollar, if it's tails you lose a dollar, and you play 100 rounds. The EV of the game is $0 per round.
In game two there is a 1 in a million chance of winning 10 million dollars, the rest of the time you get nothing, and you play 100 rounds. The EV of this game is $10 per round.
In game one the odds that over 100 rounds your average per-round payoff would be more than, say, $0.30 away from 0 are very low, and the expected error is symmetric (you are as likely to overestimate the EV of the game by X as to underestimate it by X).
In game two, most of the time your sample will contain no winning draws and you will estimate the average EV of the game as 0. In the case that you happen to get a winning draw you will estimate the average EV of the game as at least $100,000. The odds that your sample average is within $0.30 of the true mean are actually 0! It's impossible to get an accurate sample out of 100 trials. If your sample is large enough the result will still be normal, but you would need to average over millions of trials.
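Here is a small simulation sketch of the two games that illustrates the point; the seed and the number of repetitions are arbitrary choices of mine:

```python
import random
import statistics

rng = random.Random(7)

def game_one_avg(rounds=100):
    """Average payoff of `rounds` coin flips: +$1 on heads, -$1 on tails (true EV $0/round)."""
    return sum(rng.choice((1, -1)) for _ in range(rounds)) / rounds

def game_two_avg(rounds=100):
    """Average payoff of `rounds` draws of a 1-in-a-million $10M lottery (true EV $10/round)."""
    return sum(10_000_000 if rng.random() < 1e-6 else 0 for _ in range(rounds)) / rounds

game_one_samples = [game_one_avg() for _ in range(10_000)]
game_two_samples = [game_two_avg() for _ in range(10_000)]

# Game one's 100-round averages sit tightly and symmetrically around 0;
# game two's are almost always exactly 0 even though the true EV is 10,
# so 100 rounds is nowhere near "large enough" for a normal approximation.
print(round(statistics.mean(game_one_samples), 3), round(statistics.stdev(game_one_samples), 3))
print(statistics.mean(game_two_samples), sum(x == 0 for x in game_two_samples) / len(game_two_samples))
```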
As it turns out there are two key factors here.
- High variation in payoffs increases the required sample size.
- Huge asymmetry in the probability of the various outcomes increases the required sample size to apply a normal approximation.
Let's now consider how our flop EV calculations perform according to the two metrics above. First, note that no matter what two ranges we are estimating, the highest postflop EV possible in a 3x raise case is to win the entire 6bb pot, and the lowest EV possible is to lose the entire pot and get 0. Thus something like winning thousands of hands' worth of EV in a single low-probability case is not possible. Furthermore, in this case R1 is a subset of R2, so a player could always choose to fold all the hands in R2 that are not in R1 and play according to the R1 GTO strategy with the rest of his range.
This would get him the R1 EV with .5222/.5826 of his range and 0 with the rest, so the maximum amount by which the R2 EV can fall short of the R1 EV on any board is about 10% of the R1 EV.
Thus on factor one we are in extremely good shape: the magnitude of the variation in payoffs across different boards, and particularly the magnitude of the difference between the R1 and R2 GTO EVs, is bounded and quite small.
For factor two, due to suit symmetries some strategically relevant flops are more likely than others (e.g. AAA is less likely than K86 two-tone); however, the magnitude of this difference is not huge, and we can sidestep it entirely by dealing random flops rather than bucketing by suit symmetry.
Thus for both of these factors we have good reason to believe that a normal approximation should be quite accurate over a reasonable sample. I chose to use a sample size of 100, which is a bit on the small side, simply due to the magnitude of the manual work required.
The result of all this is that we can apply a two-population t-test to our EV results, which I will go through below.
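For anyone who wants to run the test themselves in code rather than with an online calculator, a sketch with SciPy looks like this (the CSV file names are hypothetical; the per-flop EVs are the ones in the spreadsheet linked in the next section):

```python
import numpy as np
from scipy import stats

# One postflop EV per solved flop, using the same 100 boards for both ranges.
evs_r1 = np.loadtxt("r1_flop_evs.csv")  # hypothetical export of the R1 EV column
evs_r2 = np.loadtxt("r2_flop_evs.csv")  # hypothetical export of the R2 EV column

t_stat, p_value = stats.ttest_ind(evs_r2, evs_r1, equal_var=False)  # two-sample (Welch) t-test
print(f"t = {t_stat:.3f}, p = {p_value:.3f}")
```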
Data and Results
I've put the raw data that this analysis is based on up on Google Drive for anyone to view here. All the scenarios I ran were based on a 3x SB open and a BB call with 100bb stacks and an 80% pot bet on every street. The EVs for both players and the Nash distance for every scenario are recorded in the Google doc.
The data shows that the average postflop EV for the 52.22% range is 2.765 while for the 58.26% range it is 2.714.
This means that the overall EVs, combining calling and folding, are
For R1: .5222 * (2.765 - 3) + .3638 * -1 = -0.487
For R2: .5826 * (2.714 - 3) + .3034 * -1 = -0.470
This suggests that there is a 1.7 * .886 = 1.5 bb/100 EV gain from calling with the wider range. However, this ignores a key element in preflop analysis: the rake. For postflop solutions, particularly at mid or high stakes, the rake is much less relevant; for example, in a 5/10 game with a 3x SB open vs a BB call the rake is already capped, and in almost any case you have already paid a significant chunk of the rake and stand to win far more than the potential additional rake you might pay.
However, preflop the rake is more of a concern, as we are going from a strategy that paid 0 rake with 6% of hands to a strategy that is now paying significant rake with those hands. Obviously this will cut into the strategy's EV gain. The exact magnitude and cap of the rake will depend on what site/stakes you play at and how many people are dealt into the hand. For this analysis I will use the Stars 4.5% rake and assume that the entire average postflop EV is raked. Note that this is really a worst case, as some of our postflop EV comes from winning large pots where the rake would have capped out. The true impact of the rake is difficult to estimate and varies greatly across stakes and the number of players dealt into the hand, so this should be considered a worst case; for most players the true outcome would be somewhere in between this case and the rake-free case.
The rake changes our EV equations and resulting EVs as follows:
For R1: .5222 * (2.765 * 0.955 - 3) + .3638 * -1 = -0.551
For R2: .5826 * (2.714 * 0.955 - 3) + .3034 * -1 = -0.541
The worst-case rake cuts our gain almost in half, down to .9 bb/100, and as we can see it significantly deters otherwise potentially +EV high-frequency preflop calling strategies (e.g. something like an 80-90% defense would likely get hammered by the rake).
Finally I also considered assuming that we have a non-trivial postflop edge above the GTO EVs that would make calling additionally profitable. As I discussed in my BvB strategy pack, many people c-bet poorly in these situations and the OOP player can pretty easily achieve a higher EV than GTO against such an opponent. Assuming a moderate edge of 5bb/100 we can adjust our equations like so.
Without rake we get a 1.7 bb/100 difference:
For R1: .5222 * (2.765 + 0.05 - 3) + .3638 * -1 = -0.460
For R2: .5826 * (2.714 + 0.05 - 3) + .3034 * -1 = -0.441
With rake we get a 1.2 bb/100 difference:
For R1: .5222 * ((2.765 + 0.05) * .955 - 3) + .3638 * -1 = -0.527
For R2: .5826 * ((2.714 + 0.05) * .955 - 3) + .3034 * -1 = -0.513
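For reference, here is a short sketch that reproduces all four scenarios above (rake on/off, edge on/off) from the two measured average postflop EVs and converts the differences to bb/100 exactly as in the equations:

```python
THREEBET = 0.114                     # shared 3-bet frequency
CALL_R1, CALL_R2 = 0.5222, 0.5826    # flatting frequencies
P_R1, P_R2 = 2.765, 2.714            # measured average postflop EVs
RAKE_KEEP = 0.955                    # Stars 4.5% rake on the full postflop EV (worst case)
EDGE = 0.05                          # assumed 5bb/100 exploitative postflop edge

def ev(call_freq, postflop_ev, rake_keep=1.0, edge=0.0):
    """Per-hand EV of the flat-or-fold branch: call * ((P + edge) * rake - 3) + fold * -1."""
    fold_freq = 1 - call_freq - THREEBET
    return call_freq * ((postflop_ev + edge) * rake_keep - 3) + fold_freq * (-1)

scenarios = [("no rake, no edge", {}),
             ("rake, no edge", {"rake_keep": RAKE_KEEP}),
             ("no rake, edge", {"edge": EDGE}),
             ("rake, edge", {"rake_keep": RAKE_KEEP, "edge": EDGE})]

for label, kwargs in scenarios:
    gain = (ev(CALL_R2, P_R2, **kwargs) - ev(CALL_R1, P_R1, **kwargs)) * 100 * (1 - THREEBET)
    print(f"{label}: {gain:.1f} bb/100 in favor of the wider range")
```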
One final comment on these numbers: in all the above cases the standard deviation was about 15% higher with the wider defense range, so the potential EV gains do come at the cost of higher variance. Of course this is to be expected in any strategy where you switch from folding a set of hands (zero variance) to playing them out postflop and potentially winning or losing a significant sum.
Statistical Significance of Results
Any time you do this type of analysis, it is not sufficient to just compare averages.
To determine whether these results are statistically significant we can run our two-population t-test; I did so using this free calculator. The results suggest that to really be confident that these EV gains are not due to chance we would need to significantly increase our sample size. The p-value of a test indicates the probability of seeing an increase this large purely by random chance, and in this case the p-values range from ~.2 without the rake to ~.3 with the rake (.233 with no rake/no edge, .193 with no rake/edge, .317 with rake/no edge, .271 with rake/edge). So at the moment the best we can say is that, incorporating the rake, there is roughly a 70% chance that the wider defense range is higher EV.
My plan is to follow up on this analysis once the technology behind the process has improved so that we can revisit these numbers and hopefully get more conclusive results. In the meantime the jury is still out, but the evidence weakly suggests that a slight widening of the "standard" defense range would be +EV.