Monday, October 20, 2014

GTO Brainteaser #8 Bonus Solution -- Optimal Betsize Calculations

I got a request for a numeric answer to the bonus question of GTO Brainteaser #8, which involves solving for an optimal bet size in a two-street bluffing game.  The solution to the non-bonus question is here and is worth reading first.  I haven't actually done a post on deriving optimal bet sizing in multistreet play before, so I thought it would be useful to demonstrate the mathematics involved.  I'm not going to restate the game structure here, so please check out the original problem statement if you are not familiar with the game.

The basic technique for calculating optimal bet sizing is as follows.

  1. Rather than using a fixed betsize in your calculations, make the bet size a variable and solve for GTO strategies as a function of that variable
  2. Compute the EV of the game when both players play the GTO strategies as a function of the bet size variable
  3. Maximize the EV of the person making the bet with respect to the bet size variable.  That is your optimal bet size.
While this technique is quite simple conceptually, the actual algebra involved can be hairy, so I usually just make Wolfram Alpha do it.  So let's get started.


To calculate the optimal betsize we will make a few assumptions that are reasonably easy to verify and that I have shown in other posts / videos.

  1. The hero should always bet the nuts on the turn.  This allows him to "compound the nuts" over multiple streets which as I showed here and in more depth in my CardRunners videos is always +EV compared to betting on a single street with a polarized range.
  2. On the river it will always be most profitable to shove the nuts with our polarized range.  This is quite simple to prove, and I showed it in my first CardRunners video.
  3. The hero's EV with his air on the river will be 0 unless z (the share of the nuts in his turn betting range, defined below) is so large that it is optimal for the villain to always fold the river.  However, if the villain were always folding the river in a spot where the hero might hold air, he would never call a turn bet, so any time a turn bet is called, the hero's EV with air on the river must be 0.
Combining observations 1 and 2 we can parameterize our bet sizing strategy with a single number x, the number of chips that we plan to bet on the turn.

If we bet x chips on the turn, then we know we will jam the river and bet the rest of our chips.  Given a starting turn pot of 100 chips if we bet x chips on the turn, the river pot will be 100 + 2x when we are called.

Our river jam will thus be a bet of (150 - x) into a pot of (100 + 2x), which means that as a fraction of the pot our river bet is

b = (150 - x) / (100 + 2x)

Now let's call the villain's turn calling frequency c.  Observation 3 tells us that the hero's river EV with air is 0, so his EV for bluffing the turn is very simple to calculate:

EV[turn bluff] = (1-c) * 100 - c * x

Clearly the villain always folding or always calling the turn is highly exploitable, so we know his turn calling strategy is mixed, and we can apply indifference conditions (a turn bluff must have the same EV as checking the air, which is 0) to see that

c = 100 / (100 + x)

What about the villain's calling frequency on non-3c/2d rivers?  This will of course just be determined by indifference conditions that depend on the pot and bluff size.  Call the villain's river calling frequency rc.

EV[river bluff] = (1 - rc) * (100 + 2x) - (150 - x) * rc

Indifference conditions imply that

rc = (2x + 100) / (x + 250)
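The three quantities so far (the river jam as a fraction of the pot, and the villain's turn and river calling frequencies) can be checked with a short Python sketch. The function names and the POT/STACK constants are my own labels for the 100-chip pot and 150-chip stacks; at x = 50 the sketch should reproduce the half-pot river bet and the 2/3 calling frequencies of the original game.

```python
from fractions import Fraction

POT = 100    # turn pot in chips
STACK = 150  # chips behind at the start of the turn

def river_bet_fraction(x):
    """River jam (STACK - x) as a fraction of the river pot (POT + 2x)."""
    return (STACK - x) / (POT + 2 * x)

def turn_call_freq(x):
    """c from (1 - c) * POT - c * x = 0 (air has 0 EV on the river)."""
    return POT / (POT + x)

def river_call_freq(x):
    """rc from (1 - rc) * (POT + 2x) - rc * (STACK - x) = 0."""
    return (POT + 2 * x) / (POT + STACK + x)

x = Fraction(50)
c, rc = turn_call_freq(x), river_call_freq(x)
# At these frequencies both the turn bluff and the river bluff have exactly 0 EV.
assert (1 - c) * POT - c * x == 0
assert (1 - rc) * (POT + 2 * x) - rc * (STACK - x) == 0
print(river_bet_fraction(x), c, rc)   # prints: 1/2 2/3 2/3
```

Using Fraction keeps the indifference checks exact; plain floats work too if you only need approximate values.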

Now we can also write the optimal turn bluffing frequency as a function of the turn bet size by looking at the villain's EV for calling.  I calculated the villain's EV of calling when the bet size was 50 chips in my previous post, but I will duplicate the calculation here, assuming that the hero bluffs with frequency a with his air and always bets the nuts.  This means that (1-z) = a / (1 + a) of his betting range is air and z = 1 / (1 + a) of his betting range is the nuts.

Since the hero's EV with air on all rivers is 0, when he bluffs and we call, we win the 100-chip pot plus his turn bet in EV.  When he holds the nuts, on 3c/2d runouts we lose only our turn call of x (we fold the river, so our river EV is 0), and on other runouts we lose x plus the river jam of (150 - x) whenever we call it, which we do with frequency rc.

EV[call] = (100 + x) * (1-z) + z * (2/44 * -x + 42 / 44 * (-x - rc * (150 -x)))

If we apply indifference conditions to say that the EV of a call must be 0, this gives a relationship between z and x that we can solve for z.

Wolfram Alpha is much better at algebra than I am, so I just computed that relationship here.
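Because EV[call] is linear in z, the relationship can also be solved by hand: z = (100 + x) / (100 + 2x + (42/44) * rc * (150 - x)). The sketch below is my own code, not the Wolfram Alpha query; it evaluates both that closed form and the term-by-term EV[call] expression, and at x = 50 it should recover z = 33/58, i.e. a bluffing frequency of a = 25/33 as in the previous post.

```python
from fractions import Fraction

POT, STACK = 100, 150
P_TELL = Fraction(2, 44)   # 3c or 2d river: reveals that the hero holds AcAd
P_NO_TELL = 1 - P_TELL     # the other 42 of the 44 unseen river cards

def river_call_freq(x):
    return (POT + 2 * x) / (POT + STACK + x)

def ev_call(x, z):
    """Villain's EV of calling a turn bet of x when a share z of the betting range is the nuts."""
    rc = river_call_freq(x)
    vs_air = (POT + x) * (1 - z)
    vs_nuts = z * (P_TELL * -x + P_NO_TELL * (-x - rc * (STACK - x)))
    return vs_air + vs_nuts

def nut_share(x):
    """Closed-form solution of ev_call(x, z) == 0 for z."""
    rc = river_call_freq(x)
    return (POT + x) / (POT + 2 * x + P_NO_TELL * rc * (STACK - x))

x = Fraction(50)
z = nut_share(x)
assert ev_call(x, z) == 0
print(z, 1 / z - 1)   # prints: 33/58 25/33
```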

Now the EV of the game for the villain (measured in pots) is just how often the hero checks the turn, which is (1-a) / 2, because by indifference conditions, when the hero bets, the villain is indifferent between calling and folding and thus his EV is 0.

Since z = 1 / (1 + a), a = (1/z) - 1, so (1 - a)/2 = (2 - 1/z) / 2

So the EV of the game is (2 - 1/z) / 2 for the villain, and we know z as a function of x.  Thus the optimal bet size for the hero is the value of x that minimizes his opponent's EV, (2 - 1/z) / 2 = 1 - 1/(2z), where z is between 1/2 and 1 (because our betting range is at least 1/2 nuts and at most 100% nuts).  Since this is increasing in z, we just need to minimize z.

Again I calculated this using Wolfram Alpha here.  The result is that the optimal bet size is 52.69 chips.  This makes intuitive sense, as we would expect to bet slightly larger on the turn when some river runouts kill our action than we would without that risk.

The EV of this game is ~87.89, so the EV gain from changing the bet size in this case is tiny, about 0.01 chips over the roughly 87.88 we get with a 50-chip turn bet.
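Steps 2 and 3 of the recipe can also be reproduced numerically. This is a minimal sketch, assuming (as argued above) that the hero's EV in pots is 1 - (2 - 1/z)/2 = 1/(2z) and that this is unimodal in x on [0, 150]; the golden-section search is my own choice of optimizer, not the method used for the numbers in this post. It should land near x = 52.69 with a game EV of about 87.89 chips.

```python
import math

POT, STACK = 100, 150
P_NO_TELL = 42 / 44   # river card that does not reveal the hero's hand

def nut_share(x):
    """Share z of the hero's turn betting range that is the nuts, from EV[call] = 0."""
    rc = (POT + 2 * x) / (POT + STACK + x)
    return (POT + x) / (POT + 2 * x + P_NO_TELL * rc * (STACK - x))

def hero_ev(x):
    """Hero EV in chips: POT * (1 - (2 - 1/z) / 2) = POT / (2z)."""
    return POT / (2 * nut_share(x))

def golden_max(f, lo, hi, tol=1e-9):
    """Golden-section search for the maximizer of a unimodal f on [lo, hi]."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    while b - a > tol:
        if f(c) > f(d):
            b, d = d, c                 # maximum lies in [a, d]
            c = b - inv_phi * (b - a)
        else:
            a, c = c, d                 # maximum lies in [c, b]
            d = a + inv_phi * (b - a)
    return (a + b) / 2

best_x = golden_max(hero_ev, 0, STACK)
print(f"turn bet {best_x:.2f} chips, EV {hero_ev(best_x):.3f}, "
      f"gain over x=50: {hero_ev(best_x) - hero_ev(50):.3f}")
```

The EV curve is very flat near the optimum, which is why moving from 50 to about 52.7 chips gains only about a hundredth of a chip.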













Thursday, October 16, 2014

GTO Brainteaser #8 Solution -- Multistreet Theory vs Practice

In this post I'm going to discuss the solution to GTO Brainteaser #8; check it out here if you missed it.  I'm also going to provide an introduction to multistreet theory and some simple examples of the impact of runouts and the flow of information across streets.  A browsable GTORB version of the turn solution is also presented near the bottom for those of you interested in a sneak peek at the GTORB turn solution interface (it still needs some polish).

Solution


The brainteaser involved studying the following game:




  • You are on the turn and the board is  AsAhKsKh
  • The hero has a hand range of AcAd and 3c2d
  • The villain has a hand range of KcKd
  • The pot is 100 chips and stacks are 150 chips
  • The hero can either bet 50 chips on the turn and 100 on the river or he can shove for 150 on the turn.


  • The goal was to determine how and why this game was different from the nuts vs air multi-street game that we looked at in GTO Brainteaser #6 and from the multi-street polarized vs merged range theory in The Mathematics of Poker.

    The basic answer to this question is quite simple: the real-world game with an actual deck is worse for the hero than the model game from GTO Brainteaser #6, because when a 3c or 2d hits on the river it reveals to the villain that the hero must hold the nuts.  The villain is able to convert this information into money by folding 100% on either of these rivers.  Perfectly polarized ranges are always the strongest possible ranges, so for the hero, having his range depolarized on some river runouts decreases his EV.

    Note that the villain also gains information when an Ac or Ad comes on the river, but this information is not valuable because our EV when we hold 3c2d is 0 anyway.  A GTO opponent calls just enough to make us indifferent between bluffing and giving up, so if we are forced to always give up, that doesn't actually decrease our EV.  In this case the villain still gains information but has no way to convert it into money.

    We can calculate the exact EV decrease quite simply.  Half of the time we hold AcAd, and if we recall the solution to GTO Brainteaser #6, our EV with AcAd in this spot, without river runouts giving away information, is 16/9ths of the pot.  Now when we hold AcAd, since we know our opponent holds KcKd, there are 44 possible river cards, and 2 of them (the 3c and 2d) reduce our EV to 1.5 pots in the case where our opponent calls our turn bet.

    Clearly, the villain must still make us indifferent between betting and checking our air (3c2d) on the turn, which means that he must call 2/3rds of the time to make the EV of bluffing 0.  Thus the hero's EV with AcAd in the new game can be calculated by adding up:

    1. 1/3 * 100 -- we bet and they fold
    2. 2/3 * (42/44 * R)  -- hero bets, villain calls and a non-3c/2d river comes where R is the hero EV on that river
    3. 2/3 * (2/44 * 150) -- hero bets, villain calls, and a 3c or 2d river comes and villain just folds
    On the unblocked rivers, as in Brainteaser #6, the villain must call our river bet of 100 chips 2/3 of the time, so our EV with AcAd on the river is R = 150 * 1/3 + 250 * 2/3 = 650/3.

    Thus our EV with AcAd is 175.76 chips or 1.7576 pots.  Our EV loss with Aces is 16/9 * 100 - 175.76 = 2.02 chips.


    Since we hold the nuts half of the time, our overall EV loss is 1.01 chips, so the EV of the actual game for the hero is about 87.88 (88.89 - 1.01).  As you can see in the solution browser below, GTORB computes the EV as 87.83, which is within the given margin of error of 0.05 chips.
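The arithmetic in the last few paragraphs can be checked with exact fractions. This sketch just encodes the three-term sum above, taking the 16/9-pot EV with AcAd and the 8/9-pot game EV from the Brainteaser #6 solution as given; the variable names are mine.

```python
from fractions import Fraction as F

P_TELL = F(2, 44)        # 3c or 2d river: villain folds, hero nets 150
P_NO_TELL = 1 - P_TELL
CALL = F(2, 3)           # villain's turn and river calling frequency

# River EV with AcAd after a called 50-chip turn bet on a river that gives nothing away:
# villain folds to the 100-chip bet 1/3 of the time (hero nets 150) or calls (hero nets 250).
R = 150 * (1 - CALL) + 250 * CALL

ev_aces = (1 - CALL) * 100 + CALL * (P_NO_TELL * R + P_TELL * 150)
loss_aces = F(16, 9) * 100 - ev_aces     # vs the Brainteaser #6 EV of 16/9 pots
game_ev = F(8, 9) * 100 - loss_aces / 2  # AcAd is half of the hero's range

print(float(ev_aces), float(loss_aces), float(game_ev))
```

In exact terms the game EV works out to 2900/33, about 87.88 chips.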

    How does this EV loss affect optimal play?  Intuitively this is actually pretty simple.  Of course we should still always bet the nuts, and we should bet enough of our air that our opponent is indifferent between calling and folding to our turn bet.  In this game, where our opponent's EV when we hold the nuts is higher, we need a higher nuts-to-air ratio to maintain his indifference, which means that we must bluff the turn less frequently.

    Mathematically figuring out the optimal bluffing frequency is a bit more complex, as we need to properly weight the probability of the various river cards coming, using all of the villain's information about his opponent's range and his own hand.

    The villain's EV for calling the turn and then playing GTO on the river is 150 when his opponent holds air (on average he wins the entire river pot of 200, but 50 of those chips were his own).  When his opponent holds the nuts, on 3c or 2d runouts his EV is -50, because he called 50 on the turn and always folds the river.  On all other runouts he calls the 50 on the turn plus an additional 100 on the river 2/3rds of the time, for a total EV of -350/3 (-116.67).

    If the hero is betting a fraction x of his air and all of his nuts, then when he bets he holds air x/(1+x) of the time and the nuts 1/(1+x) of the time, and given that he holds the nuts, a 3c or 2d comes 2/44ths of the time.

    And the villain's EV for calling a turn bet of 50 is

    150 * x / (1 + x) - 1/(1+x) * (50 * 2/44 + 350/3 * (1-2/44)).

    Setting that equal to 0 and solving for x shows that the hero should bet 25/33 (75.76%) of his air on the turn, so 43.1% of his turn betting range should be air, which matches the GTORB solution precisely.  The villain still calls 2/3rds of the time against both the turn and the river bet, as that is all that is required to make the hero indifferent between bluffing his air and checking it.  This is our exact mathematical equilibrium solution; you can browse the approximate GTORB solution below.
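The same indifference calculation can be sketched with exact fractions, using the three villain EVs from the paragraphs above (the names are mine). Because EV[call] = (150x - L)/(1 + x), where L is the villain's average loss against the nuts, setting it to 0 gives x = L/150.

```python
from fractions import Fraction as F

P_TELL = F(2, 44)                            # chance of a 3c/2d river given the hero holds AcAd
EV_VS_AIR = F(150)                           # wins the 200-chip river pot, 50 of it his own
EV_VS_NUTS_TELL = F(-50)                     # calls 50 on the turn, folds a 3c/2d river
EV_VS_NUTS_BLANK = F(-50) - F(2, 3) * 100    # also calls the 100-chip river bet 2/3 of the time

# Villain's average loss when calling against the nuts.
loss_vs_nuts = -(P_TELL * EV_VS_NUTS_TELL + (1 - P_TELL) * EV_VS_NUTS_BLANK)

# EV[call] = (EV_VS_AIR * x - loss_vs_nuts) / (1 + x) = 0  =>  x = loss_vs_nuts / EV_VS_AIR
x = loss_vs_nuts / EV_VS_AIR
air_share = x / (1 + x)   # share of the turn betting range that is air

print(x, float(x), float(air_share))
```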





    I also solved the optimal bet sizing bonus question in a separate post here for those who are interested.

    Takeaways


    This example may seem trivial, but as it turns out, river runouts shift the range distributions of the players and transfer information.  Being in a position to put as much money into the pot as possible when you have an informational edge over your opponent or when your equity distribution is polarized, and as little as possible when it is merged, is very powerful.

    I'll demo a much more powerful example of this in the next brainteaser, where we will see how equity transitions and river information can make it GTO to protect your hand with turn bets that never fold out better hands and are never called by worse hands, even if you were required to pay your opponent his turn hand equity when he folds worse.

    Note on epsilon equilibrium:  One quick note on the GTORB solution, which is a 0.05 chip (5/10,000ths of the pot) epsilon equilibrium.  Due to the approximation techniques used, the GTORB strategy actually has the hero checking the nuts on the turn a tiny fraction of a percent of the time.  This has almost no impact on the game EV or solution accuracy, but it does mean that if you examine a river after both players check, you will see the hero bet with a very low frequency.  This is because the approximate solution has him holding the nuts with a tiny probability.  These rounding errors are why the solution has a Nash distance of 0.05 chips, which in this case means an opponent who played perfectly could exploit the approximate GTO strategy for 0.05 chips out of the 100 chip pot.

    Thursday, October 9, 2014

    GTO Brainteaser #8 -- Solving the Turn, Theory vs Practice

    The internal alpha version of GTORB is now capable of solving turn scenarios for GTO turn and river play, so this brainteaser is going to focus on multi-street theory.  In the solution (probably a week from today) I'll post the first fully browsable GTORB turn solution to the model game below for those of you who are excited to play around with a GTO turn strategy.  Note that the version of GTORB that can solve the turn won't be released commercially for a month or two, as there are some performance / scalability issues that I need to solve before it is ready for mass use.  It will likely cost extra.


    The problem



    In GTO Brainteaser #6 I looked at a model scenario where the hero had a range of 50% nuts, 50% air while the villain had a range of 100% medium strength hands.  There was a 100 chip pot, 150 chip stacks and two streets of betting.  The hero could either bet 50 chips on the turn and then have the option to bet 100 chips on the river or he could shove the turn for 150 chips, and the question was which option is higher EV and what are GTO strategies for both players in this game.

    The key simplification that made this scenario quite different from real world poker is that it was assumed that no river card was actually dealt, there were just two rounds of betting.

    For those who are curious you can check out the full solution to brainteaser #6 here.  It turns out that it is optimal for the hero to bet 50 chips on the turn with all of his nut hands and 7/9ths of his air and then to bet 100 chips on the river when he is called with all of his nut hands and 3/7ths of his air hands.  The villain calls each of these bets 2/3rds of the time and folds 1/3rd.  The hero wins 8/9ths of the 100 chip pot in EV in this game.  Furthermore, it turns out that betting 50 chips on the turn and 100 chips on the river is the exact optimal bet sizing for the hero to maximize his EV, all other bet sizes are lower EV.
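As a sanity check on these Brainteaser #6 numbers, here is a minimal sketch that rederives them from indifference conditions alone, assuming (as in that game) a 50% nuts / 50% air range, a 50-chip turn bet, a 100-chip river bet, and that the hero's air has 0 EV on the river once it has been called. The variable names are mine.

```python
from fractions import Fraction as F

POT, TURN_BET, RIVER_BET = 100, 50, 100
river_pot = POT + 2 * TURN_BET                        # 200 chips after a called turn bet

# Villain calls just often enough that bluffing air breaks even with checking it (EV 0).
turn_call = F(POT, POT + TURN_BET)
river_call = F(river_pot, river_pot + RIVER_BET)

# River: villain risks RIVER_BET to win river_pot + RIVER_BET, so bluffs must make up
# RIVER_BET / (river_pot + 2 * RIVER_BET) of the hero's river betting range.
river_bluff_share = F(RIVER_BET, river_pot + 2 * RIVER_BET)
river_bluffs_per_nut = river_bluff_share / (1 - river_bluff_share)

# Turn: calling wins POT + TURN_BET against air and loses TURN_BET plus the
# river bet (when he calls it) against the nuts.
ev_vs_air = POT + TURN_BET
loss_vs_nuts = TURN_BET + river_call * RIVER_BET
turn_air_bet = loss_vs_nuts / ev_vs_air               # share of air bet (nuts:air start 1:1)
river_air_bet = river_bluffs_per_nut / turn_air_bet   # share of the called air that bets again

# The villain profits only when the hero checks air, (1 - turn_air_bet) / 2 of the time.
hero_ev = 1 - (1 - turn_air_bet) / 2                  # in pots

print(turn_call, river_call, turn_air_bet, river_air_bet, hero_ev)   # prints: 2/3 2/3 7/9 3/7 8/9
```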

    Let's now look at a very similar game.  Imagine the following (completely made up) scenario.

    1. You are on the turn and the board is  AsAhKsKh
    2. The hero has a hand range of AcAd and 3c2d
    3. The villain has a hand range of KcKd
    4. The pot is 100 chips and stacks are 150 chips
    5. The hero can either bet 50 chips on the turn and 100 on the river or he can shove for 150 on the turn.

    Clearly no matter what river card comes, the relative strengths of the hands in both players' ranges will not change, so in that respect this game seems identical to the model game from Brainteaser #6.  AcAd will beat KcKd on every possible river and 3c2d will lose to KcKd on every possible river.

    However, it turns out that GTO play in this game is different from GTO play in GTO Brainteaser #6.  Why?

    1. What is the EV of the game for the hero?  Is it higher or lower than in Brainteaser #6?
    2. What are the optimal strategies in this game and what is the hero's EV when both players play optimally?  

    Bonus:  Is betting half pot on the turn and the river still optimal or is there a higher EV bet size?

    Tuesday, September 2, 2014

    GTO Poker and Multiple Equilibria Part 3

    In part 1 and part 2 of this post I described some of the properties of multiple equilibria in zero-sum games and explained how different equilibrium solutions can perform differently against various types of sub-optimal players.  The basic idea is that by playing exploitatively against lines that optimal players don't actually use, we can remain completely unexploitable while still targeting and attacking leaks in our opponents' play.  This is accomplished by just shifting which GTO strategy we are playing at any given time, based on our opponent's tendencies.

    Today I'm going to conclude that discussion by going through an example of a simplified poker scenario with two equilibria, each of which performs quite differently against various types of fish.  I described this example in detail at the end of part 2 so I'll just very briefly reiterate the situation here.
    1. The board is: 2sTs9c5h3s
    2. The IP range is: 22, 87, T9, QJ, Ks8s+, As2s-AsJs, 7s6s, 9s8s, Qs9s
    3. The OOP range is: QQ+
    4. There are 100 chips in the pot and 150 left to bet
    As we saw last time, it is never GTO for the OOP player to lead for 50% pot here with any of his range.  However, we're going to consider the performance of two GTO strategies against two types of fish, both of whom are going to randomly lead for half pot with their entire range 10% of the time.  Fish 1 is thinking that "when he shoves here he's never bluffing", is feeling you out with his bet, and plans to fold to a shove 100% of the time.  Fish 2 is thinking "OMG I haz overpair" and is planning to call a shove 100% of the time.

    Our goal was to find two strategies that are GTO (this requires that there is no profitable deviation that would allow a GTO opponent to increase his EV by leading for 50% pot with some hand in his range) but that also extract as much extra value as possible from each type of fish who decides to lead for 50% pot.

    This means that we need to ensure that, however we react to a 50% pot lead, our opponent's EV for leading against that reaction is no higher than his EV from playing the GTO strategy, for every hand in his range.  In this situation the GTO strategy was to always check, so the benchmarks are the checking EVs, which I've included below.

    Hand      % of Range    Check EV
    QcQd      5.56          14.02
    QcQh      5.56          14.02
    QcQs      5.56          16.69
    QdQh      5.56          14.02
    QdQs      5.56          16.69
    QhQs      5.56          16.69
    KcKd      5.56          25.87
    KcKh      5.56          25.87
    KcKs      5.56          30.57
    KdKh      5.56          25.87
    KdKs      5.56          30.57
    KhKs      5.56          30.57
    AcAd      5.56          25.87
    AcAh      5.56          25.87
    AcAs      5.56          41.15
    AdAh      5.56          25.87
    AdAs      5.56          41.15
    AhAs      5.56          41.15






    Now if we were just being maximally exploitative against these fish, we would always shove against the fish who leads with the intention of folding, and we would only shove two pair or better against the fish who leads with the intention of calling (otherwise we'd fold).  However, doing so might open up the opportunity for an exploitative opponent to attack us.

    For example, if we were to always jam over a lead, an exploitative opponent could bet-call QQ and win 73% of the time for a massively profitable deviation.  In general, we won't be able to take maximally exploitative lines; instead we'll need to find strategies that are moderately exploitative while maintaining enough balance to stay GTO.

    Doing this at least approximately is actually relatively straightforward.  With these ranges, the hand our opponent would most like to deviate with is QQ without the Q of spades, as it is his lowest EV hand for checking and all his hands have similar equity vs our range.

    We want to shove as wide a range as possible while keeping our opponent's EV for bet-calling QQ below 14.02.  All this is quite simple to do in CREV, but first, as a reminder, here is the GTORB equilibrium solution, which we will use as a starting point.






    Some quick tinkering in CREV will show that you can take the GTORB equilibrium strategy, which folds about 34.3% of the time to a lead, and make it more aggressive by shoving all 87s and only folding 32% of our 87o.  This shifted strategy is still GTO (which can be verified by using CREV's max-exploit button against the shifted strategy and verifying that the BB EV is still 25.87), but it will jam and pick up 150 chips instead of folding an additional 6.1% of the time, for an EV gain of 10 chips against the fish who bet-folds.  That's an additional 10% of the pot in the cases where our opponent donks!  We'll call this strat GTOShove.

    Similarly, against the fish who is going to bet-call, we can shift the GTORB strategy in the opposite direction by folding more of our range while staying GTO.  It turns out we can fold all our hands that our opponent beats while staying GTO, because our range is polarized while our opponent's range is condensed, so leading is just a weak play for him.  We'll call this strat GTOFold.  Note that in this case, GTOFold is maximally exploitative in terms of how it responds to a river lead!

    I've put together a chart of the overall strategy vs strategy EVs.  The % exploit is the percentage of the maximal exploitative leak that our strategy extracts from our opponent.  Specifically it is:

    (EV[our strat vs opponent strat] - EV[gto vs gto]) / (EV[max exploit vs opp strat] - EV[gto vs gto])

    Of course all of our strategies are unexploitable, so I didn't include our exploitability in the chart.  When calculating the maximally exploitative strategies, I only considered exploiting our opponents in response to their river lead; I did not consider altering our strategy at all when responding to a check.


    Our Strategy               Opponent Strategy                  Our EV                        % Exploit
         GTORB                            GTORB                              74.1                              N/A
        GTOFold                           GTORB                              74.1                              N/A
       GTOShove                         GTORB                              74.1                              N/A

        GTORB                             B/C Fish                            75.4                              27.3%
       GTOFold                            B/C Fish                            78.8                              100%

         GTORB                            B/F Fish                             77.4                              43.1%
       GTOShove                         B/F Fish                             78.0                              51.3%
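The % exploit column is just the ratio defined above. Here is a minimal sketch using the rounded EVs from the chart against the B/C fish, where GTOFold is maximally exploitative and so supplies the max-exploit EV (78.8); the function name is my own. With these one-decimal inputs GTORB comes out near 27.7% rather than the 27.3% shown, presumably because the chart was computed from unrounded EVs.

```python
def pct_exploit(ev_ours, ev_gto, ev_max_exploit):
    """Fraction of the maximum exploitative gain (over GTO vs GTO) that our strategy captures."""
    return (ev_ours - ev_gto) / (ev_max_exploit - ev_gto)

EV_GTO_VS_GTO = 74.1     # from the chart
EV_MAX_VS_BC = 78.8      # GTOFold, which is maximally exploitative vs the B/C fish

for name, ev in [("GTORB", 75.4), ("GTOFold", 78.8)]:
    print(f"{name} vs B/C fish: {pct_exploit(ev, EV_GTO_VS_GTO, EV_MAX_VS_BC):.1%}")
```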


    This is one of the simplest examples of how GTO theory can be combined with exploitative play to create strategies that are impossible for our opponents to counter, but that still allow us to adapt our strategy to specifically target weaknesses that we identify in our opponents.  People often consider GTO play a purely passive style where you never adjust to how your opponents play, but in reality there are a variety of ways to adapt and attack your opponents while remaining completely unexploitable (as shown here), while using GTO concepts to find the least exploitable strategy that achieves a specific win rate against a specific opponent, or while making sure that we are only exploitable in ways that we don't think our opponent will capitalize on.  I'll be discussing all of these in more depth over the coming months.

    Tuesday, August 26, 2014

    GTO Brainteaser #7 -- Solution to second GTO quiz

    Today I'll walk through solutions to all the problems in the second GTO quiz.  Overall people did much better on the second quiz than they did on the first quiz.  Whether that is because it was easier or because you all have a better understanding of GTO than you did prior to taking my first quiz I can't say for sure, but I'm hoping it's the latter :)

    There were some requests for aggregate stats so I'll present those briefly for both quizzes before I dive into the solutions.

    Overall people did much better on these true or false questions than they did on the first quiz, where people got 48.12% of the questions right, which is comparable to random guessing.  On the second quiz, 67.1% of answers were correct.  Question 2 gave people the most trouble, with a correct answer rate of about 58%.

    Solutions


    Question 1:  "In a game of HUNLHE, player A is out of position and player B is in position. A strategy pair where on some river A checks his entire range and folds 60% of the time to a pot sized bet and B checks back some of his range cannot be a GTO set of strategies."

    Answer:  False

    There is a false perception in the poker community that makes people think that we need to call at least 50% of the time to prevent our opponent from "auto-profiting" by betting any two cards, and that if we somehow fail to do so our opponent would bet his entire range at us.

    This is only true when our opponent's range is such that his entire bluffing range has 0 EV vs our checking range.  In practice, even on the river this is very rare, and on the flop it is virtually never the case.  Because we don't need our opponent to be indifferent between open folding and bluffing, but rather indifferent between checking back and bluffing, it is entirely possible that at the point of indifference the EV of checking back is 20% of the pot (i.e. the very top of his bluffing range beats our worst air, e.g. the bottom 20% of our checking range), and thus that it is fine for us to fold 60%.
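The 60% figure falls straight out of that indifference condition. This is a minimal sketch assuming a pot-sized bet and taking the 20%-of-pot check-back EV from the example above as given; the helper name is hypothetical. The bettor is indifferent when f * pot - (1 - f) * bet equals his check-back EV.

```python
from fractions import Fraction

def indifferent_fold_freq(pot, bet, check_back_ev):
    """Fold frequency at which a bluff earns exactly what checking back earns.

    Solves f * pot - (1 - f) * bet = check_back_ev for f.
    """
    return Fraction(bet + check_back_ev, pot + bet)

POT = 100
print(indifferent_fold_freq(POT, POT, 0))    # bluff vs open fold: prints 1/2
print(indifferent_fold_freq(POT, POT, 20))   # bluff vs a 20-chip check back: prints 3/5
```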

    It is very easy to construct example ranges like this: all you need to do is give player A a range that is 80% medium strength hands and 20% air, and player B a polarized range where the weakest hand in B's range still beats A's air.

    You can browse such a model example here: http://gtorangebuilder.com/#share_scenarioHash=cd0fbe72dac92c1361cab6aab6dfc56b

    In practice these types of situations are very common with real ranges, and if you are blindly thinking that you need to defend 50% of the time in these spots, you are likely making a major error.  With most of his range your opponent will average winning some small % of the pot.  This is fine, and if you are too focused on making sure he actually averages 0 chips with his worst hands (particularly on the flop), you'll usually end up being taken to value town by the stronger parts of his range.

    Question 2:  "In a game of HUNLHE, player A is out of position and player B is in position. A strategy pair where preflop A folds 60% of the time to a minraise and B folds some of his preflop range cannot be a GTO set of strategies."

    Answer:  True

    This question is completely different from the last one because in this case the EV of folding a hand for player B is in fact -0.5bb (he forfeits his small blind) regardless of what hand he holds.  Folding and checking are fundamentally different actions.

    If Player B was ever folding a hand preflop and Player A was folding to a minraise 60% of the time, then Player B could increase his EV by minraising the hand he was folding, because even if he always loses his minraise when Player A defends, his EV for raising is .4 * -2bb + .6 * 1bb = -.2bb.  This is greater than the -0.5bb from folding.
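The arithmetic above generalizes to any fold frequency. Here is a minimal sketch with a hypothetical helper, using the same worst-case assumption that the raiser loses his whole 2bb minraise whenever the big blind continues: min-raising beats folding the small blind (-0.5bb) as soon as the big blind folds more than half the time.

```python
from fractions import Fraction

FOLD_EV = Fraction(-1, 2)   # folding forfeits the 0.5bb small blind

def minraise_ev(bb_fold_freq):
    """Worst-case EV (in bb) of a 2bb minraise: win the 1bb big blind when it folds,
    lose the whole 2bb raise whenever it continues."""
    return bb_fold_freq * 1 + (1 - bb_fold_freq) * -2

print(minraise_ev(Fraction(3, 5)))             # prints -1/5, better than folding's -1/2
# Break-even: 3f - 2 = -1/2  =>  f = 1/2
print(minraise_ev(Fraction(1, 2)) == FOLD_EV)  # prints True
```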

    Question 3:  "You are playing a game of HUNLHE against an opponent who is not playing GTO and who is playing a fixed strategy on the turn (they are not adapting their strategy based on your style of play) but who will play GTO in every river scenario. You are also playing a non-GTO strategy on the turn (but GTO on the river) in order to exploit this opponent.

    You determine that in some turn situation the EV of betting with JdTd is higher than the EV of checking with JdTd. A friend tells you that even though betting JdTd is higher EV in isolation that you can increase your overall strategy EV against your opponent by checking back JdTd to "balance your checking range". Your friend may be correct."

    Answer:  True
      
    The key concept to understand here is that the idea that GTO play requires you to take the maximally profitable line with every hand in isolation is a local concept that only holds at equilibrium.  If both players are playing non-GTO on the turn and then exploiting each other on the river, then it is entirely possible that some action on the turn that reduces our EV against the way our opponent is playing now would increase our EV overall, by strengthening our river ranges enough that the EV gained with our other hands more than makes up for the EV we might lose with JdTd.

    It's pretty easy to find simple examples of this kind of situation, but I'll probably save this topic for its own blog post, as it is a reasonably large and important topic.


    Question 4:  "You are asked to examine two poker bots (Bot 1 and Bot 2) to determine if they are GTO. You have them play each other in a rake-free game of HUNLHE and over a billion hands they are so close to break even that there is no statistical evidence that one is better than the other. You then have them each play a billion hands against two non-GTO players (Fish 1 and Fish 2). You find that Bot 1 beats Fish 1 at a much higher rate than Bot 2 beats Fish 1 but that Bot 2 beats Fish 2 at a much higher rate than Bot 1 beats Fish 2. This proves that at least one of the two bots is not GTO."

    Answer:  False

    It is entirely possible for different GTO strategies to perform differently against different types of fish.  I've explained this in detail in this post.

    Question 5:  "Zero-sum 2 player games can have two equilibria where both players have identical EVs in each equilibrium but where one equilibrium is higher variance than the other."

    Answer:  True

    All equilibria in two-player zero-sum games have the same EV, but they can have different variances.

    A simple example is the game where Player 1 can choose either to play Rock Paper Scissors against Player 2, where the loser pays the winner a dollar, or not to play, in which case they both get 0 dollars.

    Player 1 choosing to play RPS with both players playing GTO is an equilibrium with 0 EV for each player, and Player 1 choosing not to play RPS is also an equilibrium with 0 EV for each player, but choosing to play RPS is higher variance than a guaranteed EV of 0.
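To make the variance difference concrete, here is a minimal sketch that computes Player 1's EV and variance when both players mix uniformly over rock, paper and scissors (the helper names are mine). Declining to play pays exactly 0, so its EV and variance are both 0.

```python
from fractions import Fraction as F
from itertools import product

MOVES = ("rock", "paper", "scissors")
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}

def payoff(a, b):
    """Player 1's payoff in dollars: +1 for a win, -1 for a loss, 0 for a tie."""
    if a == b:
        return 0
    return 1 if BEATS[a] == b else -1

def mean_and_variance(p1, p2):
    """EV and variance of Player 1's payoff when the players mix according to p1 and p2."""
    outcomes = [(p1[a] * p2[b], payoff(a, b)) for a, b in product(MOVES, MOVES)]
    mean = sum(p * x for p, x in outcomes)
    var = sum(p * (x - mean) ** 2 for p, x in outcomes)
    return mean, var

uniform = {m: F(1, 3) for m in MOVES}
play_ev, play_var = mean_and_variance(uniform, uniform)
print("play RPS:", play_ev, play_var)   # prints: play RPS: 0 2/3
```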

    Some people like to weight payoffs with concave functions called utility functions to model the fact that people tend to avoid variance.

    Question 6:  "You are playing a bluffing game against a random opponent that comes from an infinite population of players. Half of the population are regulars, that is thinking adaptive players like you. The other half are fish who play fixed, suboptimal strategies and never adapt. The game is on an anonymous site so you have no idea if your opponent is a regular or a fish. Similarly your opponent has no idea if you are a regular or a fish.

    The game works as follows. Both players ante $50. A coin flip determines which player is the bluffer. The bluffer is dealt a card from a 2-card deck, which contains one Ace and one Queen. The caller is dealt a King. The bluffer can look at his card and choose to either bet $100 or check. The caller can call or fold if bet to; otherwise he must check. At showdown the high card wins.

    All regulars know that fish call 60% of the time as the caller and bluff 25% of the time they are dealt a Queen as the bluffer (they always bet an Ace).

    What is the GTO strategy for the regulars given this population dynamic? What is the average profit per round for a regular? If the site were not anonymous and every regular was aware of whether their opponent was a regular or a fish how would that change the average profit per round?"

    Answer:  Regulars cannot profit in this game.  Read below for details...

    While a naive approach to this problem will get the right answer in this particular case, to be thorough when approaching these types of problems in general, remember that you always need to "be sure it ain't pure" and check for pure strategy solutions. As we'll see, in this case pure strategies aren't relevant.

    In reality this problem is extremely similar to the RPS puzzle that I posted in Brainteaser #1, where by adopting a pure strategy in one dimension (never playing scissors) a regular could profit against an unknown opponent who is a fish who always plays rock 50% of the time. If you haven't looked at that problem before, definitely check it out as an example where blindly applying indifference conditions can lead you astray.

    It turns out that in this scenario the regulars, by competing to extract EV from one another whenever an exploitative line is taken, completely protect the fish from losing and at equilibrium both fish and regulars will break even in this game.

    As a reminder, the solution to the version of this game where there are just two players (instead of a population of fish and regulars) is for the betting player to always bet with an Ace and to bet half the time with a Queen, and for the calling player to call half the time. I'll call this strategy the base GTO solution; it is the unique equilibrium of the base bluffing game that I illustrate in my first CardRunners video.
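The base solution falls straight out of the two indifference conditions. Here's a minimal sketch; `base_gto` is a hypothetical helper name, and it assumes the bettor is dealt an Ace or a Queen with equal probability and always bets the Ace.

```python
from fractions import Fraction

def base_gto(pot, bet):
    """Solve the base Ace/King/Queen bluffing game with indifference conditions.

    Assumes the bettor holds an Ace or a Queen with equal probability and
    always bets the Ace. Returns (queen_bluff_freq, call_freq).
    """
    pot, bet = Fraction(pot), Fraction(bet)
    # Caller indifference: a King's call must have EV 0, i.e.
    #   P(Ace | bet) * (-bet) + P(Queen | bet) * (pot + bet) = 0
    ace_share = (pot + bet) / (pot + 2 * bet)
    # Aces and Queens are equally likely, so bluffs per value bet equals the
    # Queen bluffing frequency, which must be (1 - ace_share) / ace_share.
    queen_bluff_freq = (1 - ace_share) / ace_share
    # Bettor indifference: bluffing a Queen must match checking (EV 0):
    #   (1 - call_freq) * pot - call_freq * bet = 0
    call_freq = pot / (pot + bet)
    return queen_bluff_freq, call_freq

print(base_gto(100, 100))  # (Fraction(1, 2), Fraction(1, 2))
```

With the $100 pot from the two antes and a $100 bet this reproduces "bluff half your Queens, call half the time"; other bet sizes can be plugged in the same way as long as the resulting bluffing frequency stays at or below 1.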

    The only way indifference conditions can possibly be satisfied is if on average the population is playing the base GTO strategy, which breaks even against all opponents, so it should come as no surprise that if the regulars are playing a mixed strategy then they cannot possibly profit.  Furthermore, if the average population is playing the unique equilibrium strategy, we don't need to check for pure strategy solutions, because we already know that the base game does not have any.

    So we can assume that the solution involves regulars playing mixed strategies both as the bettor and as the caller. For a refresher on mixed strategies and indifference conditions, check out this YouTube video.

    Regulars must be indifferent between bluffing and checking back a Queen. Let's call c the frequency with which regulars call. The EV of checking is 0. For the EV of bluffing to also be 0, a bluff must be called half the time, since we lose 100 chips when called and win 100 chips when our opponent folds. Fish call 60% of the time, so regulars must satisfy 0.5(0.6) + 0.5c = 0.5, which gives c = 0.4: regulars call 40% so that on average our opponents call 50%.
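The same population-average calculation as a tiny solver. `regular_call_freq` is a hypothetical helper; it assumes the 50/50 fish/regular split from the question and a target average calling frequency of 1/2 (the frequency that makes a bluff break even).

```python
from fractions import Fraction

def regular_call_freq(fish_call, fish_share=Fraction(1, 2), target=Fraction(1, 2)):
    """Calling frequency c for regulars so that the population average equals target.

    Solves fish_share * fish_call + (1 - fish_share) * c = target for c.
    """
    return (target - fish_share * fish_call) / (1 - fish_share)

c = regular_call_freq(Fraction(3, 5))
print(c)  # 2/5
```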

    Similarly, regulars must be indifferent between calling and folding a King. The EV of folding is 0, the EV of calling and beating a bluff is 200, and the EV of calling and losing to a value bet is -100, so on average, when an opponent bets, they need to have an Ace two thirds of the time (p(-100) + (1 - p)(200) = 0 gives p = 2/3).

    Remember that when our opponent bets, that gives us information about whether or not they are a fish: regulars are more likely to bet than fish, so after we observe a bet the probability that our opponent is a regular goes up. We need to apply conditional probability to account for this.

    When fish bet they have an Ace 4/5 of the time, and overall they bet 5/8 of the time (every Ace plus a quarter of their Queens: 1/2 + 1/8).

    If regulars bet 75% of the time with a Q, then when they bet they have an Ace 4/7 of the time, and overall they bet 7/8 of the time (1/2 + 3/8).

    If we observe our opponent bet the probability that they hold an Ace is

    (4/7 * 7/8 + 4/5 * 5/8) / (5/8 + 7/8) = 2/3, exactly what we need for indifference between calling and folding.
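The conditional probability step can be checked with exact fractions. This sketch (with a hypothetical helper `bet_profile`) assumes, as in the question, that the Ace and Queen are dealt 50/50 and that fish and regulars each make up half of the population, so the equal prior weights cancel.

```python
from fractions import Fraction

HALF = Fraction(1, 2)

def bet_profile(queen_bluff_freq):
    """Return (P(bet), P(Ace | bet)) for a bettor who always bets an Ace
    and bluffs a Queen with the given frequency; Ace and Queen are dealt 50/50."""
    p_bet = HALF + HALF * queen_bluff_freq
    return p_bet, HALF / p_bet

fish_bet, fish_ace = bet_profile(Fraction(1, 4))  # 5/8 and 4/5
reg_bet, reg_ace = bet_profile(Fraction(3, 4))    # 7/8 and 4/7

# Bayes' rule: weight each type's P(Ace | bet) by how often that type bets.
p_ace_given_bet = (fish_ace * fish_bet + reg_ace * reg_bet) / (fish_bet + reg_bet)
print(p_ace_given_bet)  # 2/3

# A King's EV for calling versus this mixture of bettors (folding is 0).
print(p_ace_given_bet * -100 + (1 - p_ace_given_bet) * 200)  # 0
```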

    Thus regulars should bluff 75% of the time when they have a Q.

    However, what happens if we actually play this strategy as a regular? We just determined that with a Q we are indifferent between bluffing and checking, so our EV as the bettor when dealt a Q is 0.

    We also just determined that with a K we are indifferent between calling and folding when bet to, so our EV with a K when bet to is 0. Our opponents on average check 25% of the time, and when they check we win the 100 chip pot, so our EV with a King overall is 25.

    Since on average our opponents call 50% of the time, and we win 200 when called and 100 when they fold, our EV with an A is 150.

    If we add these up, our EV is ((150 + 0) / 2 + 25) / 2 = 50.  This is exactly how much we have to ante to play the game, so the strategy breaks even.
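Putting the pieces together, here's a short sketch of a regular's average take per round, in the same pot-share units used above (100 chip pot from the antes, 100 chip bet). The variable names are my own.

```python
from fractions import Fraction

HALF = Fraction(1, 2)
POT, BET = 100, 100  # two $50 antes, then a $100 bet

# Population averages: fish call 3/5 and regulars 2/5; fish bluff 1/4 of
# their Queens and regulars 3/4.
avg_call = HALF * Fraction(3, 5) + HALF * Fraction(2, 5)   # 1/2
avg_bluff = HALF * Fraction(1, 4) + HALF * Fraction(3, 4)  # 1/2

# As the bettor.
ev_ace = avg_call * (POT + BET) + (1 - avg_call) * POT     # 150
ev_queen = (1 - avg_call) * POT - avg_call * BET           # bluffing EV: 0, same as checking
# As the caller: opponents check only the Queens they don't bluff, and the
# King wins the pot then; facing a bet, calling and folding are both EV 0.
p_check = HALF * (1 - avg_bluff)                            # 1/4
ev_king = p_check * POT                                     # 25

ev_round = HALF * (HALF * ev_ace + HALF * ev_queen) + HALF * ev_king
print(ev_round)  # 50, exactly the $50 ante
```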

    The takeaway here is that anonymity completely protects the fish from being beaten in this simplified game. Note that a key aspect of this result is that in this simplified bluffing game GTO play breaks even against fish, which is not the case in a full game of poker, so this question is not designed to illustrate that something like zoom on Bovada is unbeatable in any sense. It is just designed to show that in general, in a population with fish, regulars will usually end up taking lines that are non-GTO in the opposite direction from the fish, which will balance out average play and make it less exploitable.

    It's easy to see that if we could identify whether our opponent was a regular or a fish, we would be able to profit significantly ($4.375 per hand on average). In this case anonymity completely destroys that edge.
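The $4.375 figure can be reproduced under the assumption (mine, not spelled out above) that with types visible we maximally exploit fish and simply break even against other regulars, who both play the base GTO strategy. A sketch of that calculation in the same pot-share units:

```python
from fractions import Fraction

HALF = Fraction(1, 2)
POT, BET, ANTE = 100, 100, 50
FISH_CALL, FISH_BLUFF = Fraction(3, 5), Fraction(1, 4)

# As bettor against a known fish: always bet the Ace; bluff a Queen only if
# that beats checking (EV 0).
ev_ace = FISH_CALL * (POT + BET) + (1 - FISH_CALL) * POT   # 160
ev_queen = max(0, (1 - FISH_CALL) * POT - FISH_CALL * BET)  # bluffing is -20, so check: 0
# As caller with a King: a fish's bet is an Ace 4/5 of the time, so we fold.
p_bet = HALF + HALF * FISH_BLUFF                            # 5/8
p_ace = HALF / p_bet                                        # 4/5
ev_call = p_ace * -BET + (1 - p_ace) * (POT + BET)          # -40
ev_king = (1 - p_bet) * POT + p_bet * max(0, ev_call)       # 37.5

ev_vs_fish = HALF * (HALF * ev_ace + HALF * ev_queen) + HALF * ev_king
profit_vs_fish = ev_vs_fish - ANTE                          # 8.75 per round
# Assumption: we break even against regulars, and half of all opponents are fish.
print(float(HALF * profit_vs_fish))  # 4.375
```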

    Tuesday, August 19, 2014

    GTO Brainteaser #7 -- True or False Quiz Pt 2

    This week's brainteaser is another true or false quiz on GTO concepts plus one theory problem. Getting them all right is only part of the goal; ideally you should be confident that you could convince a skeptical friend of the correct answer to each question. Good luck :)



    Tuesday, July 29, 2014

    GTO Poker and Multiple Equilibria Part 2

    Today I'm going to continue examining GTO play in games with multiple equilibria.  I'm going to focus this post entirely on zero sum two player games and we'll take a look at various GTO strategies and how they perform against different types of fish.

    If you haven't already, be sure to check out part 1 of this post where we observed that in two player zero sum games when both players play GTO strategies their EVs are the same no matter which equilibrium strategy they play but that against fish some GTO strategies performed better than others.

    In the previous post we looked at a river situation with multiple GTO strategies, one of which performed as well as or better than all other GTO strategies against all types of suboptimal play. Today I'm going to start by showing a very simple example of a game with multiple GTO strategies where every strategy performs better against one type of suboptimal player and worse against another type, such that there is no "best" GTO strategy.

    The key takeaway that I want to convey is that, contrary to popular belief, it is entirely possible to alter your strategy in an exploitative fashion to take advantage of your opponent while continuing to play GTO in all sorts of zero sum games, including poker.

    Contrived Example: AB Game


    I'm going to start by introducing a totally made up game that has no bearing on reality. As an actual game it is quite boring, but it does a very good job of illustrating how a game might have multiple GTO strategies, each of which is better against a different type of fish.

    The game works as follows. Each player privately chooses one of four options: a, A, b, or B. They then simultaneously reveal their choices and a winner is determined. The loser must pay the winner a dollar; if there is a tie, no money is exchanged. The options are ranked as follows: A and a both tie b and B; however, A beats a and B beats b.
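Here's one way to encode those rules as a payoff function. It's a sketch in which the options are just the strings "a", "A", "b" and "B", and the names `BEATS` and `payoff` are my own.

```python
OPTIONS = ["a", "A", "b", "B"]
# The only decisive matchups; every other pairing is a tie.
BEATS = {("A", "a"), ("B", "b")}

def payoff(mine, theirs):
    """Dollars won by the player choosing `mine` against `theirs`."""
    if (mine, theirs) in BEATS:
        return 1
    if (theirs, mine) in BEATS:
        return -1
    return 0

# Print the payoff matrix, one row per choice, columns in OPTIONS order.
for mine in OPTIONS:
    print(mine, [payoff(mine, theirs) for theirs in OPTIONS])
```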

    From examining this game it should be quite clear what the set of GTO strategies is. In all cases, playing A is as good as or better than playing a: if your opponent plays b or B you tie either way, if he plays a you win with A but only tie with a, and if he plays A you lose with a instead of tying. Because exactly the same logic holds for B versus b, it cannot be GTO to ever play a or b, since any weight on a or b can be punished by an opponent who plays A or B, which a GTO player would do.

    It turns out that any strategy that plays only A and B, in any proportion, is GTO in this game: always playing A is GTO, always playing B is GTO, and in general playing A with probability x and B with probability 1 - x is GTO for all x in [0,1]. This is easy to verify, so I won't check it here. When two GTO players play each other they always tie, so their EVs are 0.
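If you do want to check it, the sketch below uses the fact that in a finite zero-sum game a mixed strategy's guaranteed EV is its minimum over the opponent's pure replies (a mixed reply is just an average of pure ones). The matrix is the one implied by the rules; the helper name `worst_case` is my own.

```python
from fractions import Fraction

OPTIONS = ["a", "A", "b", "B"]
# Row player's payoff, rows and columns in OPTIONS order
# (A beats a, B beats b, everything else ties).
PAYOFF = [
    [0, -1, 0, 0],  # a
    [1, 0, 0, 0],   # A
    [0, 0, 0, -1],  # b
    [0, 0, 1, 0],   # B
]

def worst_case(mix):
    """Guaranteed EV of a mixed strategy given as probabilities in OPTIONS order."""
    return min(sum(p * PAYOFF[i][j] for i, p in enumerate(mix))
               for j in range(len(OPTIONS)))

# Every mix of A and B guarantees 0, the value of this symmetric game...
for k in range(5):
    x = Fraction(k, 4)
    assert worst_case([0, x, 0, 1 - x]) == 0
# ...while putting any weight on a (or b) can be exploited.
print(worst_case([Fraction(1, 10), Fraction(9, 10), 0, 0]))  # -1/10
```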

    Now let's think about performance against suboptimal play. Suppose there are 2 fish, Fish a and Fish b. Fish a always plays a and Fish b always plays b. The GTO strategy that always plays A wins every time against Fish a and breaks even against Fish b. The GTO strategy that always plays B wins every time against Fish b and breaks even against Fish a. Furthermore, these GTO strategies are actually maximally exploitative against the fish they exploit.
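The tradeoff across the whole family of GTO mixes fits in two lines. This is a sketch where `x` is the probability of playing A (the rest is B) and `ev_vs_fish` is a hypothetical helper.

```python
from fractions import Fraction

def ev_vs_fish(x):
    """EV per round of the GTO mix 'A with probability x, B with probability 1 - x'.

    Against Fish a only A wins (B ties a); against Fish b only B wins (A ties b).
    """
    return {"Fish a": x * 1 + (1 - x) * 0, "Fish b": x * 0 + (1 - x) * 1}

for x in (Fraction(1), Fraction(1, 2), Fraction(0)):
    print(x, ev_vs_fish(x))
```

Always playing A (x = 1) wins the maximum possible 1 per round against Fish a and 0 against Fish b, x = 0 is the reverse, and every mix in between trades one fish's EV for the other's.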

    The result of this is that there is room to be exploitative and adapt your strategy to your opponent while still remaining GTO.  Imagine you are actually playing 100 rounds of this game against a random opponent from a pool of fishy opponents, some of whom play various GTO strategies, some of whom play a more often than they play b and others who do the opposite.

    It would be 100% reasonable (and far more profitable than picking a specific GTO strategy and sticking to it) to have a default GTO strategy that you usually play, say always playing A, and to deviate to an alternate GTO strategy as soon as you saw that your opponent played b more than he played a.

    Note that in this case the behavior that we are adapting exploitatively is on the equilibrium path and when we play against a GTO opponent our strategy change will be observable to them.  We are actually switching which equilibrium strategy we are playing.  As we'll see below, there is another way to exploit opponent tendencies while remaining GTO that involves only altering our play when our opponent takes actions that are off the equilibrium path.

    A Poker Example -- GTO Wiggle Room


    I'm going to try and look at a somewhat real world scenario that might emerge in a HU game in a 3-bet pot on a draw heavy board.  I kept the ranges unrealistically small for simplicity and wasn't careful to precisely model accurate stack sizes because this example is designed to be illustrative of a broadly applicable concept, not to accurately address a specific situation.

    Imagine we're on the river, on a draw heavy board where the river card completed the flush. The OOP player's range consists of strong overpairs, while the IP player's range consists of busted straight draws, made flush draws, and a handful of sets and two pairs.

    Specifically:

    1. The board is: 2sTs9c5h3s
    2. The IP range is: 22, 87, T9, QJ, Ks8s+, As2s-AsJs, 7s6s, 9s8s, Qs9s
    3. The OOP range is: QQ+
    4. There are 100 chips in the pot and 150 left to bet
    You can view GTO play for this scenario below:



    One of the first things we can note about GTO play is that the OOP player should never bet half pot when his opponent is playing GTO. This means that if we run into a suboptimal player who does bet half pot at us, we have some "wiggle room" where we can adjust our strategy a bit to exploit the type of range that we think he is betting half pot with. We still remain GTO after the exploitative adjustment so long as we don't adjust our strategy so much that a GTO player could increase his profit by betting half pot at us with some range to exploit our adjustment.

    Because in this type of situation betting from OOP is just a fundamentally weak play (similar to playing little a or little b in the game above), there are generally going to be many different reactions to such a bet that still leave a best-responding opponent with low enough EV that, from his perspective, checking is still more profitable than betting, even if our response to the bet is unbalanced.

    Specifically, let's consider two potential fish types, both of whom are going to randomly lead for half pot with their entire range 10% of the time. Fish 1 is thinking "when he shoves here he's never bluffing", is feeling you out with his bet, and plans to fold to a shove 100% of the time. Fish 2 is thinking "OMG I haz overpair" and is planning to call a shove 100% of the time.

    The question I'm going to look at in part 3 is this: is there enough wiggle room that we can have two significantly different and more profitable strategies that exploit Fish 1 and Fish 2 respectively, but that are still both unexploitable enough when our opponent leads out for 1/2 pot to be GTO?

    Stay tuned... Hopefully part 3 of this post will be out in the next week or two :)