Thursday, July 17, 2014

GTO Poker and Multiple Equilibria Part 1

Today I'm going to take a look at games with multiple equilibria (which includes poker) and discuss the implications of multiple equilibria for GTO strategies.  Because this is a large topic I'll be breaking this discussion into two separate posts.

Understanding games with multiple equilibria is important, because in practice most reasonably complex games have more than one equilibrium and in many cases some equilibria are superior to others.  Because of this, understanding the spectrum of equilibria and their properties in depth is an important part of game theory.

Coordination Games


Coordination games are the simplest examples of games with multiple equilibria.  I'll walk through a simple example that is sometimes called the "swerving game".

Imagine you are speeding along a narrow country road.  You come over a steep hill and realize another car is coming straight at you and you are both driving in the center of the road.  Furthermore, you are both going fast enough that there is no time to brake.  If you both swerve to the right you will pass each other and avoid an accident.  Similarly if you both swerve to the left there will be no accident.  In both these cases your payoff is $0.  However, if you swerve to your right and the other driver swerves to his left, or if you swerve to your left and the other driver swerves to his right, you will crash and both incur $1,000 of expenses.



The Swerve Game


It turns out that this game actually has three equilibria.  To see this just recall the definition of a Nash equilibrium.  Suppose both players' strategy is to swerve to their right.  Clearly neither player would want to deviate as that would cause a crash, so the strategy pair where both players always go right is an equilibrium.  Similarly if both players always swerve to their left that is an equilibrium as well.  Both of these equilibrium outcomes give each player $0 in EV.

The third equilibrium is more subtle.  Imagine playing this game against a player who randomly goes left half the time and right half the time.  In this case no matter what strategy you employ, your EV is -$500, so all strategies are a best response to your opponent's actions.  This means that both drivers swerving right 50% and left 50% is also an equilibrium.  However, the EV of this equilibrium is much worse for both players.  On average both players lose $500 in EV if they play this equilibrium.
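As a quick sanity check, the three equilibria can be verified mechanically.  The sketch below is only an illustration: the payoff matrix is taken from the description above, while the function names and the "index 0 = right" encoding are my own choices.  It relies on the fact that EV is linear in your own mixing probability, so checking deviations to the two pure strategies is enough.

```python
from fractions import Fraction

# Payoff to a driver: 0 if both swerve the same way (each to their own right,
# or each to their own left), -1000 if they crash.  Index 0 = right, 1 = left.
PAYOFF = [[0, -1000],
          [-1000, 0]]


def expected_payoff(p_me_right, p_opp_right):
    """EV for a driver who swerves right with probability p_me_right."""
    me = [p_me_right, 1 - p_me_right]
    opp = [p_opp_right, 1 - p_opp_right]
    return sum(me[i] * opp[j] * PAYOFF[i][j] for i in range(2) for j in range(2))


def is_equilibrium(p1_right, p2_right):
    """True if neither driver can gain by deviating to a pure strategy."""
    ev1 = expected_payoff(p1_right, p2_right)
    ev2 = expected_payoff(p2_right, p1_right)
    best1 = max(expected_payoff(s, p2_right) for s in (0, 1))
    best2 = max(expected_payoff(s, p1_right) for s in (0, 1))
    return ev1 == best1 and ev2 == best2


half = Fraction(1, 2)
print(is_equilibrium(1, 1), is_equilibrium(0, 0), is_equilibrium(half, half))
print(expected_payoff(half, half))  # -500
```

Any other symmetric mix, for example both drivers going right 75% of the time, fails the check because each driver would rather always go right.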

Obviously in this situation everyone would be much better off if they could coordinate and agree in advance that any time they encounter such a situation they will pick one of the pure equilibrium strategies (always go right or always go left) and always play it.

In this game always going right or always going left are both "GTO" strategies in the sense that in isolation they are something a person playing optimally might do.  However, an equilibrium is a condition on the set of strategies that both players are using, not on a specific player's strategy in isolation.  If you were in a country where people usually drive on the right, for example, such that society had coordinated on a specific equilibrium, deciding to play the strategy where you always go left would still be "GTO" technically, but in practice it would be a bad idea.

Furthermore, if you ended up in a world where somehow everyone was playing the mixed strategy equilibrium that would be terrible for everyone, even though theoretically speaking everyone would be playing GTO.  Some theorists even argue that social and cultural norms often emerge to solve coordination problems by establishing an implicit agreement on which equilibria to play.

In general, most complex games have many equilibria.  Some may be better for a particular player than another at his opponent's expense, or some may just be better or worse for everyone as in the swerving game.  There are a variety of ways of categorizing and classifying equilibria and trying to predict which ones are more likely to be played in practice than others (for example in the swerving game the mixed strategy equilibrium is not evolutionarily stable and thus is not something I'd expect to ever encounter in real life), but overall differentiating "good" equilibria from bad ones and trying to predict which equilibrium outcomes will occur is a very complex problem.

Zero Sum Games to the Rescue (sort of)


In zero sum two player games (like heads up poker situations) life is a bit simpler thanks to a key theorem.  In two player zero sum games, while there may be many different equilibrium strategies, when both players play equilibrium strategies their EV is the same no matter which equilibrium strategy either player chooses.  Thus if a single player is playing a GTO strategy in isolation, his EV is guaranteed to be at least that equilibrium value no matter what his opponent does.  There are no good or bad equilibria for either player if both players stick to GTO play.

However, there still are many situations where certain GTO strategies perform better against all non-optimal opponents or against specific non-optimal opponents.  To see this, let's start by looking at a simple example of a slight variant of the [0,1] game from The Mathematics of Poker (which is a variant of a much older game solved by von Neumann and Morgenstern).  The game works as follows:

  1. There is a pot of 100 chips and both players have 100 chip stacks
  2. Both players are dealt a single card which has a number randomly chosen from between 0 and 1 (inclusive)
  3. Player 1 can bet 100% pot or check.  If Player 1 bets, Player 2 can call or fold; if Player 1 checks, Player 2 must check back
  4. Higher numbered cards win at showdown

I walk through how to solve this game in detail in The Theory of Winning Part 2, so I will just gloss over the solution to the game in this post.

It turns out that GTO play for the betting player is to bet the top of his range for value (all hands stronger than some value v) and the bottom of his range as a bluff (all hands worse than some value b).  The calling player then calls with the top of his range (all hands stronger than some value c).



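It is relatively straightforward to solve for the exact optimal values: the bettor bluffs with hands below b = 1/9 and value bets hands above v = 7/9, while the caller calls with hands above 5/9.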
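For readers who want to check the thresholds, here is a minimal sketch of the algebra.  The three indifference conditions in the comments are my own reconstruction of the standard approach (this post does not spell them out), and the function name and the use of c for the caller's threshold are just labels chosen for the example.

```python
from fractions import Fraction


def solve_zero_one_game(pot, bet):
    """Thresholds (b, c, v): bluff below b, call above c, value bet above v."""
    r = Fraction(bet, pot)  # bet size as a fraction of the pot
    # Assumed indifference conditions, with the pot normalized to 1:
    #   value bettor at v:  v = (1 + c) / 2        (v beats half of the calling range)
    #   bluffer at b:       b = c - r * (1 - c)    (checking wins b, bluffing wins c - r(1 - c))
    #   caller at c:        b * (1 + r) = r * (1 - v)
    # Substituting the first two into the third and solving for c:
    k = r * (1 + r) + r / 2
    c = k / (1 + r + k)
    b = c - r * (1 - c)
    v = (1 + c) / 2
    return b, c, v


b, c, v = solve_zero_one_game(pot=100, bet=100)
print(b, c, v)  # 1/9 5/9 7/9
```

Using exact fractions rather than floats makes it easy to compare the output against the ninths quoted in the text.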


This is a GTO solution to this game, and in this case I think it is quite clear that this is the best GTO solution, however, it turns out that while there is a unique GTO strategy for the betting player, there are infinitely many GTO strategies for the caller in this game and thus infinitely many equilibria.

To see why, just consider Player 2's decision when he holds a hand between 1/9 and 7/9 and his opponent bets.  Against a GTO opponent these hands all have exactly the same EV for both calling and folding.  Furthermore, a value bet is only more profitable than a check for Player 1 with a hand of strength x if x beats at least half of the opponent's calling range.

Suppose that rather than calling with all hands better than 5/9, Player 2 instead calls with all hands better than 6/9 as well as hands between 4/9 and 5/9 and folds everything else.  Player 2's EV is exactly the same in this situation, and for Player 1 any hand of strength less than v still loses to more than half of the caller's calling range, so there is no incentive for him to alter his value betting.  Furthermore, Player 2 is calling and beating bluffs with the exact same frequency, so there is no reason for Player 1 to alter his bluffing strategy.

Thus we have another GTO strategy for Player 2, and it is easy to see that infinitely many similar strategies that shift the bottom of Player 2's calling range are all also GTO.

Now we know that against a GTO opponent both of these GTO strategies for player 2 have equal EV, but what about against a sub-optimal player 1 who value bets too wide with all hands stronger than 5/9 (instead of the GTO cutoff of 7/9) and bluffs according to GTO with hands less than 1/9?  Call him SubOptimal1.

The EVs are equal in all cases, except when SubOptimal1 has a hand between 5/9 and 6/9 and player 2 has a hand between 4/9 and 6/9.  In this case the GTO strategy where player 2 calls with hands 5/9 and better ends up calling half the time and winning half the time when he calls so his EV in this situation is one quarter of the pot, or 25 chips.

The alternate player 2 GTO strategy that calls with 4/9-5/9 and folds 5/9-6/9 ends up calling half the time and always losing a pot sized bet when he calls for an EV of -50 chips.  That's a 75 chip EV difference for player 2 between these two GTO strategies against SubOptimal1.

The probability of the situation where SubOptimal1 has a hand between 5/9 and 6/9 and player 2 has a hand between 4/9 and 6/9 occurring is 1/9 * 2/9 = 2/81 so the overall EV difference between the two strategies is 150/81.
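The comparison above can be reproduced exactly with a small grid calculation.  This is only a sketch under one simplifying observation: every strategy involved is constant on intervals of width 1/9, so the [0,1] hand space can be split into nine cells (cell k holds hands between k/9 and (k+1)/9).  The set names and the p2_ev helper are hypothetical names for this example.

```python
from fractions import Fraction

POT, BET = 100, 100
NINTH = Fraction(1, 9)


def p2_ev(p1_bets, p2_calls):
    """Player 2's EV in chips, given the cells Player 1 bets and Player 2 calls."""
    total = Fraction(0)
    for i in range(9):        # Player 1's cell
        for j in range(9):    # Player 2's cell
            # Chance Player 2 holds the higher card for this pair of cells.
            p2_wins = Fraction(1, 2) if i == j else Fraction(int(j > i))
            if i not in p1_bets:
                ev = POT * p2_wins                             # checked down
            elif j in p2_calls:
                ev = (POT + BET) * p2_wins - BET * (1 - p2_wins)
            else:
                ev = 0                                         # Player 2 folds
            total += NINTH * NINTH * ev
    return total


gto_bets = {0, 7, 8}            # bluff below 1/9, value bet above 7/9
wide_bets = {0, 5, 6, 7, 8}     # SubOptimal1: value bets everything above 5/9
standard_call = {5, 6, 7, 8}    # call with 5/9 and better
shifted_call = {4, 6, 7, 8}     # call 4/9-5/9 and 6/9+, fold 5/9-6/9

print(p2_ev(gto_bets, standard_call) == p2_ev(gto_bets, shifted_call))    # True
print(p2_ev(wide_bets, standard_call) - p2_ev(wide_bets, shifted_call))   # 50/27
```

The printed difference of 50/27 is the same number as 150/81 in lowest terms, roughly 1.85 chips per hand.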

Despite the fact that both strategies are GTO and must perform the same against an optimal opponent, against sub-optimal opponents different GTO strategies can perform quite differently.

In this case one GTO strategy is clearly much better than the other as there is no reason to ever call with a weaker hand over a stronger hand if raising is not an option.  The GTO strategy that calls with all hands better than 5/9 will have an equal or higher EV against all possible opponent strategies.  However as we'll see in part 2 of this post (coming soon) there are plenty of situations where there are two or more equilibrium strategies all of which are better at exploiting certain types of fish and worse at exploiting other types of fish.

Stay tuned for part 2...






Wednesday, July 2, 2014

GTO Brainteaser #6 Solution: Barreling and Bluffing

Today I'm going to go through the solution to the multistreet barreling / bluffing game that I introduced in the 6th brainteaser.  Learning how to solve this game illustrates most of the key concepts of multistreet GTO theory (these concepts also apply to single streets with multiple rounds of betting/raising), but there are a bunch of technical details around the concept of subgame perfection and backwards induction that I am going to gloss over here as they aren't super relevant.  I go through this game in more depth in one of my CardRunners videos "The Theory of Winning Part 3" which should be released sometime in the next few weeks.


The game was structured as follows:

There are 2 players on the turn, and the pot has 100 chips, effective stacks are 150 chips.  The Hero has a range that contains 50% nuts and 50% air and is out of position.  The Villain has a range that contains 100% medium strength hands that beat the Hero's air hands and lose to his nut hands.

For simplicity, assume that the river card will never improve either player's hand.  You can also assume that the Hero is first to act (it turns out this doesn't actually matter).

The key question was: what is the EV of betting 50 chips on the turn and then having the option to bet 100 chips on the river with optimal ranges if our opponent calls optimally?  How does that compare to the EV of shoving the turn for 150 chips if the hero shoves an optimal range and the villain calls optimally?

In this post I am going to go through the math a bit quickly because I am mostly focused on demonstrating the technique of using backwards induction to solve these types of games.  For those interested, Keepitsimple had some great questions around clarifying some of the math in the comments which I answered in detail.

Backwards Induction



The basic way to solve these types of games is to work your way backwards.  You first want to solve the river with an arbitrary hand range and then that will tell you the EV for actions on the turn that put you into any possible river state.  I explained the concept of backwards induction here.  The basic idea is that you solve the later stages of the game (in this case the river) and determine the EV of playing that game when both players play optimally.  You then consider reaching that part of the game tree as just immediately giving you the fixed EV payoff of optimal play.

What makes applying backwards induction directly here a bit tricky is that the distribution of nuts to air that the betting player will end up with on the river depends on the turn strategy he employs and his EV of the river game depends on his hand distribution at the start of the river.  This is similar to how the distribution of hands that you raise determines the starting distribution of hands that you have when considering a 4-bet.

What this means is that we have to solve a parameterized version of the river game that will tell us the EV of reaching the river with an arbitrary hand distribution before we can backwards induct to the turn.


Solving the River


Solving the river component of this game is quite simple.  Let's start by writing out the EV equations for the betting player when he holds air after bluffing the turn.  Call c the frequency with which his opponent calls a bet.

EV[bluff] = 200 * (1-c) - 100c
EV[give up] = 0

Indifference conditions require that any mixed strategy equilibrium has the same EV for bluffing and giving up.  Which implies:

200 * (1-c) - 100c = 0
c = 2/3

Similarly let's write the EV for the calling player.  Call a the probability that the betting player is bluffing when he bets.

EV[call] = 300a - 100(1-a)
EV[fold] = 0

Again, a mixed strategy equilibrium requires these are equal which means:

a = 1/4

Clearly the betting player should always bet the nuts, and then he should bet enough air such that 1/4th of his betting range is air.  The calling player should call 2/3rds of the time.  There are two pure strategy equilibrium cases.  One occurs when the bettor has no nuts in his range, in which case he should never bluff and the calling player should always call.  The other occurs when more than 3/4ths of the bettor's range is the nuts, in which case he always bets and the calling player should always fold.
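The two indifference equations above have simple closed forms for any pot and bet size.  Here is a small sketch (the function name is a hypothetical one chosen for this example) that solves them and reproduces c = 2/3 and a = 1/4 for the river spot with a 200 chip pot and a 100 chip bet.

```python
from fractions import Fraction


def indifference_frequencies(pot, bet):
    """Return (call_freq, bluff_share) that make both players indifferent."""
    # Bluffer: pot * (1 - c) - bet * c = 0
    call_freq = Fraction(pot, pot + bet)
    # Caller: (pot + bet) * a - bet * (1 - a) = 0
    bluff_share = Fraction(bet, pot + 2 * bet)
    return call_freq, bluff_share


c, a = indifference_frequencies(pot=200, bet=100)
print(c, a)  # 2/3 1/4
```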

Given these strategies, what is the EV of reaching the river for each player?

The calling player is indifferent between calling and folding when facing a bet, so his EV is 0 when he faces a bet, as that is the EV of folding.  When his opponent checks and gives up with his air the calling player always wins the pot, so his overall EV for the river game is just the pot times how often he is checked to.  Call x the portion of the betting player's range that is the nuts at the start of the river.  The actual value of x will be driven by the optimal turn strategies.  As we saw above, the betting player will always bet the nuts and will bet enough air that 1/4th of his betting range is air.  This means he will bet x + x/3 = 4x/3 of the time for x <= 3/4 and always for x > 3/4.

Thus, the calling player's EV is 200 * (1 - 4x/3) when x < 3/4 and 0 otherwise.

The betting player's EV for reaching the river with a range that contains x nuts is then 200 * 4x/3 when x < 3/4, and 200 otherwise, since the EVs must sum to the pot.
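The parameterized river EVs can be written as a small function of x.  This is just a sketch of the formulas above with names of my own choosing; the direct_nuts_ev helper is an extra cross-check that computes the bettor's EV hand by hand (the nuts win 200 when the caller folds and 300 when he calls) rather than from the sum-to-pot argument.

```python
from fractions import Fraction

RIVER_POT, RIVER_BET = 200, 100


def river_evs(x):
    """(bettor EV, caller EV) when a fraction x of the bettor's range is the nuts."""
    x = Fraction(x)
    if x >= Fraction(3, 4):
        return Fraction(RIVER_POT), Fraction(0)   # bettor always bets, caller folds
    bet_freq = x + x / 3                          # all nuts plus air making up 1/4 of bets
    caller = RIVER_POT * (1 - bet_freq)           # caller only wins when checked to
    return RIVER_POT - caller, caller


def direct_nuts_ev(call_freq=Fraction(2, 3)):
    """EV of betting the nuts on the river against the equilibrium calling frequency."""
    return RIVER_POT * (1 - call_freq) + (RIVER_POT + RIVER_BET) * call_freq


print(river_evs(Fraction(9, 16)))  # (Fraction(150, 1), Fraction(50, 1))
```

For x = 9/16 the caller's river EV is 50 chips, which is exactly the price of calling a 50 chip turn bet.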

Back to the Turn


Using the results above we can now write out the EV equations for various actions on the turn.  Let's start with the calling player.  The EV of calling a 50 chip turn bet when our opponent is betting a range of which a fraction z is nuts is our expectation on the river minus the cost of a call:

EV[call] = 200(1 - 4z/3) - 50
EV[fold] = 0

These are equal when z = 9/16.  If the bettor always bets the nuts then he should bet his air 7/9ths of the time on the turn, because we start with 1/2 nuts, 1/2 air and (1/2) / (1/2 + 1/2 * 7/9) = 9/16.  This means that on the river the betting player's range is x = 9/16 nuts, which is less than 3/4 and puts us in our mixed equilibrium case.

For the betting player, his EV when betting with air is very simple because the EV of reaching the river with air is 0 when x < 3/4.  Call c the probability that the calling player calls.

EV[bluff] = 100 * (1-c)  - 50c
EV[give up] = 0

So again indifference requires that c = 2/3.
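The turn numbers follow from a few lines of exact arithmetic.  The sketch below simply rearranges the two indifference equations above; the variable names are my own.

```python
from fractions import Fraction

TURN_POT, TURN_BET, RIVER_POT = 100, 50, 200

# Caller indifference: 200 * (1 - 4z/3) - 50 = 0  =>  z = (3/4) * (1 - 50/200)
z = Fraction(3, 4) * (1 - Fraction(TURN_BET, RIVER_POT))

# The starting range is 1/2 nuts and 1/2 air, and the nuts always bet.
# Solve nuts / (nuts + air * f) = z for the air betting frequency f.
nuts, air = Fraction(1, 2), Fraction(1, 2)
turn_bluff_freq = nuts * (1 - z) / (z * air)

# Bluffer indifference: 100 * (1 - c) - 50 * c = 0
c = Fraction(TURN_POT, TURN_POT + TURN_BET)

print(z, turn_bluff_freq, c)  # 9/16 7/9 2/3
```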

Our Solution


To summarize our GTO strategies, the betting player always bets the nuts on both streets.  On the turn he bets his air with probability 7/9, and on the river he bets with his air with probability 3/7 (since 9/16ths of his range is the nuts which he always bets and we want him to bet a range that is 1/4th air we need 3/7*7/16 to get 3/16 air).  The calling player calls in all situations with probability 2/3.

When both players follow these strategies the EV for the betting player is simple to calculate

EV[when we have air] = 0 by indifference
EV[when we have the nuts] = 100 * 1/3 (they fold the turn) + 150 * (2/3 * 1/3) (they call the turn and fold the river) + 250 * (2/3 * 2/3) (they call the turn and the river) = 1600/9

Overall the betting player's EV is thus 1/2(0) + 1/2 (1600/9) = 800/9

And since the game is zero sum the calling player's EV is 100/9
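As a cross-check, the whole two-street line can be evaluated directly.  In this sketch all EVs are measured in net chips relative to the start of the turn (so giving up on the river after bluffing the turn costs the 50 chips already bet), which is a bookkeeping choice of mine rather than the convention used in the river equations above; the variable names are also my own.

```python
from fractions import Fraction

HALF = Fraction(1, 2)
c = Fraction(2, 3)             # caller's calling frequency on both streets
turn_bluff = Fraction(7, 9)    # how often air bets the turn
river_bluff = Fraction(3, 7)   # how often air that bet the turn bets again

# Nuts: win 100 if they fold the turn, 150 if they fold the river, 250 if they call twice.
ev_nuts = 100 * (1 - c) + 150 * c * (1 - c) + 250 * c * c

# Air: a river bluff nets +150 when it works and -150 when called;
# giving up on the river loses the 50 chips bet on the turn.
ev_river_bluff = 150 * (1 - c) - 150 * c
ev_river_give_up = -50
ev_air_after_call = river_bluff * ev_river_bluff + (1 - river_bluff) * ev_river_give_up
ev_turn_bluff = 100 * (1 - c) + c * ev_air_after_call
ev_air = turn_bluff * ev_turn_bluff   # checking the turn with air wins nothing

ev_bettor = HALF * ev_nuts + HALF * ev_air
ev_caller = 100 - ev_bettor
print(ev_nuts, ev_air, ev_bettor, ev_caller)  # 1600/9 0 800/9 100/9
```

The air EV comes out to exactly 0 without assuming indifference up front, which is a useful check that the bluffing and calling frequencies are consistent with each other.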

Along the way we assumed that the betting player always bets the nuts.  Were he to check the nuts it would be optimal for him to shove the river with his nuts and some % of his air.  As we'll see next, his EV with the nuts is lower in the game where he shoves, so deviating to checking the nuts is not profitable and this is an equilibrium.

What about shoving the turn?


If we were to bet 150 on the turn, let's write the EV equations for the betting and the calling player, again calling c the probability that the calling player calls, and a the probability that the betting player is bluffing when he bets.

EV[calling] = 250 * a - 150 * (1-a)
EV[folding] = 0

Indifference requires that a = 150/400 = 3/8

EV[betting air] = 100 (1-c) - 150c
EV[giving up] = 0 

Indifference requires that c = 100/250 = 2/5

The betting player's EV when both players play these strategies is again simple:

EV[air] = 0 by indifference
EV[nuts] = 100 * (3/5) + 250 (2/5) = 160

Overall EV = 1/2(0) + 1/2(160) = 80

So betting half pot twice is 800/9 - 80 = 80/9 chips higher in EV than jamming.
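The shove line can be checked the same way.  This sketch solves the two indifference equations for a 150 chip bet into a 100 chip pot and, as an extra consistency check of my own, confirms that the caller's EV (winning the pot whenever he is checked to) accounts for the rest of the pot.  The 800/9 figure for the two-bet line is taken from the calculation earlier in this post.

```python
from fractions import Fraction

POT, SHOVE = 100, 150
HALF = Fraction(1, 2)

a = Fraction(SHOVE, POT + 2 * SHOVE)    # share of the shoving range that is air
c = Fraction(POT, POT + SHOVE)          # caller's calling frequency

ev_nuts = POT * (1 - c) + (POT + SHOVE) * c
ev_shove = HALF * ev_nuts               # air has EV 0 by indifference

# Air mass that shoves so that air is a share `a` of the betting range.
air_bet_mass = HALF * a / (1 - a)
check_freq = 1 - (HALF + air_bet_mass)
ev_caller = POT * check_freq            # caller wins the pot when checked to

ev_two_bets = Fraction(800, 9)          # bet 50 on the turn, 100 on the river
print(a, c, ev_nuts, ev_shove, ev_two_bets - ev_shove)  # 3/8 2/5 160 80 80/9
```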

Where does this extra EV come from?


The extra EV comes from an effect that I call "Compounding the nuts".  The basic idea is that every time we bet an optimal polarized range our opponent has to treat the air component of that range as if it were nuts.  The air in that range is effectively converted to the nuts.  If we are on the turn and know that when we reach the river some of our range will be converted to the nuts, we can play the turn as though that portion of our range already is the nuts and thus bluff with a higher frequency.

This is a very powerful idea and its effect is that in situations like this where we have a polarized range, every additional street that we can use to bet increases our EV by letting us bluff wider and wider.

For those of you who are interested, I go over the concept of compounding the nuts in detail in my CardRunners video "The Theory of Winning Part 3" which should be released soon.

Friday, June 6, 2014

GTO Brainteaser #6 -- Barreling and Bluffing

This week's brainteaser is about optimal bluffing and bet sizing when barreling with a polarized range.  Due to work constraints I won't post the solution until around 7/5/2014.

There are 2 players on the turn, and the pot has 100 chips.  The Hero has a range that contains 50% nuts and 50% air and is out of position.  The Villain has a range that contains 100% medium strength hands that beat the Hero's air hands and lose to his nut hands.

For simplicity, assume that the river card will never improve either player's hand.  You can also assume that the Hero is first to act (it turns out this doesn't actually matter).

If the Villain perfectly counters whatever betting strategy the Hero uses, which of the following two options is more profitable for the hero and what is the EV of each strategy?

  1. Shove the turn (for 150 chips) with an optimal ratio of value bets and bluffs and check/give up with hands that we don't bet.
  2. Sometimes bet 50 chips on the turn and sometimes barrel for 100 chips on the river, both with an optimal ratio of value bets and bluffs.
Hint:  It is not +EV for the Villain to raise the turn against strategy 2 so long as the hero bets the nuts so you only need to consider strategies where the villain calls/folds with varying frequencies.

Bonus 1:  In general if the effective stacks at the start of the turn are X, what are the optimal turn/river bet-sizing and bluffing frequencies?

Bonus 2:  Suppose that 10% of the time the Hero's air improves to be the nuts and re-solve the game.


Friday, May 23, 2014

Theory of Winning Pt 1 -- Free Video on Cardrunners this weekend

Thanks largely to VodkaHaze from reddit I got an opportunity to make a Game Theory focused video series for CardRunners.  The video is free to anyone for this weekend only so definitely check it out here:

http://www.cardrunners.com/poker-videos/the-theory-of-winning-part-1-asuth/

In the video I go through GTO fundamentals, solve some example games, and expand on some of the topics I've covered regarding strategies and GTO in 3+ handed scenarios.

Tuesday, May 13, 2014

GTO Brainteaser #5 Solution: Bayesian Werewolves

This week's solution is in video form:





The one aspect of the solution that is not covered in the video is the EV of the trial for an individual judge.

When all judges are playing the symmetric equilibrium strategy, the EV for a single judge is -102 when the accused is a human and 922 when the accused is a werewolf. If on average the accused is a werewolf 50% of the time then the EV of the trial is 410.

For comparison, if there were a single judge deciding honestly based solely on his own ritual result he would average 880 when the accused is a werewolf and -20 when the accused is a human for an average EV of 430.
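The single-judge baseline is easy to reproduce from the payoffs in the brainteaser (1000 coins for a correct execution, a 200 coin fee for either mistake, nothing for a correct release) and the 90% ritual accuracy.  This sketch only covers the single honest judge; it does not attempt the three-judge equilibrium, and the variable names are my own.

```python
from fractions import Fraction

ACCURACY = Fraction(9, 10)
REWARD, FEE = 1000, -200

# Werewolf: the ritual says "werewolf" 90% of the time (execute, +1000),
# otherwise the judge wrongly releases him (-200).
ev_werewolf = ACCURACY * REWARD + (1 - ACCURACY) * FEE

# Human: the ritual says "human" 90% of the time (release, 0),
# otherwise the judge wrongly executes him (-200).
ev_human = ACCURACY * 0 + (1 - ACCURACY) * FEE

ev_average = (ev_werewolf + ev_human) / 2
print(ev_werewolf, ev_human, ev_average)  # 880 -20 430
```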

Even when all the judges have identical incentives, unanimity rule reduces their expectation.

Friday, May 9, 2014

GTO Brainteaser #5 -- Bayesian Werewolves

When a citizen of the ancient city of Bayes is accused of being a werewolf they are brought before the Tribunal to be considered for execution.  The three Tribunal members can detect if someone is a werewolf or not through a simple spiritual ritual involving steamed badger milk.  Since the ritual is only 90% accurate (yes badger milk is actually 90% effective for werewolf detection), each Tribunal member performs it separately, in secret and then decides to vote guilty or innocent.  The accused is only executed if the Tribunal unanimously votes guilty.  The wise Tribunal members are unbiased and go into each trial believing that there is a 50%/50% chance that the accused is a werewolf prior to conducting their ritual, however the members base their vote 100% on strategic self interest.


  1. If the citizen is executed and reverts to wolf-form upon death, they were a werewolf and the Tribunal is given the accused’s possessions for their wisdom and public service.
  2. If they do not turn into a wolf upon death, the Tribunal has executed an innocent citizen, and must each pay the citizen’s family a grievance fee. The family also gets a cake that says “Sorry guys, our bad, #sorrynotsorry”.
  3. When the Tribunal sets a non-werewolf free, usually not much happens.  The people of Bayes eat some discarded apology cake, get drunk, and think of how it might be fun to accuse other people of being werewolves.
  4. If the Tribunal lets a werewolf go free, this is revealed when they turn into a wolf upon death (often at the hands of a cake-filled, drunken mob under a full moon).  As a penance, the Tribunal pays the werewolf’s family the grievance fee and gets none of the werewolf’s possessions.


Imagine that you are one of the members of the Tribunal of Bayes presiding over the fate of one of the richest men in town.  If you all vote to execute him, and are correct, you will each get 1000 gold coins (which buys a lot of cake in Bayes).  If you unanimously execute him, and he was innocent, or if you let him go, and he is later found to be a werewolf, you must each pay a 200 gold coin fee.  If you correctly set him free, you gain/lose nothing.

What is an equilibrium (GTO) strategy for voting based on the result of the ritual?  What is the expected value in gold coins for each tribunal member when they all follow the equilibrium voting strategy, and what is the probability that they convict an innocent citizen or that they release a guilty citizen?  How would these numbers change if the Tribunal used majority rule rather than requiring unanimity?

EDIT: To clarify the efficacy of the badger milk ritual. It is 90% accurate in both directions. That is, if the accused is a werewolf there is a 10% chance the badger milk ritual will say that he is not, and if the accused is not a werewolf there is a 10% chance that the badger milk ritual will say that he is.

You can check out the full solution here.

Monday, May 5, 2014

GTO Brainteaser #4 -- GTO True or False Solution

Today I'm going to walk through the solutions to the GTO True or False quiz.  As I warned in the post, the quiz was quite hard; in aggregate the overall % of questions answered correctly was about 52%, just slightly better than randomly guessing.  In case you missed the quiz you can take it here.


Question 1:  "Betting on the river with a hand in a situation where a GTO opponent never calls with a worse hand, and never folds with a better hand cannot be part of a GTO Strategy in Heads Up NLHE"

Answer:  False

Overall this is one of the trickiest questions, although there is a very simple situation in which the statement is clearly false.  If you imagine you are on the river and the board has a royal flush, then shoving is clearly GTO, and a GTO opponent will never fold worse or call with better.  Shout out to reddit user yellowstuff for noting that.

There is also a more interesting set of examples where betting in this type of situation is profitable.

The one thing a bet can accomplish, besides making your opponent call with worse or fold better, is that it can limit your opponent's bet sizing options. To some extent, it turns out that something like blocking bets can be GTO, which is quite surprising.

In spots where you have a bluff catcher, but your range also includes the nuts reasonably often it can be GTO to lead small some % of the time with nuts, air, and bluff catchers. By making it a lot more expensive for your opponent to bluff at you (by raising), you can get more value out of your nuts when they raise, make them fold to a tiny bet with your air when they fold, and make them unable to use the most profitable bet size against a check/call with a bluff catcher.

The Mathematics of Poker talks about this in the AKQ game #5 where they solve a full no-limit version of the AKQ game and demonstrate that the GTO strategy for the out of position player is to occasionally bet his kings. I think it's one of the most interesting parts of the whole book; they call it a preemptive bet. They actually work out the math in detail and it's worth checking out, but the intuition is what I laid out above.


Question 2:  "Two players, both playing GTO strategies, are playing two hands of heads up in a rake free game of NLHE. Player 2 has a leak, where every time he is supposed to take an action with probability 100% according to his strategy, he instead takes that action 99% of the time, and randomly chooses another action 1% of the time. Player 2 will have EV < 0 vs. his opponent."

Answer:  True


This one is pretty simple if you consider the types of errors that Player 2 will make.  For example, Player 2 will fold AA preflop 1% of the time as the first to act player.  This is a significantly -EV decision.  If both players play GTO for 2 hands, then Player 2's EV would be 0.  By definition of a Nash Equilibrium, none of his random errors can increase his EV, and some of them, (like folding the nuts) will be strictly minus EV, so his overall EV will be strictly less than 0.

Question 3:  "In a 3 handed game of NLHE with no rake, two players are playing GTO strategies, the third player is not. The third player must have EV <= 0 and the GTO players must both have EV >= 0."

Answer:  False


I explained this in depth here.  This was the question that people most frequently got wrong.

Question 4:  "If two players reach the river with ranges that have 50% equity and they both play GTO strategies on the river, then the player who is in position cannot have a lower EV than his opponent."

Answer:  False


The easiest way to solve this one is to imagine Player 1 is in position and has a range that is 100% medium strength hands, and Player 2 has a range that is 50% nuts, 50% air.  As long as Player 2 doesn't fold the nuts he is guaranteed to win at least 50% of the time and have equal EV, and if he can ever make his opponent fold to a bluff, or call a value bet, then he will win more than his opponent.  A very simple strategy like betting the pot whenever he has the nuts and 50% of the time when he has air is GTO and guarantees him an EV of triple his opponent's.
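The "triple" claim is easy to verify.  In the sketch below the pot is normalized to 1, Player 2 bets pot with all of his nuts and half of his air, and Player 1 calls with some frequency q.  The function name and the normalization are my own choices for illustration; the point is that Player 2's EV does not depend on q.

```python
from fractions import Fraction


def player2_ev(call_freq, pot=1, bet=1):
    """Player 2's EV when Player 1 calls a pot sized bet with probability call_freq."""
    q = Fraction(call_freq)
    nuts_mass = Fraction(1, 2)        # half the range is the nuts, always bet
    bluff_mass = Fraction(1, 4)       # half of the air (a quarter of the range) bluffs
    ev_nuts = pot * (1 - q) + (pot + bet) * q
    ev_bluff = pot * (1 - q) - bet * q
    return nuts_mass * ev_nuts + bluff_mass * ev_bluff   # checked air wins nothing


for q in (0, Fraction(1, 2), 1):
    ev2 = player2_ev(q)
    print(q, ev2, 1 - ev2)   # Player 2 gets 3/4 of the pot, Player 1 gets 1/4
```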

This is discussed in more depth here.

Question 5: "Suppose we solve a specific river scenario (with pencil and paper, or with a program) for a GTO strategy. A friend shows us a strategy that claims to be GTO for the full game of HUNLHE. If it plays differently in our river scenario then it cannot be GTO."

Answer: False

This question gets into the idea of off equilibrium path behaviors.  A pair of GTO strategies define something called an equilibrium path, which is the set of situations that will occur with non-zero probability when the two GTO strategies play against each other.

The definition of Nash Equilibrium requires that there is no profitable way for either player to deviate off of the equilibrium path and increase his overall EV.  It does not require that the GTO strategy plays perfectly off of the equilibrium path.

As a simple example, suppose that it is not GTO to get to the river with 27o in some specific situation S in the game of HUNLHE as a whole.  Then it is entirely possible that a strategy that is GTO in the entire game of NLHE will play quite suboptimally in the situation S against a player who does hold 27o.

Question 6:  "Two bots playing a shove or fold preflop game with 1000 BB stacks. You observe that Player 2 is calling Player 1's shoves less than 0.1% of the time over an infinitely large sample. These bots are not playing GTO shove/fold strategies."

Answer: False

One might think that if our opponent is calling less than 1 in 1000, and we win 1.5BB when we shove and he folds, then shoving any two cards would auto profit.

However, the equilibrium to this game is to only shove with AA and to only call with AA.  Due to card removal effects, the odds that Player 2 has AA, given that Player 1 has AA, are less than 1 in 1000. Were someone to try and shove a different hand (say KK) they would get called about 1 in 204 (6 of the 1,225 remaining combos are AA), which would make the shove unprofitable.

In general you cannot just use an observed calling frequency to determine the profitability of a bet.  You have to consider the conditional probability that your opponent will call your bet, given the cards you hold.
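To make the conditional probabilities concrete, here is a short combinatorial sketch using only the standard library.  It counts how often an opponent holds AA unconditionally, when we hold AA ourselves, and when we hold KK; the variable names are my own.

```python
from fractions import Fraction
from math import comb

all_combos = comb(52, 2)          # 1326 starting hands before any cards are known
remaining_combos = comb(50, 2)    # 1225 hands once our two cards are removed

p_aa_unconditional = Fraction(comb(4, 2), all_combos)       # 6/1326 = 1/221
p_aa_given_we_hold_aa = Fraction(comb(2, 2), remaining_combos)   # only 2 aces left
p_aa_given_we_hold_kk = Fraction(comb(4, 2), remaining_combos)   # all 4 aces left

print(p_aa_unconditional, p_aa_given_we_hold_aa, p_aa_given_we_hold_kk)
print(float(1 / p_aa_given_we_hold_kk))  # roughly 204
```

Holding AA ourselves cuts the opponent's AA combos from six to one, which is why the observed calling frequency can sit below 1 in 1000 even though any other shoving hand would be called far more often.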