Friday, January 13, 2017

Brains vs AI: My Prediction and some Tips for the brains

My Skype has been inundated with questions and prediction requests about the ongoing brains vs AI matchup, so I thought I'd take some time to write down my official prediction and also help point the brains in the right direction for beating the bot.

For those of you who haven't been following it: after the brains defeated Claudico in the last major human vs bot challenge, the latest AI from CMU, named Libratus, is back to play 120,000 hands against Dong Kim, Jason Les, Jimmy Chou, and Daniel McAulay. After two days of play (~8k of the 120k hands), the AI is ahead against 3 of the 4 players and up significantly (1,500 bb) overall.

As a result, the betting lines have moved so that the AI is now favored to win the whole thing after starting out as a 4-1 dog. I'm going to boldly go on record as saying that the betting lines are wrong, the humans will stage a comeback, and the AI will not win this year. All of that assumes the humans actively look for leaks not just in its ranges but in its reactions to bet sizing. If they just play their standard game, they will likely lose. I'll give some specific advice on how to attack the AI below.

For what it's worth, I think the technology to make a human-level HUNLHE bot is there, but it involves combining a lot of state-of-the-art technology in just the right way, and I don't believe the researchers will get it right this try. My medium- to longer-term outlook for the future of humanity in HUNLHE is very bleak.

How to beat a GTO bot

GTO bots are generally built around the principle of taking a set of precomputed GTO solutions and interpolating between them (often with some learning component) to figure out how to react to bet sizes that fall outside the precomputed game tree. As far as I know, the details of Libratus' specific algorithms have not been released, so I'll have to make some assumptions about how GTO bots are generally constructed. DeepStack, a cutting-edge bot whose creators recently made some questionable claims about "beating" human professionals, has described more of its architecture in a published paper, so I am basing some of this analysis on their approach.
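
Libratus' own method for handling off-tree bets isn't public, so as one concrete illustration here is a sketch of the randomized pseudo-harmonic action mapping published by Ganzfried and Sandholm, which maps an off-tree bet onto one of the two nearest sizes the bot actually solved for. Strictly speaking this is action translation (randomly treating your bet as one neighbor or the other) rather than a literal interpolation of strategies, and the function names and the two-size menu are hypothetical. Sizes are expressed as fractions of the pot.

```python
import random


def prob_map_to_smaller(a: float, b: float, x: float) -> float:
    """Pseudo-harmonic mapping: probability of treating bet x as the smaller size a.

    a, b, x are bet sizes as fractions of the pot, with a < b and a <= x <= b.
    f(x) = ((b - x) * (1 + a)) / ((b - a) * (1 + x)), so f(a) = 1 and f(b) = 0.
    """
    if not (a < b and a <= x <= b):
        raise ValueError("need a < b and a <= x <= b")
    return ((b - x) * (1 + a)) / ((b - a) * (1 + x))


def translate_bet(x: float, sizes: list[float], rng: random.Random) -> float:
    """Map an observed bet x (pot fraction) onto one size in the abstraction."""
    sizes = sorted(sizes)
    if x <= sizes[0]:
        return sizes[0]
    if x >= sizes[-1]:
        return sizes[-1]
    for a, b in zip(sizes, sizes[1:]):
        if a <= x <= b:
            return a if rng.random() < prob_map_to_smaller(a, b, x) else b
    raise AssertionError("unreachable: x lies between the smallest and largest size")


# Hypothetical abstraction that only knows half-pot and pot-sized bets.
rng = random.Random(0)
print(prob_map_to_smaller(0.5, 1.0, 0.75))  # 3/7, about 0.43
print(translate_bet(0.75, [0.5, 1.0], rng))
```

Under this mapping a 0.75-pot bet against a {0.5, 1.0} menu is treated as a half-pot bet about 43% of the time. The usual argument for randomizing is that a deterministic rounding rule lets an opponent pick sizes just past the threshold and exploit it, which is exactly the kind of hole the rest of this post is about.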

Because of the way GTO bots are constructed, if you play within the bet sizing abstraction of their precomputed GTO solutions, you are guaranteed to lose in the long run (or at best break even). When heads-up limit hold'em was solved, it directly implied that any version of NLHE restricted to a small number of "fixed" sizes, even if they are percentages of the pot rather than fixed amounts, was also solvable. Anyone with a bit of programming experience and a budget could go to SPF, buy a preflop solution, and trivially make a GTO bot that would be unbeatable if you agreed in advance to only ever bet specific pot percentages, e.g. 50% or 100% pot postflop, always 3x, limp, or fold preflop, always 3-bet to 9, etc.
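
To see why fixing the sizes in advance makes the game solvable: once every bet size is fixed, the game is a finite tree, and self-play methods such as counterfactual regret minimization (CFR) converge toward an equilibrium. The sketch below runs CFR on Kuhn poker, a three-card toy game with a single fixed bet of one chip, as a stand-in for the vastly larger fixed-size NLHE trees. It illustrates the idea only and is not a claim about how SPF or any particular bot computes its solutions.

```python
import itertools

ACTIONS = "pb"  # p = check/fold, b = bet/call; the only bet size is 1 chip


class InfoSetNode:
    """Regret and average-strategy accumulators for one information set."""

    def __init__(self):
        self.regret_sum = [0.0, 0.0]
        self.strategy_sum = [0.0, 0.0]

    def current_strategy(self, reach_weight):
        # Regret matching: play actions in proportion to positive cumulative regret.
        positive = [max(r, 0.0) for r in self.regret_sum]
        total = sum(positive)
        strategy = [p / total for p in positive] if total > 0 else [0.5, 0.5]
        for i, p in enumerate(strategy):
            self.strategy_sum[i] += reach_weight * p
        return strategy

    def average_strategy(self):
        total = sum(self.strategy_sum)
        return [s / total for s in self.strategy_sum] if total > 0 else [0.5, 0.5]


nodes = {}  # information-set key (card + history) -> InfoSetNode


def cfr(cards, history, reach0, reach1):
    """Return the expected utility for the player to act at `history`."""
    player = len(history) % 2
    opponent = 1 - player
    if len(history) > 1:
        higher = cards[player] > cards[opponent]
        if history[-1] == "p":
            if history == "pp":
                return 1 if higher else -1  # check-check showdown
            return 1  # the opponent folded to a bet
        if history[-2:] == "bb":
            return 2 if higher else -2  # bet-call showdown

    node = nodes.setdefault(f"{cards[player]}{history}", InfoSetNode())
    strategy = node.current_strategy(reach0 if player == 0 else reach1)

    utils = [0.0, 0.0]
    node_util = 0.0
    for a, action in enumerate(ACTIONS):
        if player == 0:
            utils[a] = -cfr(cards, history + action, reach0 * strategy[a], reach1)
        else:
            utils[a] = -cfr(cards, history + action, reach0, reach1 * strategy[a])
        node_util += strategy[a] * utils[a]

    # Counterfactual regrets are weighted by the opponent's reach probability.
    opp_reach = reach1 if player == 0 else reach0
    for a in range(2):
        node.regret_sum[a] += opp_reach * (utils[a] - node_util)
    return node_util


def train(iterations):
    """Run CFR over every deal each iteration; return player 1's average value."""
    deals = list(itertools.permutations([1, 2, 3], 2))  # J=1, Q=2, K=3
    total = 0.0
    for _ in range(iterations):
        for cards in deals:
            total += cfr(cards, "", 1.0, 1.0)
    return total / (iterations * len(deals))


if __name__ == "__main__":
    print(f"average value for player 1: {train(10_000):.4f}")
    for key in sorted(nodes):
        print(key, [round(p, 2) for p in nodes[key].average_strategy()])
```

The returned value should approach -1/18, the known game value of Kuhn poker for the first player, and the average strategies show the expected dominant plays, such as always calling a bet with a king (key "3b") and always folding a jack to one (key "1b").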

The only way to attack the bot is to attack its bet sizing abstraction. The difficult, and to date unsolved, part of building an unbeatable poker bot comes entirely from correctly determining how to react to bet sizes outside its abstracted solutions. Note that, by the definition of a GTO strategy (any deviation inside the abstracted game can at best break even against it), you cannot stay within its bet sizing abstraction while playing non-standard ranges preflop and on the flop and then hope to exploit it on the turn and river, unless you can step outside its abstraction in a systematic, exploitative way on those later streets. Understanding that the only way to beat the bot is to attack its abstractions is the key first step.

Libratus seems to be taking things a step further than the naive approach I described above by re-solving the turn and river dynamically during a hand, presumably with a large number of bet sizes. This allows the bot to play a preflop/flop strategy that may be based on a GTO computation with only 2 turn and river sizes, and then re-solve the turn and river with a much larger set of bet sizes on the fly. It means that if you reach the turn with a range close to what the preflop and flop parts of the bot's solution dictate, you are likely already screwed. It will be able to solve a very large version of that turn/river branch of the game tree with many bet sizes, and your ability to attack it on the turn and river will be very limited IF you play within its bet sizing abstraction preflop and on the flop.
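
Re-solving only the turn and river is attractive because the betting tree grows quickly with the number of sizes. The sketch below counts betting sequences on a single street under a deliberately simplified, hypothetical model (any of `num_sizes` sizes per bet or raise, unlimited stacks, and a cap of `max_bets` bets and raises). It is not Libratus' actual tree, but it shows why a bot can afford many more sizes in a small turn/river subgame than across the whole hand.

```python
def sequences_per_street(num_sizes: int, max_bets: int) -> int:
    """Count distinct heads-up betting sequences on one street.

    Hypothetical model: every bet or raise may use any of `num_sizes` sizes,
    stacks never run out, and at most `max_bets` (>= 1) bets/raises are allowed.
    """

    def facing_bet(bets_so_far: int) -> int:
        # Fold and call both end the street; a raise is allowed until the cap.
        count = 2
        if bets_so_far < max_bets:
            count += num_sizes * facing_bet(bets_so_far + 1)
        return count

    # First player checks: the second player checks behind (1 sequence) or bets.
    after_check = 1 + num_sizes * facing_bet(1)
    # First player bets with any of the available sizes.
    after_bet = num_sizes * facing_bet(1)
    return after_check + after_bet


for sizes in (2, 8):
    print(sizes, "sizes:", sequences_per_street(sizes, max_bets=3))
```

With a cap of three bets, going from 2 sizes to 8 takes a single street from 57 sequences to 2,337, and a full-hand tree compounds this across every street. That is one plausible reason a bot would keep preflop and the flop coarse and only become fine-grained inside the subgame it re-solves.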

Thus the key to beating the bot is to find holes in its preflop and flop bet sizing abstraction. In particular, look for weak reactions to non-standard 3-bet and 4-bet sizes as the primary means of attack. Flop check-raises may be vulnerable as well. The tricky part is doing so with a sensible range.

I'm going to illustrate how you would attack a bot using non-standard 3-bet sizing as an example. This all assumes unlimited time and resources, which of course the brains in this challenge do not have. That said, a reasonable approach would be the following:

  1. Get a HUNL GTO preflop solution with the sizes the bot seems to use itself
  2. Run a few HUNL GTO preflop simulations with unusual 3-bet sizes, pick one that performs well even against a perfect response
  3. See if the bot ever shows down a hand that should be in the range of the solution from step 1 but should not be in the range of the solution from step 2
  4. If so you've found a leak
  5. Take the reaction to a 3-bet from step 1 and lock that strategy into the solution you chose from step 2
  6. Observe what a minimally exploitative strategy is
  7. Keep an eye on the bot's reaction ranges to your non-standard 3-bet size. Its reaction strategy may be interpolated from two GTO solutions with bet sizes near yours (e.g. if you 3-bet to 7, it might interpolate between a 3-bet to 5 and a 3-bet to 9), it might use some learning algorithm to try to reduce its mistakes over time, or the team might be updating it at night
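
Steps 3 and 4 amount to a set comparison, sketched below. The ranges and the showdown list are made-up placeholders; in practice they would come from the solver output for steps 1 and 2 and from the hand histories.

```python
from collections import Counter


def find_leaks(showdowns, standard_range, unusual_range):
    """Count showdown hands that fit the bot's response to the standard 3-bet
    size (step 1) but should not appear against the unusual size (step 2)."""
    return Counter(
        hand for hand in showdowns
        if hand in standard_range and hand not in unusual_range
    )


# Hypothetical continuing ranges facing a 3-bet, from two solver runs.
standard_range = {"AA", "KK", "QQ", "AKs", "T9s", "65s"}
unusual_range = {"AA", "KK", "QQ", "AKs", "T9s"}

# Hypothetical hands the bot showed down after facing the unusual size.
showdowns = ["KK", "65s", "AKs", "65s", "QQ"]

print(find_leaks(showdowns, standard_range, unusual_range))  # Counter({'65s': 2})
```

Because equilibrium strategies mix and solver ranges are approximate, a single appearance of a hand may just be a low-frequency play, so a hand that keeps showing up is much stronger evidence of a leak than one sighting.
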

I think that if the brains use the next 10-20k hands to test the bot's reactions to unusual preflop and flop sizes, in situations where the odd sizing is only slightly inefficient to begin with, they will be able to find some holes to attack for the remainder of the match.

If they can consistently reach the turn in spots where the bot's estimate of the GTO ranges for them (and for itself) at that point is significantly wrong, then its dynamic re-solving will only lead it astray: it will input incorrect starting ranges and thus output an incorrect strategy. The key is to get outside its bet sizing abstraction early in the hand, where it has to be sparser.
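
A toy river spot shows why wrong starting ranges poison the re-solve. Assume, hypothetically, a bot holding a pure bluff-catcher that faces a bet and decides by computing calling EV from the bluff frequency it believes the bettor has. The function names and numbers below are illustrative, not taken from the match.

```python
def call_ev(bluff_fraction: float, pot: float, bet: float) -> float:
    """EV of calling (relative to folding) with a hand that beats only bluffs."""
    return bluff_fraction * (pot + bet) - (1 - bluff_fraction) * bet


def indifference_bluff_fraction(pot: float, bet: float) -> float:
    """Bluff share of the betting range that makes the caller indifferent."""
    return bet / (pot + 2 * bet)


def decide(believed_bluff_fraction: float, pot: float, bet: float) -> str:
    """The bot's response, computed from the range it *believes* it is facing."""
    return "call" if call_ev(believed_bluff_fraction, pot, bet) > 0 else "fold"


pot, bet = 1.0, 1.0                              # pot-sized river bet
print(indifference_bluff_fraction(pot, bet))     # 1/3 at equilibrium
believed, actual = 0.2, 0.5                      # wrong input vs. reality
choice = decide(believed, pot, bet)              # "fold"
print(choice, call_ev(actual, pot, bet))         # the fold forgoes +0.5 pot
```

The decision rule itself is correct here; the mistake comes entirely from the belief that went in, which is the effect you want to create by reaching the turn and river after stepping outside the tree the bot used to build its range estimates.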

Despite the bad early start for the brains, I still think it is unlikely that Libratus is unexploitable, and that, assuming it is attempting to play near GTO, the brains should, given time, be able to find its leaks and attack them without fear of counter-exploitation. As long as the brains realize that their "standard game" isn't sufficient and take a focused, structured approach to identifying and attacking the leaks the bot has as a result of its bet sizing abstraction, I think there is still hope for humanity.
