## Friday, April 4, 2014

### GTO Brain Teaser #1: Exploitation and Counter-Exploitation in Rock Paper Scissors

This week's brainteaser involves a modified version of Rock Paper Scissors (RPS) that has some interesting implications for poker strategy, and for the general concept of exploiting opponents by taking advantage of their weaknesses.

This game is a "toy game" in that it is a simplified model game that we can study to gain some intuition into the way that bigger games (like poker) work.

### Modified RPS

Consider the standard game of Rock Paper Scissors with the following twist.  At the start of each round an independent judge flips a fair coin and tells your opponent the result but does not tell you.

If the coin came up heads your opponent must play rock.  Otherwise he can play whatever he wants.  You can always play whatever you want and standard RPS rules apply (paper beats rock, which beats scissors, which beats paper).  Your opponent is a smart thinking player and will adapt perfectly to whatever strategy you play.

We're going to look at the following two questions.

1. If the loser of the game must pay the winner \$100, what is the most you should be willing to pay to play it?  In the event of a tie no money is exchanged.
2. What is the GTO strategy for both players?

### Maximally exploitative play doesn't work

A naive approach would be to note that whatever our opponent's strategy, he will be playing rock at least half the time.  The maximally exploitative strategy in RPS is to always play the thing that beats what our opponent plays the most.

So in this case we would always play paper.  However, a clever opponent might expect this and would always play scissors when he was not forced to play rock by the coin flip.

This would result in us winning when the coin was heads with paper vs rock, and losing when the coin was tails with paper versus scissors for an average of \$0.  Despite our opponent's handicap this strategy fails to profit at all.
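That break-even result is easy to verify by direct computation; here is a minimal sketch in Python (the helper names are mine, not from the original post):

```python
# Payoff to us per throw: +1 win, -1 loss, 0 tie.
BEATS = {"P": "R", "R": "S", "S": "P"}  # key beats value

def payoff(ours, his):
    if ours == his:
        return 0
    return 1 if BEATS[ours] == his else -1

def ev(our_mix, his_free_mix):
    """Average payoff per round. Mixes map throw -> probability;
    the opponent is forced to throw rock half the time."""
    total = 0.0
    for ours, p in our_mix.items():
        total += p * 0.5 * payoff(ours, "R")        # coin came up heads
        for his, q in his_free_mix.items():         # coin came up tails
            total += p * 0.5 * q * payoff(ours, his)
    return total

# Always-paper vs. "scissors whenever I'm free": the exploit nets nothing.
print(ev({"P": 1.0}, {"S": 1.0}))  # 0.0
```

Half the time paper beats forced rock, half the time it loses to scissors, and the two terms cancel exactly.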

Is there any way to actually profit on average against a smart opponent in this game?  It should be easy, he's at a huge disadvantage!

### Bonus

This problem is a bit harder, but also of interest.

You are going to play two rounds of rock paper scissors against the same opponent.  This time, there is no coin flip, but instead, the rule is that your opponent must play rock in at least one of the two rounds.  The loser of each round must pay the winner \$50.  What is the most you should be willing to pay to play the game and what is the optimal strategy for both players?

### Solution

You can see an in-depth solution to both this problem and the bonus  here: http://blog.gtorangebuilder.com/2014/04/gto-brainteaser-1-solution.html

1. If the opponent has performed the same mental process as us, and arrives at the conclusion that he should always pick Scissors on a tails to defeat the naive solution, then could we also flip a coin each time? On heads, pick Paper, on tails pick Rock? If my (very out of practice) math is right, that should give us a 50% chance to win (P/R, R/S) with a 25% chance to draw (R/R) and a 25% chance to lose (P/S).

1. This is a good first intuitive step, but, if your opponent thinks one step further ahead, he can actually match your strategy exactly by playing paper every time his coin comes up tails.

In this case his strategy and your strategy are identical and will just break even against each other.

2. Play paper the first time. Then play rock twice. Now your opponent's decision becomes a lot harder and you can play paper again 2-3 times and mix in rock ~1/3, 1/4 times on average.

1. This intuition is extremely good.

Rather than thinking of playing repeatedly, imagine playing a mixed strategy: http://en.wikipedia.org/wiki/Strategy_(game_theory)#Mixed_strategy and that should let you drill down on exactly the best frequency to mix in rock.

2. Thanks,

Good point, but I think it's important to note that this solution doesn't apply immediately. For who knows how long, there won't even be a best frequency.

The only reason that there is a best frequency is because at equilibrium villain won't care between choosing scissors and paper. ...but the first X throws of the game, villain WILL care. And in the beginning villain always throws scissors when given the choice. I should probably modify my answer to change the first throw to rock. R, P, R, R, R, R, P, R, P, P, R, P looks like a good levelly solution.

My point is that things don't truly reach equilibrium for ..possibly quite a long time. A smart villain will keep trying to predict and counter your moves, and if you can guess how you would counter your own moves and then counter that, then you will perform better than at equilibrium (better than blindly doing 2/3 P, 1/3 R).

At some point V will just say fuck it and randomize between paper and scissors and at this point you can go ahead and start using the best R/P frequency.

3. Good points!

This depends on whether you think of the villain as learning and adapting as he plays (what you are describing sounds a lot like fictitious play: http://en.wikipedia.org/wiki/Fictitious_play), or you imagine him carefully analyzing the game before he starts playing and then playing the equilibrium immediately, or you imagine him as expecting you to play sub-optimally at first and trying to take advantage of you.

Game theory solutions tend to be focused just on finding the equilibrium and playing it immediately, but of course in practice, if you are smarter than your opponents that can be sub-optimal, (but if your opponent out thinks you and you don't play the equilibrium you'll lose out).

If you anticipate that your opponent is a level 2 thinker (he'll think that you are going to play paper and thus will counter with scissors 100% at first until he starts to learn that that leaves him vulnerable to you playing rock) then what you are saying makes perfect sense.

However, if he is actually a level ahead of you and knows that you think that he is likely to counter first with scissors, and thus that you are going to play rock a lot more than 1/3rd then he can play paper and exploit you, so you're basically just in a guessing game of trying to guess what level your opponent is on and what level he thinks you are on, and if you are confident that you are smarter than him you can likely out think him in that guessing game for a little while before he adapts to equilibrium. If he out thinks you, then you'll perform worse than the equilibrium strategy until you start playing it.

4. One other point of interest, is that most poker players actually take the opposite approach to guessing what level your opponent is on at the start and then zeroing in to equilibrium over time.

Instead they try and start at equilibrium play and get a sense of any weaknesses their opponent might have and then they deviate from the equilibrium in later rounds to attack any weaknesses that they identified.

5. "Instead they try and start at equilibrium play and get a sense of any weaknesses their opponent might have and then they deviate from the equilibrium in later rounds to attack any weaknesses that they identified."

Just thought about my answer again and I once again don't like it. I proposed way too much rock throws, but even this aside my thinking was on a level that doesn't really make sense. I don't know why I both assumed that villain was smart and that villain knows nothing about us. (since we somehow know something about him)

Simply playing GTO immediately might be the best, but it would really depend on the pool of players and what kind of assumptions you can make based on where the game is being held.

6. How do these comments not have an edit option?

Simply playing GTO immediately might be the best, but it would really depend on the pool of players and what kind of assumptions you can make.

FMP.

7. Agree completely.

I'm using the built-in Blogger (by Google) software and for some reason they don't have editing; it's really weird and frustrating, sorry.

3. Flip a coin yourself. If heads choose rock, tails choose paper.

Remember that rock-rock is a push.

1. This doesn't work. He can play the exact same strategy (play paper whenever his coin comes up tails). If you are both playing the same strategy then you'll break even against him.

So no profit.

2. Small adjustment. Bias your rolls toward paper. Roll a four-sided die and pick rock only on a 4.

So you automatically win 3/8 of the battles and draw another 1/8 (to account for the 50% he must choose rock.) The other 50% of the time the problem is essentially reversed -- he knows you've got a 75% chance of picking paper. If he plays scissors all the (non-rock) time then you still win another 1/8 and lose 3/8. If he plays paper all the time he wins 1/8 and draws the remaining 3/8.

Giving the following overall outcomes:
- a) You win 50%, lose 37.5% and draw 12.5% (his scissors)
- b) You win 37.5%, lose 12.5% and draw 50% (his paper)

In both cases, he loses to you. Since our strategy is purely random, he cannot (on a long enough average) do better than those extremes regardless of his strategy. This is of course assuming we're counting your wins against his wins rather than your wins against the total throws (which implies draws are counted against you.)

I believe any rock frequency strictly between 0% and 50% will bias the game in your favor. 0% rock throws will lead to 50-50-0 (he always throws scissors.) 50% rock throws will lead to 25-25-50 (he always throws paper.) Any number in between will lead to your win, with the opponent only really being able to choose between a larger number of his wins (though never enough to beat you) and a smaller gap. If you throw 1/3 rock, then the gap is fixed at 16.7% (1/6) no matter what he does.

If you want the math: for your rock frequency f and his scissors frequency s (of his non-rock throws,) your win rate will be w=(1-f+sf)/2 and the gap will be (1-s-2f+3sf)/2. All other throws for both parties are assumed to be paper. If for some reason you throw rock more than 50% of the time (or you ever throw scissors,) you lose the advantage and your opponent can beat you.
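These formulas can be checked by enumerating the outcomes directly; a quick sketch (helper names mine) reproduces the gap formula and gives the win rate as (1 - f + sf)/2:

```python
from fractions import Fraction as F

def win_gap(f, s):
    """We throw rock with prob f and paper 1-f (never scissors).
    Opponent: rock 1/2 (forced); on his free throws, scissors with
    prob s and paper with prob 1-s."""
    his = {"R": F(1, 2), "S": s / 2, "P": (1 - s) / 2}
    wins = (1 - f) * his["R"] + f * his["S"]    # our paper > his rock, our rock > his scissors
    losses = (1 - f) * his["S"] + f * his["P"]  # his scissors > our paper, his paper > our rock
    return wins, wins - losses

# f = 1/4 (the d4 strategy) vs. all-scissors-when-free: win rate 1/2, net edge 1/8
print(win_gap(F(1, 4), F(1)))
# f = 1/3 pins the gap at 1/6 no matter what the opponent does:
for s in (F(0), F(1, 2), F(1)):
    assert win_gap(F(1, 3), s)[1] == F(1, 6)
```

Exact fractions avoid any floating-point fuzz in the comparison.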

At some level though to be truly optimal (discounting bluffs and other psychological non-strategy victories,) you need to either have a random strategy that's tilted in your favor (as outlined,) or you need to be able to adjust your strategy to compensate each time he matches your current strategy and that's rarely allowed as a possibility in these type of thought experiments.

4. This comment has been removed by the author.

5. Over the long term, the strategy must converge to something stable; therefore true randomness can be the only optimized strategy.

50% of time opponent must play R. The remaining 50% of the time they can equally choose R,P,S.

R -> P = 1/2 + 1/3 * 1/2
S -> R = 1/3 * 1/2
P -> S = 1/3 * 1/2

For a guaranteed win, roll a die: 1-4 => P, 5 => R, 6 => S.

By how much? Consider you are random as above and opponent is fixed wlog at 100% R. You win 2/3 of the time and lose 1/6 of the time.

The expected payoff to play is 2/3, the expected cost is 1/6. Payoff - cost = 1/2, so the most you should be willing to pay to play a \$100 game is \$150.

1. This has some incorrect thinking in it. The strategy is not optimal (because it involves playing scissors which cannot possibly profit).

The payoff calculation is also wrong: when you play a round of the game, the absolute best case is that you win \$100, so if you are paying \$150 to play a round you would lose \$50 per round even if you won every single game.

2. Thank you for taking the time to post a response. However, ...

1) Your claim that playing scissors "cannot possibly profit" is clearly wrong. Plain and simple. I can't imagine anyone even attempting to prove it given the ease of concocting a counter example.

2) You are changing the rules from "pay up front for the game plus pay per loss" to a simple "pay per trial". My payoff reasoning is fine for the former, which is what is given in the problem statement.

Given that you are being deceptive, participation is not enjoyable. Please consider this as one data point for your experiment.

3. Not to be rude, anonymous, but you're very clearly out of your depth. The very first intuition you should have is that playing scissors cannot possibly ever make sense. Your opponent *must* play rock half the time and rock beats scissors. That means you must lose at least 50% of the time you play scissors. So that can never make sense.

4. Joel, you're incorrect on this one. The problem clearly states your opponent is a smart thinking player. If you never play scissors, your opponent will always play scissors when he has the choice and you will not profit. Therefore you *must* play scissors some percentage of the time.

5. Your explanation was not wlog. Using your logic choose paper 100%. Then wlog fix R at 100%. You win every game! Wow, Clever!

Let's look at a better strategy against yours. Suppose your opponent chooses scissors and rock just as often. Then with your strategy the expected number of wins is .5(2/3 - 1/6) + .5(-2/3 + 1/6) = 0. You break even. If you never play scissors, this works out better.

6. Depends on how many rounds of the game you have agreed to in advance. The larger the number, the easier it is to win. Let's say you agree to play 10 rounds.

Throw paper all the time for the 1st 5 times, which will make it seem like you're pursuing a maximization strategy. Then randomly, throw rock once.

1. If your opponent is capable of adjusting to your strategy optimally then the number of rounds doesn't matter.

For more details see here: http://en.wikipedia.org/wiki/Nash_equilibrium

7. The above mistakenly conflates "The remaining 50% of the time they can equally choose R,P,S" to mean "they *will* equally choose R, P,S". In the above analysis, you say you'll be playing P 2/3rds of the time. If that's the case, the opponent would be a fool to play S as rarely as you say. Half the time, they'll play R. The other half the time, they'll play in a manner that exploits how often you play P, not merely randomly choose between R/P/S.

8. Flip a coin yourself. If it comes up heads, throw paper. If it comes up tails, throw a random selection of all three possibilities. While your opponent may quickly catch on to the fact that you're throwing paper half the time, he won't be able to counter this, as your random throws will be completely unpredictable, and he's still confined to a losing position 25% of the time, which then becomes your advantage.

In the bonus equation:
Your opponent's best strategy would be to throw his mandatory rock first, so that the second throw becomes unpredictable. He can then throw whatever he wants on the second throw (It could be completely random, including perhaps throwing a second rock) as it would be impossible to tell what he's going to throw because the second throw becomes the first "true" match.

Playing against this opponent, I would assume that he's going to throw rock first given the above, though if he threw scissor in anticipation of my strategy, it's easy to win the second match and call a draw. If he does indeed throw rock first as I suggest, the second game is, again, a toss up, as it's really the first "true" match.

1. And as to what you would pay, if you're playing against an opponent that must throw rock once, with my above strategy, the most you could lose is \$0, as it either ends as a draw (you pay each other \$50 having each lost one round), or you win \$100.

2. You've got some good ideas in here, but this isn't quite right.

An easy way to see this cannot be optimal is that your strategy involves throwing scissors occasionally. Ever throwing scissors is sub-optimal because he is throwing rock at least 50% of the time in total, so scissors cannot possibly profit on average.

Bonus is also not correct.

3. Modified, but with the "random throws" consisting only of rock and paper? That way you're throwing 75% paper, and 25% rock, while your opponent can only possibly throw 50% scissors.

4. Close, this gets you a profit, but to get the maximum profit you need to find the exact right ratio of paper to rock. It's not 75%/25%, but you are getting very close.

9. I am having trouble reconciling this with the question asked. My opponent must throw rock 50% of the time. On a finite number of games, how does this work? Let's say I get to pick the number of games, N = 10. That means my opponent must throw rock 5 times. On each round, the actual win/loss percentage changes because I know this additional fact. On round 10, if rock had only been thrown 4 times, I will win with 100% chance.

Let's assume that I don't know N. How does this even work? You need to explain this part. Does the game end when I choose? Then it is the same as the previous scenario. When my opponent chooses? I.e., he can only end the game on an even round if he has thrown rock 50% of the time, but otherwise must keep playing? Then I can know that this won't be the last round, and if I play out a 50/50 strategy those rounds won't affect my winnings. This might drag the game out forever, meaning it falls under N = infinity, or else I will always know if the game can end on "this" round, giving me an advantage. Or is it random? Ending on any turn when my opponent has thrown rock 50% of the time, since it won't benefit my opponent (he can't withhold rock to make the game end on a particular round on purpose), must put me in a better situation than when he got to choose, so I still win out.

Lastly, what if there are N = infinity games? No, I don't play the same game for all of time, who wants to do that? Plus this is uninteresting, because since the game never ends, there is no requirement for my opponent to ever throw rock, and thus no strategy can be developed. Since the sum probability of my opponent's rock-throwing rounds can't be determined, this can't be the scenario either.

Since each of the possible game situations always leaves me with the benefit of knowing when the game is going to end, either strongly (I chose) or weakly (I didn't choose), I still know how many times he has to throw rock. But it still comes down to: what does "throw rock 50% of the time" mean? Is it just against me, or in every game he plays against any opponent altogether?

1. You can think of it as a single round. A judge is going to flip a coin just once and tell the result to your opponent and not you. If the coin comes up heads then your opponent must play rock in that single round, or he automatically loses. If the coin comes up tails his play is unrestricted.

His strategy (and yours) can involve using a computer to generate a random number between 0 and 1 and using that number to determine what to do, so people are allowed to play "Mixed Strategies".

http://en.wikipedia.org/wiki/Strategy_(game_theory)#Mixed_strategy
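In code, playing a mixed strategy is just a weighted random draw; a minimal sketch (the helper name is mine):

```python
import random

def mixed_throw(weights):
    """Sample one throw from a mixed strategy, e.g. {"R": 1/3, "P": 2/3}.
    Weights are assumed to sum to 1."""
    r = random.random()
    cum = 0.0
    for throw, w in weights.items():
        cum += w
        if r < cum:
            return throw
    return throw  # guard against floating-point rounding at the top end

# Over many rounds the empirical frequencies approach the weights.
random.seed(0)
throws = [mixed_throw({"R": 1/3, "P": 2/3}) for _ in range(3000)]
print(throws[:5], throws.count("R") / 3000)
```

The point of randomizing this way is that no observer, however smart, can predict any individual throw.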

10. The probabilities for each toss are independent from that of the last toss, so the optimal strategy is just based on probabilities.

So the long-term strategy is, regarding its probabilities, indistinguishable from picking the move winning over the last move of one's opponent.

So if our own probabilities are r, s, (1-r-s), the probabilities of the other are 0.5+(s*0.5), 0.5*(1-r-s), 0.5*r.

The payoff is r*(0.5*(1-r-s)-0.5*r)+s*(0.5*r-(0.5+s*0.5))+(1-r-s)*((0.5+(s*0.5))-0.5*(1-r-s)). A relative extremum will be found for both by differentiating to s, and r, and equating with zero.

We find r=1/3, s=0. That means we'll play rock 1/3 of the time, and paper 2/3 of the time. Our opponent will play rock exactly half of the time (that is, he will not inconvenience himself by playing rock even more often), and will play scissors 2/3 of the time he is allowed to, and paper 1/3 of the time he is allowed to.
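This r = 1/3, s = 0 solution can be sanity-checked numerically; a minimal sketch (helper names mine) verifies the mutual indifference that characterizes the equilibrium:

```python
# `us` maps our throw -> probability; `him_free` is the opponent's mix on
# his free rounds (he is forced to throw rock on the other half).
BEATS = {("P", "R"), ("R", "S"), ("S", "P")}

def pay(a, b):
    return 0 if a == b else (1 if (a, b) in BEATS else -1)

def ev(us, him_free):
    total = 0.0
    for u, pu in us.items():
        total += pu * 0.5 * pay(u, "R")       # his forced rock
        for h, ph in him_free.items():        # his free choice
            total += pu * ph * 0.5 * pay(u, h)
    return total

us = {"R": 1/3, "P": 2/3}
him = {"S": 2/3, "P": 1/3}
# At equilibrium each side is indifferent among the throws it actually uses:
print(ev({"R": 1}, him), ev({"P": 1}, him))  # both ~0.1667 (= 1/6)
print(ev(us, {"S": 1}), ev(us, {"P": 1}))    # both ~0.1667 -- he can't do better
```

Scissors, by contrast, earns us a negative EV against his mix, which is why it drops out of our strategy entirely.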

1. exactly right :)

2. Actually, if my analysis is right, when we play 1/3 rock, 2/3 paper our opponent will (as you stated) never play rock if he does not have to; but how much he plays paper and how much scissors in those cases does not matter. The terms cancel out and our net payout is 1/6.

Disclaimer, I could always have made a sign mistake on the way ;).

3. This is somewhat true, but if he makes a mistake by playing scissors at a frequency that is not exactly 2/3rds then we can actually increase our winnings above 1/6th. For example, were he to always play scissors when given the choice, we could always play rock and win at 1/2 per round.

So the strategy of us playing 1/3rd rock, 2/3rds paper, guarantees us winning 1/6th, but if our opponent makes a mistake and plays anything other than what anonymous posted, then we will be able to alter our strategy to win even more.

4. "the probabilities of the other are 0.5+(s*0.5), 0.5*(1-r-s), 0.5*r"

This statement is wrong, but it works out since s is 0. It should be:

"the probabilities of the other are max[0.5, s], (1-max[0.5, s])/(1-s)*(1-r-s), (1-max[0.5, s])/(1-s)*r"

11. Regarding the bonus question:

The bonus question is that you play two rounds, and your opponent must play rock at least once. So, if he plays something other than rock in the first round he must play rock in the second and lose (you just play paper). If he plays rock in the first round he can mix all three at 1/3 in the second (which leads to a draw like normal). Thus, if he plays rock first it is like normal RPS (because he gets 0 in the next round). Otherwise you get one free win (in the second round).

Thus, we can model the first round of the bonus question as follows (where each number is the number of rounds he wins on average, given the round-1 choices):

|   | R  | P  | S  |
|---|----|----|----|
| R | 0  | -1 | 1  |
| P | 0  | -1 | -2 |
| S | -2 | 0  | -1 |

Where you pick columns and him rows. We see that rock dominates paper for the row player. We get

|   | R  | P  | S  |
|---|----|----|----|
| R | 0  | -1 | 1  |
| S | -2 | 0  | -1 |

For the column player, the choice of rock now dominates scissors. We get

|   | R  | P  |
|---|----|----|
| R | 0  | -1 |
| S | -2 | 0  |

Playing rock 1/3 and paper 2/3 for the column player gives him -2/3 wins on average. Similarly, the row player can hold himself to -2/3 wins on average by playing rock 2/3 and scissors 1/3.

1. Note that the number of wins is negative because it is the number of wins the restricted player gets.

You should be willing to pay up to \$100/3, about \$33.33.
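The reduced 2x2 game above can be verified in a few lines via the indifference calculation; a sketch with exact fractions (variable names mine):

```python
from fractions import Fraction as F

# Row = his round-1 choice, column = ours, after the dominance reductions:
#          us: R   us: P
# him: R     0      -1
# him: S    -2       0
A = [[F(0), F(-1)],
     [F(-2), F(0)]]

# We mix (q rock, 1-q paper) so that his two rows pay him the same:
# q*0 + (1-q)*(-1) = q*(-2) + (1-q)*0  =>  q = 1/3
q = F(1, 3)
row_R = q * A[0][0] + (1 - q) * A[0][1]
row_S = q * A[1][0] + (1 - q) * A[1][1]
assert row_R == row_S == F(-2, 3)  # he loses 2/3 of a round on average

# At $50 per round won, that edge is worth 50 * 2/3 dollars to us:
print(50 * F(2, 3))  # 100/3
```

Making the opponent indifferent between his remaining options is exactly what pins the game's value.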

2. This works! Very nice. There are multiple equilibrium solutions to the bonus and this is one of them.

3. You worked it out, but this is essentially the strategy I came up with. I actually decided I'd prefer to be the one who was forced to play rock, and to throw rock in the first round expecting a loss but gaining freedom in the second round, in which I'd probably play rock again and toss the perceived freedom in hopes the opponent modifies his or her strategy.

12. The key to finding the solution is to think about optimizing your opponent's score, then choosing your own RPS probabilities such that your opponent's EV cannot improve. In mathematical terms, this means calculating the expected value function and taking the partial derivatives with respect to the opponent's probabilities of throwing rock, paper and scissors. In both cases, the answer is the same: you throw rock 1/3 of the time and paper 2/3 of the time. In the case of the second game, you should be willing to pay up to \$33.33 to play against such a player.

13. I am surprised the winning strategy has not come up already, it is so simple.

1/ Wait for the judge to toss the coin and show it to your opponent.
2/ They then pick rock or the other options 50:50.
3/ Punch them out and steal all their money.
4/ Optionally share 50:50 with the judge to keep them quiet.
Simple human logic: it is far easier to outwit an opponent if they are willing to follow arbitrary rules...

14. Let me play randomly with probabilities given by
P(R) = a
P(P) = b
P(S) = 1 - a - b
with a, b in [0, 1] and a + b <= 1
and let my opponent play with
P(R) = 1/2 + x
P(P) = y
P(S) = 1/2 - x - y
with x, y in [0, 1/2] and x + y <= 1/2

I get that my advantage (wins minus losses) is
A = a + (1/2)b - 3ay + 3bx - x + y - 1/2

The opponent seeks to minimize my advantage. Since my advantage is described by a plane in (x,y) if I assume a and b are some constants that I choose to define my strategy, extremal values are attained at the vertices of the triangle within which my opponent can choose their values of x and y (bounded by (0, 0), (0, 1/2), and (1/2, 0)). At these points, my advantage is:
A(0, 0) = a + (1/2)b - 1/2
A(1/2, 0) = a + 2b - 1
A(0, 1/2) = (1/2)(b - a)

A smart opponent will determine my values of a and b and choose the vertex that minimizes my advantage. Every one of the three vertex values increases with b, so I should never throw scissors: b = 1 - a. That gives A(0, 0) = (1/2)a, A(1/2, 0) = 1 - a, and A(0, 1/2) = 1/2 - a. Comparing the first and third vertices, I must choose a = 1/3, which maximizes the minimum at A = 1/6 (the second vertex is then 2/3 and never binds). So my best strategy is to play with a = 1/3 and b = 2/3, which gets me A = 1/6.

In words: the opponent will play all voluntary throws as scissors when I play rock less than 1/3 of the time, and all voluntary throws as paper when I play rock more than 1/3 of the time. At exactly a = 1/3 he is indifferent, and at equilibrium he mixes his voluntary throws as 2/3 scissors and 1/3 paper so that I, in turn, cannot gain by shifting my own mix.

My probable income per game is I = 100A - F, where F is the fee I pay for the privilege of playing. I break even at F = 100/6 = 16.66 (6 repeating), therefore I should pay no more than \$16.66 per game if I want to make any money at it.
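The vertex analysis can be double-checked by computing wins minus losses directly from the two mixes; a sketch with exact fractions (function name mine):

```python
from fractions import Fraction as F

def advantage(a, b, x, y):
    """Wins minus losses per round. Us: rock a, paper b, scissors 1-a-b.
    Opponent: rock 1/2 + x, paper y, scissors 1/2 - x - y."""
    us = {"R": a, "P": b, "S": 1 - a - b}
    him = {"R": F(1, 2) + x, "P": y, "S": F(1, 2) - x - y}
    wins = us["R"] * him["S"] + us["P"] * him["R"] + us["S"] * him["P"]
    losses = us["R"] * him["P"] + us["P"] * him["S"] + us["S"] * him["R"]
    return wins - losses

# a = 1/3, b = 2/3 guarantees at least 1/6 at every corner the opponent can pick:
a, b = F(1, 3), F(2, 3)
corners = [(F(0), F(0)), (F(1, 2), F(0)), (F(0), F(1, 2))]
print(min(advantage(a, b, x, y) for x, y in corners))  # 1/6
```

Because the advantage is linear in (x, y), checking the three corners of the opponent's feasible triangle suffices.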

Does that sound right? I'm interested to hear how this might apply to poker...

1. This is all right. I'm going to do a video on various ways to solve it Monday by popular request. I'll try to also make a text post here with some ways the result can apply to poker :)

15. We need to choose a strategy that will win given any strategy that our opponent chooses.

Given that our opponent must play rock at least 50% of the time, paper can never be a worse choice for us than scissors, since paper must win for us at least 50% of the time, whereas scissors can win no more than 50% of the time. Consequently if we had a strategy where we chose with probabilities x% rock, y% paper, and z% scissors, our strategy would be made no worse by assigning x% rock and (y+z)% paper. Thus we can eliminate scissors from our consideration of optimal strategies against a player forced to choose rock at least half the time.
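This dominance argument can be spot-checked numerically: against any free-round mix the opponent chooses, paper's per-round EV exceeds scissors' by at least 1/2. A sketch (helper names mine):

```python
import random

BEATS = {("P", "R"), ("R", "S"), ("S", "P")}

def pay(a, b):
    return 0 if a == b else (1 if (a, b) in BEATS else -1)

def evs(him_free):
    """Per-round EV of each of our pure throws, given the opponent's
    mix on his free rounds (he throws rock on the forced half)."""
    return {
        u: 0.5 * pay(u, "R")
           + 0.5 * sum(p * pay(u, h) for h, p in him_free.items())
        for u in "RPS"
    }

# Against random free-round mixes, paper's EV beats scissors' by >= 1/2:
random.seed(0)
for _ in range(1000):
    lo, hi = sorted(random.random() for _ in range(2))
    mix = {"R": lo, "P": hi - lo, "S": 1 - hi}
    e = evs(mix)
    assert e["P"] >= e["S"] + 0.5 - 1e-9
```

The gap works out to 1/2 + (3/2) times his free-round rock probability, so swapping every scissors throw for paper never costs us anything.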

Moreover we must choose rock less than half the time, because otherwise our opponent is free to exactly match our strategy and break even. But if we choose paper all the time, then our opponent can break even by choosing 50% rock and 50% scissors.

So for our strategy we are left with x% rock and (100-x)% paper, with 0<x<50.

Another, similar, argument will show that for our opponent scissors can be no worse than paper. In fact, to minimize his losses, our opponent will play rock 2/3 of the time and scissors 1/3 of the time. Then our expected gain will be \$33.33 for a \$100 winner-pay-loser game. But to prevent our opponent's taking advantage of us, we will have to play rock 1/3 of the time and paper 2/3 of the time.

1. This is almost word for word how I was going to explain it, very clear and concise :)

2. It is true that we should never play scissors but his last paragraph is wrong. Scissors can be worse than paper in exactly the circumstance that we choose to play rock more than a third of the time. In that situation he would have to play paper whenever he is able in order to minimize losses. Paper can also be worse than scissors: when we play rock less than a third of the time, he would have to play scissors whenever he is able.

Finally, for completeness, when we play rock exactly a third of the time it doesn't matter what his strategy is, his losses (our gains) won't change.

16. Game theory neophyte here. What about a strategy that disregards that your opponent has to throw rock 50% of the time. Throwing scissors at any point may be unwise but in any series of games, your opponent's forced strategy of 50% rock throws could, from the beginning, reduce your logical choices to only paper or rock/win or draw. This sounds like a winning strategy, but your opponent will catch on and can now, at least half the time, throw R, P or S while he sees you only throwing R or P. A strategy of throwing R even when not forced to can easily throw a monkey wrench into a reduced playable throw collection of R or P. So ( this is all just from a probably flawed thought process and using no math/GT at all so I expect it has major cracks) never throwing S may have a worse ratio of a win than is usual, never throwing S opens you up to an easily exploited pattern. All that mental gymnastics seems for naught and maybe a strategy of no strategy would, in the end, be the answer. Just because there is a chance your opponent HAS to throw R, the time he has to do so is still unknown to you. Maybe just using your own intuition of when to throw P (or R or S) by using cues/tells that one usually observes their opponents exhibiting such as when playing poker can you actually gain advantage....because if you don't, it is all the same game in the end, except he has the advantage of using all 3 throws 50% the time and 2/3 throws the other 50%, while you would be using 2/3 throws 100%.

Do I have any points of consideration in all that crap I just thought about or am I screwed up?

1. In the last sentence I meant "all 3 throws 50% of the time and 1/3 throws the other 50%" bleh.

2. I think you are a bit off here, but parts of your thinking is on the right track.

Never throwing scissors doesn't open us up to exploitation by our opponent. Our opponent is already throwing rock at least half of the time, so he is doing more than enough to counter scissors already. It does give him some freedom to play paper, but that isn't really a problem for us because we mostly want to play paper, and paper ties paper and beats rock (which he has to play).

There is a pretty simple mathematical way to see that you should never play scissors but it does rely on a few game theory concepts.

The basic rule is that if you are randomizing between 2 or more strategies then they must all win the same amount on average for your solution to be optimal.

Scissors cannot possibly do better than break even, because our opponent is playing rock at least 50% of the time, so even if we win the other 50% we are breaking even.

Therefore we cannot play scissors any percentage of the time in a winning strategy, as otherwise the average winnings of all our options couldn't be bigger than 0.

I'll explain this with more clarity in a video on Monday, subscribe here if you are interested :)

17. Slashdot reader here, and apologies if this is posted multiple times.

1. We know the opponent is forced to select rock 50% of the time.
2. I can exploit this by biasing my selection to paper.
3. The opponent knows this, and so he will bias his second choice to scissors.
4. In turn, I can compensate for this by biasing my second choice to rock.
5. At this point my opponent is stuck.

I have no idea how to model this into a math expression I can find the derivative of, but a quick script suggests:
- My optimal, random choices are paper 2/3 times, and rock 1/3 times.
- The opponent's best strategy will still yield me a win ~13% of the time.

1. And by the way, this was an enjoyable puzzle; I didn't think the win margin would be so small. Thank you.

2. Your intuition here is excellent and you are correct on your optimal strategy. You are slightly off on the amount you should win if your opponent responds optimally: you should win an average of \$16.66 per game.

I'll go through how to solve this mathematically with algebra in a quick video on Monday, so subscribe here if you're interested https://www.youtube.com/channel/UCNQYC66jbTtxQCgbOzq_Khg.

18. The basic solution for the unconstrained player is rather simple. Given no prior knowledge of what the other player is actually doing with their 50% free-choice plays, the safest strategy is to play 33% rock, 67% paper for an EV of +16% to +67% (depending on opponent strategy). With this strategy, the worst-case scenario occurs when the opponent plays 50% rock and 50% scissors (Win: 50%, Draw: 17%, Lose: 33%), and the best-case scenario occurs when the opponent plays 100% rock (Win: 67%, Draw: 33%, Lose: 0%).

Of course, the solution is not truly complete unless we take into account what the opponent is actually doing:

If the opponent's strategy calls for more rock or more paper than the worst-case scenario above (50% rock, 50% scissors), you should increase your weight of playing paper, all the way up to 100%. This will increase your expected returns, and you are safe from ever getting a negative EV should your opponent decide to change their strategy.

For a simplistic game like rock, paper, scissors, inferred past strategies may have much less to do with future strategy compared with a more complex game like poker. You may choose to take the safe 16%-67% EV over a chance at a higher EV, even if you have inferred that your opponent is using an inferior strategy of splitting their free choices between paper and scissors.

The complete strategy for the constrained player is even simpler: don't play.

1. This is almost 100% correct, but your opponent does have a strategy where, if you are playing 33% rock and 67% paper, you cannot increase your EV by shifting your weighting between rock and paper. To do this your opponent actually needs to play both paper and scissors a non-zero % of the time.

When you are playing 33% rock and 67% paper and your opponent is playing that optimal strategy your two strategies form a nash equilibrium: http://en.wikipedia.org/wiki/Nash_equilibrium

19. I hacked a solution and a simulation of the solution in Python (run with Pypy for speed!) - what do you think?

https://gist.github.com/anonymous/10004339

1. Cleaned up the code, sprinkled some comments, and added it to a GitHub repos ( https://github.com/ttsiodras/RockPaperScissors-SlashdotPuzzle ).

Thank you for posting this, I thoroughly enjoyed solving it! :-)

2. Very cool!

The problem of programmatically finding nash equilibrium is actually a very interesting one and is something I spend a lot of my time on.

One of the fastest and most powerful algorithms is http://en.wikipedia.org/wiki/Fictitious_play which is definitely worth reading about if you are curious.
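For the curious, here is a minimal fictitious-play sketch for this very game (the payoff matrix and variable names are mine). Each side best-responds to the other's empirical frequencies so far, and in a zero-sum game those frequencies converge to an equilibrium:

```python
# Minimal fictitious play for the modified RPS game. Rows/cols are
# R, P, S; A[i][j] is our EV when we throw i and his *free* choice is j
# (the coin forces him to throw rock half the time).
RPS = ['R', 'P', 'S']

def base(a, b):  # standard RPS payoff to a
    if a == b:
        return 0
    return 1 if (a, b) in {('R', 'S'), ('P', 'R'), ('S', 'P')} else -1

A = [[0.5 * base(a, 'R') + 0.5 * base(a, b) for b in RPS] for a in RPS]

ours = [1, 0, 0]  # play counts; arbitrary first moves
his = [1, 0, 0]
for _ in range(100000):
    # each side best-responds to the other's empirical mix so far
    i = max(range(3), key=lambda i: sum(A[i][j] * his[j] for j in range(3)))
    j = min(range(3), key=lambda j: sum(A[i][j] * ours[i] for i in range(3)))
    ours[i] += 1
    his[j] += 1

p_ours = [c / sum(ours) for c in ours]
p_his = [c / sum(his) for c in his]
print(p_ours)  # approaches [1/3, 2/3, 0]
print(p_his)   # approaches [0, 1/3, 2/3] on his free choices
```

After enough iterations the empirical mixes settle near rock 1/3, paper 2/3 for us, and paper 1/3, scissors 2/3 on his free choices.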

There are also techniques called CFRM (counterfactual regret minimization) and linear programming (the latter generally not useful in practical settings in big games).

Glad you enjoyed the problem and thanks for posting your code, that's great!

20. I'd like to post a comment about the optimal price to pay, because it is a whole 'nother game theory question (or two or three).

Consider: if the value of playing for \$100 is \$50, should you pay fifty? No, that would zero out your winnings. So how many others are competing to play? If none, then you should bid a penny. If there are others, then the question becomes more along the lines of: how long does a game take, and what is your time worth? What is your job satisfaction worth? Would you be willing to replace your career with RPS for the next twenty years for a ten percent raise?

1. Good point, perhaps there is a better phrasing for that question. The way I stated it is somewhat standard in game theory but in every day life it raises all the concerns you mentioned.

21. Let r, p and s be the probabilities we play rock, paper and scissors on a given game respectively. It follows that,

r + p + s = 1
where r, p and s are elements of the interval [0,1]

For our opponent we will use a similar notation, only with uppercase letters,

R + P + S = 1
where R is an element of [1/2, 0] and p and s are elements of [1/2,1]

Rearranging, we find that,

s = 1 - r - p
S = 1 - R - P

Now the expected gains, G, we get from any one game is given by,

G = AW + 0D + (-A)L = A(W - L)
where A is the amount exchanged between players and W, D and L is the probability of winning, drawing and losing respectively.

The probability of winning a single game, W, is given by the sum of the probabilities of each of the winning configurations occurring:

W = rS + pR + sP

Similarly the probability of losing on a game is

L = Rs + Pr + Sp

and so the expected gains is given by

G = A( r - R + P - p + 3(pR - Pr) )

Now note that our opponent must play rock at least half the time, so whenever we play scissors, at minimum we will lose half of the time and so it will never improve our gains. Therefore we should never play scissors. In the worst case scenario, our opponent is aware that we will never play scissors and that playing rock more than he has to will never improve his gains. This means

R = 1/2
G = A( r + (p-1)/2 + P(1-3r) )

Assuming the opponent is aware of this, he will do what he can to minimize G. This would mean that if (1-3r) is negative then he would maximize P (the only variable he controls). Similarly if (1-3r) is positive, he will minimize P. Finally if (1-3r) is 0, he cannot affect G.

In all of these scenarios, if we set r = 1/3 (or at least as close as possible to it) we will maximize G. This means that we should play rock a third of the time and paper the rest of the time. Our expected gains per game is G = A/6 so if A = \$100 we should only be willing to pay \$16.66 or less if we wish to make any gains per game.
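As a sanity check, the closed form for G and the value at the proposed optimum can be verified with exact fractions (a sketch; the function names are mine):

```python
from fractions import Fraction as F
from itertools import product

def direct(r, p, s, R, P, S):
    """G/A from the definitions: W = rS + pR + sP, L = Rs + Pr + Sp."""
    return (r * S + p * R + s * P) - (R * s + P * r + S * p)

def closed(r, p, R, P):
    """G/A from the rearranged form derived above."""
    return r - R + P - p + 3 * (p * R - P * r)

# The two expressions agree identically on a grid of valid strategies.
grid = [F(k, 4) for k in range(5)]
for r, p, R, P in product(grid, repeat=4):
    s, S = 1 - r - p, 1 - R - P
    if s < 0 or S < 0:
        continue
    assert direct(r, p, s, R, P, S) == closed(r, p, R, P)

# At r = 1/3, p = 2/3 the coefficient (1 - 3r) on P vanishes, so the
# opponent's free choice no longer matters; with R = 1/2:
print(closed(F(1, 3), F(2, 3), F(1, 2), F(0)))  # 1/6, i.e. $16.66 at A = $100
```

Evaluating `closed` at any value of P gives the same 1/6, which is exactly the "opponent cannot affect G" condition in the argument above.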

1. Small typo in the fourth paragraph; it should read: "where R is an element of [1/2, 1] and P and S are elements of [0, 1/2]"

22. I've got the first part as far as strategy. You should play paper 2/3 of the time and rock the rest.

I started by ruling out cases. Since you cannot profit on S, you would never throw S. Hence, your opponent can never profit on R and will only throw it when forced to. Then it was simply a matter of finding the ratio I would throw that made opponent's choice irrelevant.

If my math is right, you should be willing to pay up to \$16.66.

23. Initially, for the first few rounds, I would play rock most of the time, to achieve a draw or a win over scissors, which is the likely intuitive reaction to being forced to play rock. The opponent, being of such intelligence, would recognise the pattern and choose paper as a strategy, at which time I would begin using scissors. I don't know the math, but my instinct would be to go against the intuition of the opponent as much as I can to optimise my wins. At a certain point this would begin to fail, and I suspect I would start to lose two or three times in a row, at which point I would quit. Sorry, I don't have a genius IQ to offer much more.

24. Interesting problems. I suggest the following similar ones:
1. With 50% probability your opponent cannot throw rock. Does this give you an advantage? (No.)
2. In two rounds game your opponent is not allowed to throw rock in at least one of the rounds. Does this give you an advantage? (Yes) How much?

Compare 1 and 2 and be amused.

1. That is entertaining, I like that a lot, thanks for sharing!

25. Spoiler.

The most you should be willing to pay is 100/6 = \$16.66.

I think that the equilibrium strategy is for the unconstrained player to play paper 2/3 of the time and rock 1/3 of the time. The constrained player plays scissors 2/3 and paper 1/3 of the time he gets to choose.

1. I can see that several other people have come up with this answer already but I will post my solution as well.

The Expected Value (EV) of regular RPS is 0 for both players, using the strategy of 1/3 for each possibility. The constrained player is unable to play this strategy so EV(us) > 0.

Because the constrained player must play rock at least 1/2 of the time, if we play scissors with P > 0, then we will lose at least 50% of the time. If the game is to have positive EV, then we should be able to do better.

In any Nash equilibrium, neither player can do better by changing their strategy. After ruling out every pure strategy (which I will leave to the reader), we can see that mixed strategies are required. In any mixed-strategy equilibrium, players will be indifferent between the strategies they play with non-zero probability (otherwise they could do better by adjusting the probabilities).

We want to find probabilities that will make our opponent indifferent to playing paper or scissors. Therefore:
Payoff (opponent, scissors) = P(us, paper) - P(us, rock)
and Payoff (opponent, paper) = P(us, rock) - P(us, scissors)
are equal. But P(us, scissors) = 0, so this simplifies to:

P(us, paper) - P(us, rock) = P(us, rock), or
P(us, paper) = 2 * P(us, rock)

Since P(us, paper) + P(us, rock) + P(us, scissors) = 1, we get:
3 * P(us, rock) = 1, so
P(us, rock) = 1/3, and
P(us, paper) = 2/3.

In an equilibrium, we must also be indifferent between paper and rock. Using the same logic as above, we get P(opponent, scissors) = 2/3 and P(opponent, paper) = 1/3 on his free choices; otherwise our payoff would be higher under rock or paper.

From this we can calculate the EV of 100/6 = 16.66.
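The indifference conditions in this solution are quick to verify numerically (a sketch; `base` is my helper for the standard RPS payoff):

```python
from fractions import Fraction as F

def base(a, b):  # standard RPS payoff to a
    if a == b:
        return 0
    return 1 if (a, b) in {('R', 'S'), ('P', 'R'), ('S', 'P')} else -1

# Proposed equilibrium: we play R 1/3, P 2/3; his free choices are
# S 2/3, P 1/3, so his overall throw mix is R 1/2, P 1/6, S 1/3.
us = {'R': F(1, 3), 'P': F(2, 3), 'S': F(0)}
opp = {'R': F(1, 2), 'P': F(1, 6), 'S': F(1, 3)}

# Our EV for each pure throw vs his overall mix: the throws we mix
# (rock, paper) tie at 1/6, and scissors does strictly worse.
ours = {a: sum(base(a, b) * q for b, q in opp.items()) for a in 'RPS'}
print(ours)  # R and P both 1/6, S is -1/3

# His EV for each free choice vs our mix: paper and scissors tie,
# and freely chosen rock does strictly worse.
his = {b: sum(base(b, a) * q for a, q in us.items()) for b in 'RPS'}
print(his)  # P and S both 1/3, R is -2/3
```

Both indifference conditions hold, and our EV per game is 1/6 of the stake, i.e. 100/6 = \$16.66.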

26. This comment has been removed by the author.

27. Bonus :
The optimal strategy, if my opponent is intelligent and can predict my moves as I can predict his, would be for me to make a random choice (heads or tails) between paper and rock on the first round, and for him to do the same.
I would then gain 25\$ on average which is what I would be willing to pay.
(By "on average" here I mean : a means of all possible results. I know there are only two rounds.)

Explanation :
The only difficulty lies in the strategy for the first round. In the second round, my opponent will play rock if he hasn't played rock before, and a random choice of P/R/S if he has. I will play paper if he hasn't played rock before, and gain 50\$, or play a random selection of P/R/S and gain 0\$ on average if he has played rock before.

There are only 9 possible games for the first round :
If I play R and he plays R, I will gain 0\$ on average for the two rounds.
If I play R and he plays P, I will gain 0\$ on average for the two rounds.
If I play R and he plays S, I will gain 1000\$ for the two rounds.

If I play P and he plays R, I will gain 50\$ on average for the two rounds.
If I play P and he plays P, I will gain 50\$ for the two rounds.
If I play P and he plays S, I will gain 0\$ for the two rounds.

If I play S and he plays R, I will lose 50\$ on average for the two rounds.
If I play S and he plays P, I will gain 100\$ on average for the two rounds.
If I play S and he plays S, I will gain 50\$ on average for the two rounds.

So if my opponent plays randomly in the first round, all options will give me the same gain on average.
But if I play randomly, the situation is very different for him, as he can minimize my gain to an average of zero by playing rock.
But of course I could predict that and play paper in the first round and gain 50\$ if he has played rock.
Which he could then predict, and play scissors to beat me, which I could predict and play rock, which he could predict, which I could predict...
If I can predict his move I can counter him, and if he can predict mine he can counter me.
So my best choice is to be unpredictable, and that goes for him too, while I still keep in mind that playing rock first is the best strategy for him if I make a random choice between the three options.
So my best choice is to make a random choice between paper and rock, thus optimizing my results against rock while remaining unpredictable.
If he can predict that, it will be best for him to avoid playing scissors, because against my rock or paper his scissors will gain me 50\$ on average, instead of 25\$ for his rock or paper against mine.

28. Follow-up to my earlier comment :
I got it wrong. I can still improve my strategy and pay as much as 33.33\$ to play if I throw a die, and play paper if I get 1, 2, 3 or 4, and rock if I get 5 or 6. So a 2/3 chance to play paper and a 1/3 chance to play rock, which of course just matches the result of the first problem.
In that case, there is no optimal strategy for my opponent, he can play whatever he wants.
If I increase the odds that I play paper to more than 50%, then it becomes interesting for him to play scissors, which will give me a 0\$ gain against my paper, but it is balanced by the risk he might lose 100\$ if I still play rock. A 2/3 chance of paper vs 1/3 chance of rock is the equilibrium. Whatever he chooses to play then will make me gain 33.33\$ on average, whether he plays scissors or not.


29. I don't buy any of this. Please explain where I'm wrong.

Over time my opponent will know my strategy and I'll know his.

My opponent would just match the scissors % to my paper %, insofar as he could.

My paper % should always be >= 50% as that is where my advantage lies.

Therefore my opponent can best counter that advantage by playing scissors 1/2 the time.

My only gain then is when I play rock to counteract the scissors.

That gives me
1/4 P v R = +1/4
1/4 P v S = -1/4
1/4 R v S = +1/4
1/4 R v R = 0

That means I win 1/4 of the time.

If I play 2/3 paper, I only win 1/6 of the time vs 1/2 R & 1/2 S

1. I'm not 100% sure I understand the strategy you are proposing, but if you are saying we should play 50% rock, 50% paper, our opponent could always play paper when he is allowed to and effectively be playing 50% rock and 50% paper as well. We would then break even against him.

2. Then our strategy should be to play rock as often as our opponent plays scissors, and paper the rest of the time. Then:
we win = r * (1 - s) + s * s
tie = p * (1 - s) + r * s
we lose = s * (1 - s)

Because r >= 1/2 and s <= 1/2, my opponent's strategy of playing scissors as often as our paper, as much as possible, would push him to play 1/2 scissors to counteract our strategy, and losing 1/4 of the time is the best he can do.

3. You are right that the Nash equilibrium strategy has him playing scissors as often as we play rock, but he can lose less than a quarter of the time. If you tell me the exact strategy that you are proposing we play, I can tell you the counter-strategy for him that loses less than 1/4 of the time.
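In that spirit, here is a sketch of such a counter-strategy finder (the function names are mine): given any mix for us, it returns the opponent's best pure free choice and our resulting EV against it.

```python
from fractions import Fraction as F

def base(a, b):  # standard RPS payoff to a
    if a == b:
        return 0
    return 1 if (a, b) in {('R', 'S'), ('P', 'R'), ('S', 'P')} else -1

def counter(us):
    """Opponent's best pure free-choice response to our mix `us`
    (dict throw -> prob), and our EV per $1 staked against it."""
    def our_ev(free):
        # the coin: half the time he throws rock, half his free choice
        return sum(q * (F(1, 2) * base(a, 'R') + F(1, 2) * base(a, free))
                   for a, q in us.items())
    free = min('RPS', key=our_ev)
    return free, our_ev(free)

# The 50/50 paper/rock mix: always picking paper when free holds us to 0.
print(counter({'P': F(1, 2), 'R': F(1, 2)}))  # ('P', Fraction(0, 1))
# The equilibrium mix guarantees 1/6 of the stake whatever he picks:
print(counter({'P': F(2, 3), 'R': F(1, 3)}))  # EV is Fraction(1, 6)
```

Any mix other than paper 2/3, rock 1/3 can be held below 1/6 of the stake by the response this function finds.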