Comments on GTORangeBuilder Blog: GTO Brain Teaser #1: Exploitation and Counter-Exploitation in Rock Paper Scissors

You are right that the nash equilibrium strategy h...

2014-04-14T12:15:02.291-07:00

You are right that the Nash equilibrium strategy has him playing scissors as often as we play rock, but he can lose less than 1/4 of the time. If you tell me the exact strategy that you are proposing we play, I can tell you the counter-strategy for him that loses less than 1/4 of the time.

then our strategy should be to play rock as often ...

2014-04-14T12:09:12.266-07:00

Then our strategy should be to play rock as often as our opponent plays scissors, and paper the rest of the time. Then:
we win = r * (1 - s) + s * s
tie = p * (1 - s) + r * s
we lose = s * (1 - s)

Because r >= 1/2 and s <= 1/2, my opponent's strategy of playing scissors as often as we play paper, as far as he can, would push him to play scissors 1/2 of the time to counteract our strategy, and losing 1/4 of the time is the best he can do.

I'm not 100% sure I understand the strategy yo...

2014-04-14T10:55:45.251-07:00

I'm not 100% sure I understand the strategy you are proposing, but if you are saying we should play 50% rock, 50% paper, our opponent could always play paper when he is allowed to and effectively be playing 50% rock and 50% paper as well. We would then break even against him.
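
For anyone who wants to check figures like these, here is a minimal Python sketch. The function name `net_win_rate` and the dictionary layout are illustrative choices, not something from the post. It computes our win probability minus our loss probability for any pair of mixed strategies, and is applied to the 50% rock, 50% paper proposal.

```python
from fractions import Fraction as F

# Which move each move beats.
BEATS = {"R": "S", "P": "R", "S": "P"}

def net_win_rate(ours, theirs):
    """P(we win) - P(we lose) for two mixed strategies given as dicts."""
    total = F(0)
    for a, pa in ours.items():
        for b, pb in theirs.items():
            if BEATS[a] == b:
                total += pa * pb
            elif BEATS[b] == a:
                total -= pa * pb
    return total

half_rock_half_paper = {"R": F(1, 2), "P": F(1, 2)}
# Opponent is forced to rock half the time and picks paper whenever free.
print(net_win_rate(half_rock_half_paper, {"R": F(1, 2), "P": F(1, 2)}))  # 0
# Opponent is forced to rock half the time and picks scissors whenever free.
print(net_win_rate(half_rock_half_paper, {"R": F(1, 2), "S": F(1, 2)}))  # 1/4
```

So the 1/4 edge only appears if the opponent spends his free choices on scissors; switching them to paper brings us back to break-even.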

I don't buy any of this. Please explain where ...

2014-04-14T10:29:25.633-07:00

I don't buy any of this. Please explain where I'm wrong.

Over time my opponent will know my strategy and I'll know his.

My opponent would just match the scissors % to my paper %, insofar as he could.

My paper % should always be >= 50% as that is where my advantage lies.

Therefore my opponent can best counter that advantage by playing scissors 1/2 the time.

My only gain then is when I play rock to counteract the scissors.

That gives me
1/4 P v R = +1/4
1/4 P v S = -1/4
1/4 R v S = +1/4
1/4 R v R = 0

That means I win 1/4 of the time.

If I play 2/3 paper, I only win 1/6 of the time vs 1/2 R & 1/2 S

Repost of my first comment which failed to appear ...

2014-04-07T05:39:59.594-07:00

Repost of my first comment which failed to appear :

Bonus :
The optimal strategy, if my opponent is intelligent and can predict my moves as I can predict his, would be for me to make a random choice (heads or tails) between paper and rock on the first round, and for him to do the same.
I would then gain 25$ on average, which is what I would be willing to pay.
(By "on average" here I mean the mean of all possible results. I know there are only two rounds.)

Explanation :
The only difficulty lies in the strategy for the first round. In the second round, my opponent will play rock if he hasn't played rock before, and a random choice of P/R/S if he has. I will play paper if he hasn't played rock before, and gain 50$, or play a random selection of P/R/S and gain 0$ on average if he has played rock before.

There are only 9 possible games for the first round :
If I play R and he plays R, I will gain 0$ on average for the two rounds.
If I play R and he plays P, I will gain 0$ on average for the two rounds.
If I play R and he plays S, I will gain 100$ for the two rounds.

If I play P and he plays R, I will gain 50$ on average for the two rounds.
If I play P and he plays P, I will gain 50$ for the two rounds.
If I play P and he plays S, I will gain 0$ for the two rounds.

If I play S and he plays R, I will lose 50$ on average for the two rounds.
If I play S and he plays P, I will gain 100$ on average for the two rounds.
If I play S and he plays S, I will gain 50$ on average for the two rounds.

So if my opponent plays randomly in the first round, all options will give me the same gain on average.
But if I play randomly, the situation is very different for him, as he can minimize my gain to an average of zero by playing rock.
But of course I could predict that and play paper in the first round and gain 50$ if he has played rock.
Which he could then predict, and play scissors to beat me, which I could predict and play rock, which he could predict, which I could predict...
If I can predict his move I can counter him, and if he can predict mine he can counter me.
So my best choice is to be unpredictable, and that goes for him too, while I still keep in mind that playing rock first is the best strategy for him if I make a random choice between the three options.
So my best choice is to make a random choice between paper and rock, thus optimizing my results against rock while remaining unpredictable.
If he can predict that, it will be best for him to avoid playing scissors, because against my rock or paper his scissors will gain me 50$ on average, instead of 25$ for his rock or paper against mine.
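
The averages quoted in this comment can be re-derived with a short Python sketch. The table below copies the nine two-round totals listed above, taking the rock-versus-scissors entry as 100$ (two rounds won at 50$ each, which is an assumption about what the comment intends); `TOTAL` and `average_gain` are illustrative names only.

```python
from fractions import Fraction as F

# Two-round totals in dollars, TOTAL[my first move][his first move],
# with second-round play as described in the comment.
TOTAL = {
    "R": {"R": 0, "P": 0, "S": 100},
    "P": {"R": 50, "P": 50, "S": 0},
    "S": {"R": -50, "P": 100, "S": 50},
}

def average_gain(mine, his):
    """Expected two-round gain for mixed first-round strategies (dicts)."""
    return sum(pm * ph * TOTAL[m][h]
               for m, pm in mine.items() for h, ph in his.items())

uniform = {m: F(1, 3) for m in "RPS"}
coin_flip = {"R": F(1, 2), "P": F(1, 2)}

for move in "RPS":
    # My gain for each pure first move against a uniformly random opponent.
    print(move, average_gain({move: F(1)}, uniform))
for move in "RPS":
    # My gain with the paper/rock coin flip against each of his pure moves.
    print(move, average_gain(coin_flip, {move: F(1)}))
```

Against a uniformly random opponent every first move averages 100/3, and the coin flip earns 25 against his rock or paper and 50 against his scissors, matching the figures above.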

Follow-up to my earlier comment : I got it wrong....

2014-04-07T05:34:08.161-07:00

Follow-up to my earlier comment :
I got it wrong. I can still improve my strategy and pay as much as 33.33$ to play if I throw a die and play paper if I get 1, 2, 3 or 4, and rock if I get 5 or 6. So a 2/3 chance to play paper and a 1/3 chance to play rock, which of course just matches the result of the first problem.
In that case, there is no optimal strategy for my opponent, he can play whatever he wants.
If I increase the odds that I play paper to more than 50%, then it becomes interesting for him to play scissors, which will give me a 0$ gain against my paper, but that is balanced by the risk that he loses 100$ if I still play rock. A 2/3 chance of paper vs 1/3 chance of rock is the equilibrium. Whatever he chooses to play then will make me gain 33.33$ on average, whether he plays scissors or not.
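
A quick way to see where the 2/3 comes from is to scan paper probabilities and keep the one with the best worst case. This is only a sketch under the assumptions of the comment: I restrict myself to rock and paper in the first round, the two-round totals are the ones tabulated in the earlier comment (with rock versus scissors worth 100$), and the names `TOTAL` and `worst_case` are made up for illustration.

```python
from fractions import Fraction as F

# Two-round totals in dollars for my first move (rock or paper only)
# against each of his first moves.
TOTAL = {"R": {"R": 0, "P": 0, "S": 100},
         "P": {"R": 50, "P": 50, "S": 0}}

def worst_case(q):
    """My guaranteed average gain when I play paper with probability q."""
    return min(q * TOTAL["P"][h] + (1 - q) * TOTAL["R"][h] for h in "RPS")

# Scan paper probabilities 0, 1/60, ..., 1 and keep the best guarantee.
grid = [F(k, 60) for k in range(61)]
best_q = max(grid, key=worst_case)
print(best_q, worst_case(best_q))  # 2/3 100/3
```

At q = 2/3 all three of his first moves give the same 100/3, which is why his choice stops mattering.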

I can see that several other people have come up w...

2014-04-07T04:15:15.581-07:00

I can see that several other people have come up with this answer already but I will post my solution as well.

The Expected Value (EV) of regular RPS is 0 for both players, using the strategy of 1/3 for each possibility. The constrained player is unable to play this strategy so EV(us) > 0.

Because the constrained player must play rock at least 1/2 of the time, whenever we play scissors we will lose at least 50% of the time. If the game is to have positive EV, then we should be able to do better.

In any Nash equilibrium, neither player can do better by changing their strategy. After ruling out every pure strategy (which I will leave up to the reader), we can see that mixed strategies are required. In any mixed-strategy equilibrium, players will be indifferent between the strategies that they play with non-zero probability (otherwise they could do better by adjusting the probabilities).

We want to find probabilities that will make our opponent indifferent to playing paper or scissors. Therefore:
Payoff (opponent, scissors) = P(us, paper) - P(us, rock)
and Payoff (opponent, paper) = P(us, rock) - P(us, scissors)
are equal. But P(us, scissors) = 0, so this simplifies to:

P(us, paper) - P(us, rock) = P(us, rock) or
P(us, paper) = 2 * P(us, rock)

Since P(us, paper) + P(us, rock) + P(us, scissors) = 1, we get:
3 * P(us, rock) = 1 or,
P(us, rock) = 1/3, and
P(us, paper) = 2/3.

In an equilibrium, we must also be indifferent between paper and rock. Using the same logic as above, we get P(opponent, scissors) = 2/3 and P(opponent, paper) = 1/3 for the games in which he is free to choose; otherwise our payoff would be higher under either rock or paper.

From this we can calculate the EV of 100/6 = 16.66.
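
The indifference conditions above can be checked directly. The sketch below is illustrative only (the helper names `payoff` and `ev` are not from the post) and assumes the opponent's overall mix is 1/2 rock, 1/6 paper and 1/3 scissors, i.e. forced rock half the time with the free half split 2/3 scissors and 1/3 paper.

```python
from fractions import Fraction as F

MOVES = "RPS"
BEATS = {"R": "S", "P": "R", "S": "P"}

def payoff(a, b):
    """+1 if move a beats move b, -1 if it loses, 0 on a tie."""
    return 1 if BEATS[a] == b else -1 if BEATS[b] == a else 0

def ev(move, mix):
    """Expected payoff of a pure move against a mixed strategy (dict)."""
    return sum(p * payoff(move, other) for other, p in mix.items())

us = {"R": F(1, 3), "P": F(2, 3), "S": F(0)}
# Forced rock half the time; the free half is split 2/3 scissors, 1/3 paper.
opponent = {"R": F(1, 2), "P": F(1, 6), "S": F(1, 3)}

print({m: ev(m, us) for m in MOVES})        # opponent's payoff per move
print({m: ev(m, opponent) for m in MOVES})  # our payoff per move
print(100 * ev("R", opponent))              # 50/3
```

The opponent gets 1/3 from either paper or scissors against our mix, we get 1/6 from either rock or paper against his, and our scissors would lose on average, so neither side gains by shifting weight.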

Spoiler. The most you should be willing to pay is...

2014-04-06T22:07:35.254-07:00

Spoiler.

The most you should be willing to pay is 100/6 = $16.66.

I think that the equilibrium strategy is for the unconstrained player to play paper 2/3 of the time and rock 1/3 of the time. The constrained player plays scissors 2/3 and paper 1/3 of the time he gets to choose.
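
As a sanity check on the $16.66 figure, here is a rough Monte Carlo sketch. It assumes the forced rock is an independent 50% coin flip each game and a $100 stake; `simulate` and its parameters are hypothetical names chosen for this example.

```python
import random

BEATS = {"R": "S", "P": "R", "S": "P"}

def simulate(games, stake=100, seed=0):
    """Average gain per game when both sides play the mixes quoted above."""
    rng = random.Random(seed)
    total = 0
    for _ in range(games):
        # Unconstrained player: rock 1/3, paper 2/3.
        ours = rng.choices("RP", weights=[1, 2])[0]
        if rng.random() < 0.5:
            theirs = "R"  # forced rock
        else:
            # Free choice: scissors 2/3, paper 1/3.
            theirs = rng.choices("SP", weights=[2, 1])[0]
        if BEATS[ours] == theirs:
            total += stake
        elif BEATS[theirs] == ours:
            total -= stake
    return total / games

print(simulate(200_000))
```

With a couple of hundred thousand games the average should land close to 100/6, up to sampling noise.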

That is entertaining, I like that a lot, thanks fo...

2014-04-06T18:26:14.110-07:00

That is entertaining, I like that a lot, thanks for sharing!

Your explanation was not wlog. Using your logic ch...

2014-04-06T18:09:59.380-07:00

Your explanation was not wlog. Using your logic choose paper 100%. Then wlog fix R at 100%. You win every game! Wow, Clever!

Let's look at a better strategy against yours. Suppose your opponent chooses scissors and rock just as often. Then the expected number of wins when using your strategy is: .5(2/3 - 1/6) + .5(-2/3 + 1/6) = 0. You break even. If you never play scissors, this works out better.

"the probabilities of the other are 0.5+(s*0....

2014-04-06T17:40:18.368-07:00

"the probabilities of the other are 0.5+(s*0.5), 0.5*(1-r-s), 0.5*r"

This statement is wrong, but it works out since s is 0. It should be:

"the probabilities of the other are max[0.5, s], (1-max[0.5, s])/(1-s)*(1-r-s), (1-max[0.5, s])/(1-s)*r"

Interesting problems. I suggest the following simi...

2014-04-06T15:16:20.075-07:00

Interesting problems. I suggest the following similar ones:
1. With 50% probability your opponent cannot throw rock. Does this give you an advantage? (No.)
2. In a two-round game your opponent is not allowed to throw rock in at least one of the rounds. Does this give you an advantage? (Yes.) How much?

Compare 1 and 2 and be amused.

Initially, for the first few rounds I would play r...

2014-04-06T14:10:59.568-07:00

Initially, for the first few rounds I would play rock most of the time, to achieve a draw or a win over scissors, which is the likely intuitive reaction to being forced to play rock. The opponent, being of such intelligence, would recognise the pattern and choose paper as a strategy, at which time I would begin using scissors. I don't know the math, but my instinct would be to go against the intuition of the opponent as much as I can to optimise my wins. At a certain point, this would begin to fail and I suspect I would start to lose two or three times in a row, at which point I would quit. Sorry, I don't have a genius IQ to offer much more.

I've got the first part as far as strategy. Yo...

2014-04-06T07:21:59.062-07:00

I've got the first part as far as strategy. You should play paper 2/3 of the time and rock the rest.

I started by ruling out cases. Since you cannot profit on S, you would never throw S. Hence, your opponent can never profit on R and will only throw it when forced to. Then it was simply a matter of finding the ratio I would throw that made opponent's choice irrelevant.

If my math is right, you should be willing to pay up to $16.66.

Small typo in forth paragraph: "where R is an...

2014-04-06T07:10:42.881-07:00

Small typo in fourth paragraph: "where R is an element of [1/2, 1] and p and s are elements of [0,1/2]"

Let r, p and s be the probabilities we play rock, ...

2014-04-06T07:07:32.128-07:00

Let r, p and s be the probabilities we play rock, paper and scissors on a given game respectively. It follows that,

r + p + s = 1
where r, p and s are elements of the interval [0,1]

For our opponent we will use a similar notation, only with uppercase letters,

R + P + S = 1
where R is an element of [1/2, 1] and P and S are elements of [0, 1/2]

Rearranging, we find that,

s = 1 - r - p
S = 1 - R - P

Now the expected gain, G, we get from any one game is given by,

G = AW + 0D + (-A)L = A(W - L)
where A is the amount exchanged between players and W, D and L are the probabilities of winning, drawing and losing respectively.

The probability of winning a single game, W, is the sum of the probabilities of each of the winning configurations:

W = rS + pR + sP

Similarly the probability of losing on a game is

L = Rs + Pr + Sp

and so, substituting s = 1 - r - p and S = 1 - R - P, the expected gain is given by

G = A( r - R + P - p + 3(pR - Pr) )

Now note that our opponent must play rock at least half the time, so whenever we play scissors, at minimum we will lose half of the time and so it will never improve our gains. Therefore we should never play scissors. In the worst case scenario, our opponent is aware that we will never play scissors and that playing rock more than he has to will never improve his gains. This means

R = 1/2
G = A( r + (p-1)/2 + P(1-3r) )

Assuming the opponent is aware of this, he will do what he can to minimize G. This would mean that if (1-3r) is negative then he would maximize P (the only variable he controls). Similarly if (1-3r) is positive, he will minimize P. Finally if (1-3r) is 0, he cannot affect G.

In all of these scenarios, setting r = 1/3 (or as close as possible to it) maximizes G. This means that we should play rock a third of the time and paper the rest of the time. Our expected gain per game is G = A/6, so if A = $100 we should only be willing to pay $16.66 or less if we wish to make any gain per game.
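
The algebra above can be double-checked numerically. The following Python sketch is illustrative (function names are my own): it compares the closed form for G with the direct W - L computation at an arbitrary test point, then scans r on a grid to find the value with the best guaranteed gain when s = 0 and R = 1/2.

```python
from fractions import Fraction as F

def gain(r, p, R, P, A=100):
    """Expected gain G = A(W - L), computed directly from W and L."""
    s, S = 1 - r - p, 1 - R - P
    W = r * S + p * R + s * P
    L = R * s + P * r + S * p
    return A * (W - L)

def gain_closed_form(r, p, R, P, A=100):
    """The simplified expression G = A(r - R + P - p + 3(pR - Pr))."""
    return A * (r - R + P - p + 3 * (p * R - P * r))

def guaranteed(r):
    """Worst case over P in [0, 1/2] when s = 0 and R = 1/2.

    G is linear in P, so only the endpoints P = 0 and P = 1/2 matter.
    """
    return min(gain(r, 1 - r, F(1, 2), P) for P in (F(0), F(1, 2)))

point = (F(1, 4), F(1, 2), F(3, 5), F(1, 5))  # arbitrary r, p, R, P
print(gain(*point) == gain_closed_form(*point))  # True

grid = [F(k, 30) for k in range(31)]
best_r = max(grid, key=guaranteed)
print(best_r, guaranteed(best_r))  # 1/3 50/3
```

The guaranteed gain rises as r/2 up to r = 1/3 and falls as (1 - 2r)/2 after it, so the peak of 50/3 (about $16.66) sits exactly at r = 1/3.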

Joel you're incorrect on this one. The problem...

2014-04-06T06:56:20.700-07:00

Joel, you're incorrect on this one. The problem clearly states your opponent is a smart thinking player. If you never play scissors, your opponent will always play scissors when he has the choice and you will not profit. Therefore you *must* play scissors some percentage of the time.

This is almost 100% correct, but your opponent doe...

2014-04-06T06:30:46.953-07:00

This is almost 100% correct, but your opponent does have a strategy where, if you are playing 33% rock and 67% paper, you cannot increase your EV by shifting your weighting between rock and paper. To do this your opponent actually needs to play both paper and scissors a non-zero % of the time.

When you are playing 33% rock and 67% paper and your opponent is playing that optimal strategy, your two strategies form a Nash equilibrium: http://en.wikipedia.org/wiki/Nash_equilibrium

Good point, perhaps there is a better phrasing for...

2014-04-06T06:26:18.436-07:00

Good point, perhaps there is a better phrasing for that question. The way I stated it is somewhat standard in game theory, but in everyday life it raises all the concerns you mentioned.

Very cool! The problem of programmatically findin...

2014-04-06T06:25:21.029-07:00

Very cool!

The problem of programmatically finding Nash equilibria is actually a very interesting one and is something I spend a lot of my time on.

One of the fastest and most powerful algorithms is http://en.wikipedia.org/wiki/Fictitious_play which is definitely worth reading about if you are curious.

There are also techniques called CFRM (counterfactual regret minimization) and linear programming (generally not useful in practical settings in big games).
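
To give a feel for fictitious play on this puzzle, here is a minimal pure-Python sketch. It is not the code used on the site. It assumes the constrained opponent can be modelled by three extreme options (always rock, rock/paper 50:50, rock/scissors 50:50), since any mix with at least 50% rock is a blend of those; the matrix `A` and both function names are illustrative.

```python
# Our payoff per $1 staked. Rows: our R, P, S. Columns: the opponent's
# three extreme options (always rock; rock/paper 50:50; rock/scissors 50:50).
A = [[0.0, -0.5, 0.5],
     [1.0, 0.5, 0.0],
     [-1.0, 0.0, -0.5]]

def fictitious_play(A, rounds):
    """Each round, both players best-respond to the other's past counts."""
    n, m = len(A), len(A[0])
    row_counts, col_counts = [0] * n, [0] * m
    row_counts[0] = col_counts[0] = 1  # arbitrary opening moves
    for _ in range(rounds):
        # Our total payoff for each row against the opponent's history.
        row_scores = [sum(A[i][j] * col_counts[j] for j in range(m))
                      for i in range(n)]
        # Our total payoff under each column against our own history.
        col_scores = [sum(A[i][j] * row_counts[i] for i in range(n))
                      for j in range(m)]
        row_counts[row_scores.index(max(row_scores))] += 1
        col_counts[col_scores.index(min(col_scores))] += 1
    total = rounds + 1
    return [c / total for c in row_counts], [c / total for c in col_counts]

def value_bounds(A, x, y):
    """Lower and upper bounds on the game value from the average mixes."""
    n, m = len(A), len(A[0])
    lower = min(sum(A[i][j] * x[i] for i in range(n)) for j in range(m))
    upper = max(sum(A[i][j] * y[j] for j in range(m)) for i in range(n))
    return lower, upper

x, y = fictitious_play(A, 20000)
print(x, y, value_bounds(A, x, y))
```

The two bounds bracket the true value of 1/6 and tighten as the number of rounds grows, while the empirical frequencies in `x` drift toward 1/3 rock and 2/3 paper. Convergence is slow compared with solving the small game directly, so this is only meant to show the mechanics.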

Glad you enjoyed the problem and thanks for posting your code, that's great!

I'd like to post a comment about the optimal p...

2014-04-06T04:34:09.083-07:00

I'd like to post a comment about the optimal price to pay, because it is a whole 'nother game theory question (or two or three).

Consider: if the value of playing for $100 is $50, should you pay fifty? No, that would zero your winnings. So how many others are competing to play? If none, then you should bid a penny. If there are others, then the question becomes more along the lines of: how long does a game take, and what is your time worth? What is your job satisfaction worth? Would you be willing to replace your career with RPS for the next twenty years for a ten percent raise?

Cleaned up the code, sprinkled some comments, and ...

2014-04-06T04:24:30.373-07:00

Cleaned up the code, sprinkled some comments, and added it to a GitHub repo ( https://github.com/ttsiodras/RockPaperScissors-SlashdotPuzzle ).

Thank you for posting this, I thoroughly enjoyed solving it! :-)

I hacked a solution and a simulation of the soluti...

2014-04-06T03:43:27.100-07:00

I hacked a solution and a simulation of the solution in Python (run with PyPy for speed!) - what do you think?

https://gist.github.com/anonymous/10004339