Hello again, faithful reader(s). Happily my solution to last week’s Riddler turned out to be correct once again, although I didn’t get to write much interesting code for it. This week’s Riddler on the other hand is right up my alley, featuring a puzzle that was ripe for a code-based solution.
Two players go on a hot new game show called “Higher Number Wins.” The two go into separate booths, and each presses a button, and a random number between zero and one appears on a screen. (At this point, neither knows the other’s number, but they do know the numbers are chosen from a standard uniform distribution.) They can choose to keep that first number, or to press the button again to discard the first number and get a second random number, which they must keep. Then, they come out of their booths and see the final number for each player on the wall. The lavish grand prize — a case full of gold bullion — is awarded to the player who kept the higher number. Which number is the optimal cutoff for players to discard their first number and choose another? Put another way, within which range should they choose to keep the first number, and within which range should they reject it and try their luck with a second number?
So, right away a naive solution would be to suggest that contestants should reject any number below 0.5 and accept anything equal to or above 0.5. I wrote a quick Monte Carlo (natch) to test various thresholds for two contestants. Here’s that code:
And here’s some output for various thresholds:
MC: With A 0.50000000 and B 0.75000000, A wins 51.55000000%. MC: With A 0.50000000 and B 0.25000000, A wins 54.83000000%. MC: With A 0.50000000 and B 0.60000000, A wins 49.77000000%. MC: With A 0.50000000 and B 0.40000000, A wins 51.73000000%.
Okay, so we can see that 0.5 is not the best number because it loses more often than not against 0.6 for example. So it’s obvious that we’re going to need some better analysis to figure this one out.
This problem is an example of trying to find a Nash Equilibrium, a set of values for each contestant such that any deviation from those values by one contestant results in an advantage for the other contestant. So my strategy for finding this equilibrium is to start with some threshold, say 0.5, and make small adjustments to it. If the new value does better against the previous value, keep going further. When I find a value that seems to represent a local maxima, begin again with smaller adjustments. Continue this process until the size of adjustments falls below some threshold.
Here’s some code do do that:
And here’s some output from that:
MC: With A 0.60000000 and B 0.50000000, A wins 50.49901996%. MC: With A 0.70000000 and B 0.60000000, A wins 49.63398536%. MC: With A 0.50000000 and B 0.60000000, A wins 49.35948078%. MC: With A 0.61000000 and B 0.60000000, A wins 50.09150641%. MC: With A 0.62000000 and B 0.61000000, A wins 50.30101204%. ... [lots snipped] MC: With A 0.64089890 and B 0.64089900, A wins 49.82999320%. MC: With A 0.64089901 and B 0.64089900, A wins 49.96599932%. MC: With A 0.64089899 and B 0.64089900, A wins 49.88349417%. MC: With A 0.64089900 and B 0.64089900, A wins 49.96049881%. MC: With A 0.64089900 and B 0.64089900, A wins 49.89749282%. Nash rejection rate = 0.64089900
Well, that seems… reasonable. But is it correct? Running this code over and over shows some pretty high variance. In order to reduce that variance, it’s necessary to increase the number of trials to 100,000 or even 1,000,000, but then it’s slow. And still not necessarily correct.
Here are some other rejection thresholds that this same code spits out:
That’s a pretty wide variance there, and it makes sense. The algorithm depends on lots of decisions to be made based on Monte Carlo simulations, all of which will have a good deal of variance. If any of those decisions are wrong, the end result is doomed. Unless we are willing to run literally billions of trials, we are not going to be able to get a precise answer beyond a few decimal places.
So we really need a better way to do this. Is it possible to use, ugh, math to find the answer? Of course it is. So this game show boils down to one of four possibilities that might happen. These are: 1) both contestant A’s and B’s first numbers are below their thresholds, 2) A’s first number is good, but B’s is rejected, 3) A rejects and B accepts the first number, and 4) both A and B accept their first number.
The odds of each of these scenarios is: 1) a * b, 2) (1-a) * b, 3) a * (1-b), and 4) (1-a) * (1-b).
Okay, so now we know the odds of each scenario, now we must multiply those likelihoods by the likelihood that a will win in each scenario. This boils down to… you know what? Let me just show you the code:
Okay, so the first scenario is easy. Both contestants reject their first number and accept whatever their second number is, so 50% that either will win.
In the second scenario we know that A’s score is at least as high as a. So there’s an a chance that B’s second number will be below that number. If B’s second number exceeds a, then it will be uniformly spread between a and 1, so can be represented by (1-a) * 0.5.
The third scenario is the opposite of scenario 2. We know that B’s number is higher than b, so A’s number needs to exceed it. There is a (1-b) chance that it will do that, and then there’s a 50% chance that it will exceed B’s number.
The fourth scenario is one in which both contestants were happy with their first numbers. What are the chances that A beats B here? So, in order to answer that we need another function called AOverB, which takes two ranges and calculates the odds of the first value being higher than the second. But we are going to pass 1-b for A, and 1-a for B. Why? Okay, bear with me here. Let’s say that A’s threshold is 0.6 and B’s threshold is 0.5. Now we know that A’s number will be between 0.6 and 1.0 and B’s will be between 0.5 and 1.0, and that they will be evenly distributed. So B’s range is 0.5 and A’s is only 0.4, but we need to reverse those because A’s range starts at 0.6.
I hope that makes sense. Here’s the code for AOverB:
Okay, so now we have all our scenario odds and all our chances of winning each scenario. The final answer is obtained by adding up the products of each scenario’s occurrence rate with its win rate.
Now, at long last, we can rewrite our Nash function to use AWinPct to find the exact winning percentage for any two rejection thresholds and almost immediately:
And here’s the output:
Math: With A 0.60000000 and B 0.50000000, A wins 50.50000000 Math: With A 0.70000000 and B 0.60000000, A wins 49.40000000 Math: With A 0.50000000 and B 0.60000000, A wins 49.50000000 Math: With A 0.61000000 and B 0.60000000, A wins 50.01200000 Math: With A 0.62000000 and B 0.61000000, A wins 50.00090000 Math: With A 0.63000000 and B 0.62000000, A wins 49.98970000 Math: With A 0.61000000 and B 0.62000000, A wins 49.99910000 Math: With A 0.62100000 and B 0.62000000, A wins 49.99969900 Math: With A 0.61900000 and B 0.62000000, A wins 50.00018900 Math: With A 0.62000000 and B 0.61900000, A wins 49.99981100 Math: With A 0.61800000 and B 0.61900000, A wins 50.00007710 Math: With A 0.61900000 and B 0.61800000, A wins 49.99992290 Math: With A 0.61700000 and B 0.61800000, A wins 49.99996530 Math: With A 0.61810000 and B 0.61800000, A wins 49.99999957 Math: With A 0.61790000 and B 0.61800000, A wins 49.99999931 Math: With A 0.61801000 and B 0.61800000, A wins 50.00000003 Math: With A 0.61802000 and B 0.61801000, A wins 50.00000002 Math: With A 0.61803000 and B 0.61802000, A wins 50.00000001 Math: With A 0.61804000 and B 0.61803000, A wins 50.00000000 Math: With A 0.61802000 and B 0.61803000, A wins 49.99999999 Math: With A 0.61803100 and B 0.61803000, A wins 50.00000000 Math: With A 0.61803200 and B 0.61803100, A wins 50.00000000 Math: With A 0.61803300 and B 0.61803200, A wins 50.00000000 Math: With A 0.61803400 and B 0.61803300, A wins 50.00000000 Math: With A 0.61803500 and B 0.61803400, A wins 50.00000000 Math: With A 0.61803300 and B 0.61803400, A wins 50.00000000 Math: With A 0.61803410 and B 0.61803400, A wins 50.00000000 Math: With A 0.61803390 and B 0.61803400, A wins 50.00000000 Math: With A 0.61803401 and B 0.61803400, A wins 50.00000000 Math: With A 0.61803399 and B 0.61803400, A wins 50.00000000 Math: With A 0.61803400 and B 0.61803399, A wins 50.00000000 Math: With A 0.61803398 and B 0.61803399, A wins 50.00000000 Math: With A 0.61803399 and B 0.61803399, A wins 50.00000000 Math: With A 0.61803399 and B 0.61803399, A wins 50.00000000 Nash rejection rate = 0.61803399 Comparing Nash rejection rate vs. 50% via math and Monte Carlo: Math: With A 0.61803399 and B 0.50000000, A wins 50.43052317 MC: With A 0.61803399 and B 0.50000000, A wins 50.43086405%.
Finally, after finding the Nash rejection rate, I called AWinPct with it and 0.5 and compared that number to calling MCGameShow with it and 0.5, and one million trials. The results were very close, so I felt pretty good about the AWinPct function, but not 100% sure; there were all sorts of ways I could have done something wrong. But then as I was looking at it I had a terrible thought: what is the Golden Ratio? And the answer is… 1.61803399. So my proposed answer is the fractional part of the Golden Ratio? Seems legit.
Here’s the link to the source code.
Thanks for reading!