The 538 Riddler: Hot New Game Show

Hello again, faithful reader(s). Happily my solution to last week’s Riddler turned out to be correct once again, although I didn’t get to write much interesting code for it. This week’s Riddler on the other hand is right up my alley, featuring a puzzle that was ripe for a code-based solution.

Two players go on a hot new game show called “Higher Number Wins.” The two go into separate booths, and each presses a button, and a random number between zero and one appears on a screen. (At this point, neither knows the other’s number, but they do know the numbers are chosen from a standard uniform distribution.) They can choose to keep that first number, or to press the button again to discard the first number and get a second random number, which they must keep. Then, they come out of their booths and see the final number for each player on the wall. The lavish grand prize — a case full of gold bullion — is awarded to the player who kept the higher number. Which number is the optimal cutoff for players to discard their first number and choose another? Put another way, within which range should they choose to keep the first number, and within which range should they reject it and try their luck with a second number?

So, right away a naive solution would be to suggest that contestants should reject any number below 0.5 and accept anything equal to or above 0.5. I wrote a quick Monte Carlo (natch) to test various thresholds for two contestants. Here’s that code:

#define PRECISION "%.8f"

// RandDouble() returns a uniform random double between 0 and 1 (defined in the full source).
// Keep the first draw if it meets the threshold; otherwise discard it and keep a second draw.
double GetScore(double thresh)
{
    double score = RandDouble();
    if(score >= thresh)
        return score;
    return RandDouble();
}

// Play one game; returns 0 if A wins, 1 if B wins, -1 on a tie.
int GameShow(double aThresh, double bThresh)
{
    double a = GetScore(aThresh);
    double b = GetScore(bThresh);

    if(a > b)
        return 0;
    if(b > a)
        return 1;

    return -1;
}

// Estimate the percentage of games A wins against B over a number of trials (ties are ignored).
double MCGameShow(int trials, double a, double b, bool verbose = false)
{
    int aWins = 0;
    int bWins = 0;

    for(int i = 0; i < trials; i++)
    {
        int res = GameShow(a, b);
        if(res == 0)
            aWins++;
        if(res == 1)
            bWins++;
    }
    double pct = (double)aWins / (double)(aWins + bWins);
    if(verbose)
    {
        printf("MC: With A " PRECISION " and B " PRECISION ", A wins " PRECISION "%%.\n"
            , a, b, pct * 100.0);
    }
    return pct;
}


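Driving it for a few threshold pairs looks something like this (the trial count and seeding here are just illustrative):

#include <cstdlib>
#include <ctime>

double MCGameShow(int trials, double a, double b, bool verbose); // defined above

int main()
{
    srand((unsigned)time(nullptr)); // seed the RNG (assuming RandDouble() is built on rand())

    const int trials = 10000; // illustrative; more trials means less noise
    MCGameShow(trials, 0.50, 0.75, true);
    MCGameShow(trials, 0.50, 0.25, true);
    MCGameShow(trials, 0.50, 0.60, true);
    MCGameShow(trials, 0.50, 0.40, true);
    return 0;
}
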
And here’s some output for various thresholds:

MC: With A 0.50000000 and B 0.75000000, A wins 51.55000000%.
MC: With A 0.50000000 and B 0.25000000, A wins 54.83000000%.
MC: With A 0.50000000 and B 0.60000000, A wins 49.77000000%.
MC: With A 0.50000000 and B 0.40000000, A wins 51.73000000%.

Okay, so we can see that 0.5 is not the best threshold, because it loses more often than not against 0.6, for example. Clearly we’re going to need some better analysis to figure this one out.

This problem is an example of finding a Nash equilibrium: a pair of strategies such that neither contestant can improve their chances by unilaterally deviating from theirs. So my strategy for finding this equilibrium is to start with some threshold, say 0.5, and make small adjustments to it. If the new value does better against the previous value, keep going in that direction. When I find a value that seems to be a local maximum, begin again with smaller adjustments, and continue this process until the size of the adjustments falls below some tiny cutoff.

Here’s some code to do that:

// Hill-climb toward the equilibrium threshold, comparing candidates via Monte Carlo.
double FindNashMC(bool verbose = false)
{
    const int trials = 100000;
    double a = 0.5;
    double inc = 0.1;
    while(true)
    {
        double pctInc = MCGameShow(trials, a + inc, a, verbose);

        if(pctInc > 0.5)
        {
            a = a + inc;
        }
        else 
        {
            double pctDec = MCGameShow(trials, a - inc, a, verbose);
            if(pctDec > 0.5)
            {
                a = a - inc;
            }
            else
            {
                inc /= 10.0;
                if(inc < 0.000000001)
                    break;
            }
        }
    }
    if(verbose)
    {
        printf("Nash rejection rate = " PRECISION "\n", a);
    }
    return a;
}


And here’s some output from that:

MC: With A 0.60000000 and B 0.50000000, A wins 50.49901996%.
MC: With A 0.70000000 and B 0.60000000, A wins 49.63398536%.
MC: With A 0.50000000 and B 0.60000000, A wins 49.35948078%.
MC: With A 0.61000000 and B 0.60000000, A wins 50.09150641%.
MC: With A 0.62000000 and B 0.61000000, A wins 50.30101204%.
... [lots snipped]
MC: With A 0.64089890 and B 0.64089900, A wins 49.82999320%.
MC: With A 0.64089901 and B 0.64089900, A wins 49.96599932%.
MC: With A 0.64089899 and B 0.64089900, A wins 49.88349417%.
MC: With A 0.64089900 and B 0.64089900, A wins 49.96049881%.
MC: With A 0.64089900 and B 0.64089900, A wins 49.89749282%.
Nash rejection rate = 0.64089900

Well, that seems… reasonable. But is it correct? Running this code over and over shows some pretty high variance. In order to reduce that variance, it’s necessary to increase the number of trials to 100,000 or even 1,000,000, but then it’s slow. And still not necessarily correct.

Here are some other rejection thresholds that this same code spits out:

0.61001981
0.60980422
0.60098909
0.63110990
0.59025124

That’s a pretty wide spread, and it makes sense. The algorithm depends on lots of decisions, each made from a noisy Monte Carlo comparison, and if any one of those decisions goes the wrong way, the end result is doomed. Unless we are willing to run literally billions of trials, we are not going to get a precise answer beyond a few decimal places.
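
To put rough numbers on that: the standard error of a Monte Carlo win estimate near 50% is about 0.5/√n, so even a million trials only pins a win rate down to within roughly ±0.05 percentage points. And as the exact results further down will show, moving the threshold by 0.001 near the equilibrium shifts the win probability by only a few parts per million; resolving a gap that small would take something like (0.5 / 0.000003)^2, or roughly 3 × 10^10 trials, per comparison.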

So we really need a better way to do this. Is it possible to use, ugh, math to find the answer? Of course it is. This game show boils down to one of four scenarios: 1) both A’s and B’s first numbers come in below their thresholds, 2) A’s first number is good but B’s gets rejected, 3) A rejects and B keeps the first number, and 4) both A and B keep their first numbers.

The odds of these scenarios, where a and b are the two contestants’ rejection thresholds, are: 1) a * b, 2) (1-a) * b, 3) a * (1-b), and 4) (1-a) * (1-b).

Okay, so now that we know the odds of each scenario, we need to multiply each one by the likelihood that A wins in that scenario. This boils down to… you know what? Let me just show you the code:

// Exact probability that A wins, given rejection thresholds a and b.
double AWinPct(double a, double b, bool verbose = false)
{
    if(a == b)
        return 0.5;

    double odds1 = a * b;             // both reject their first numbers
    double odds2 = (1 - a) * b;       // A keeps, B rejects
    double odds3 = a * (1 - b);       // A rejects, B keeps
    double odds4 = (1 - a) * (1 - b); // both keep

    double aWins1 = 0.5;                  // two fresh draws: a coin flip
    double aWins2 = a + (1 - a) * 0.5;    // A uniform on [a, 1] vs. a fresh draw for B
    double aWins3 = (1 - b) * 0.5;        // a fresh draw for A vs. B uniform on [b, 1]
    double aWins4 = AOverB(1 - b, 1 - a); // A uniform on [a, 1] vs. B uniform on [b, 1]

    double pct = odds1 * aWins1 + odds2 * aWins2 + odds3 * aWins3 + odds4 * aWins4;
    if(verbose)
    {
        printf("Math: With A " PRECISION " and B " PRECISION ", A wins " PRECISION "\n", a, b, 100.0 * pct);
    }
    return pct;
}


Okay, so the first scenario is easy. Both contestants reject their first number and accept whatever their second number is, so there’s a 50% chance that either one wins.

In the second scenario we know that A’s score is at least a. So there’s an a chance that B’s second number falls below a, and therefore below A’s number. If B’s second number does exceed a, then both numbers are uniformly spread between a and 1, so A wins half of those; that’s the (1-a) * 0.5 term.

The third scenario is the mirror image of scenario 2. We know that B’s number is at least b, so A’s second number needs to beat it. There is a (1-b) chance that A’s number lands above b at all, and given that, a 50% chance that it also exceeds B’s number.

The fourth scenario is one in which both contestants were happy with their first numbers. What are the chances that A beats B here? To answer that we need another function called AOverB, which takes two range widths and calculates the odds that a uniform draw from the first range beats a uniform draw from the second, where both ranges start at zero. But we are going to pass 1-b for A, and 1-a for B. Why? Okay, bear with me here. Let’s say that A’s threshold is 0.6 and B’s threshold is 0.5. Now we know that A’s number will be between 0.6 and 1.0 and B’s will be between 0.5 and 1.0, and that they will be evenly distributed. So B’s range has width 0.5 and A’s only 0.4, but those ranges end at 1 rather than starting at 0. Reflecting everything (replacing each number x with 1-x) fixes that, but it also turns “higher” into “lower,” which is why the two widths get swapped in the call.

I hope that makes sense. Here’s the code for AOverB:

// Probability that a uniform draw from [0, a] beats a uniform draw from [0, b].
double AOverB(double a, double b, bool verbose = false)
{
    double dPct = 0.5;
    if(a > b)
    {
        // A wins outright if it lands above b, and wins half the time otherwise.
        double dOddsOver = (a - b) / a;
        dPct = dOddsOver + (1 - dOddsOver) * 0.5;
    }
    else if(a < b)
    {
        dPct = 1 - AOverB(b, a); // symmetric case
    }
    if(verbose)
    {
        printf("Math " PRECISION " over " PRECISION " = " PRECISION "\n", a, b, dPct);
    }
    return dPct;
}


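To sanity-check that reversal with the example above: A’s threshold is 0.6, B’s is 0.5, and both keep their first numbers, so A is uniform on [0.6, 1] and B is uniform on [0.5, 1]. A wins outright whenever B lands below 0.6, which happens 0.1 / 0.5 = 20% of the time, and wins half of the remaining 80%, for 60% overall. The call AOverB(1 - 0.5, 1 - 0.6) = AOverB(0.5, 0.4) gives exactly that: (0.5 - 0.4) / 0.5 + (0.4 / 0.5) * 0.5 = 0.2 + 0.4 = 0.6.
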
Okay, so now we have all our scenario odds and all our chances of winning each scenario. The final answer is obtained by adding up the products of each scenario’s occurrence rate with its win rate.

Now, at long last, we can rewrite our Nash function to use AWinPct, which gives the exact winning percentage for any two rejection thresholds almost instantly:

// Same hill-climb as FindNashMC, but using the exact AWinPct instead of Monte Carlo.
double FindNash(bool verbose = false)
{
    double a = 0.5;
    double inc = 0.1;
    while(true)
    {
        double pctInc = AWinPct(a + inc, a, verbose);

        if(pctInc > 0.5)
        {
            a = a + inc;
        }
        else 
        {
            double pctDec = AWinPct(a - inc, a, verbose);
            if(pctDec > 0.5)
            {
                a = a - inc;
            }
            else
            {
                inc /= 10.0;
                if(inc < 0.000000001)
                    break;
            }
        }
    }
    if(verbose)
    {
        printf("Nash rejection rate = " PRECISION "\n", a);
    }
    return a;
}


And here’s the output:

Math: With A 0.60000000 and B 0.50000000, A wins 50.50000000
Math: With A 0.70000000 and B 0.60000000, A wins 49.40000000
Math: With A 0.50000000 and B 0.60000000, A wins 49.50000000
Math: With A 0.61000000 and B 0.60000000, A wins 50.01200000
Math: With A 0.62000000 and B 0.61000000, A wins 50.00090000
Math: With A 0.63000000 and B 0.62000000, A wins 49.98970000
Math: With A 0.61000000 and B 0.62000000, A wins 49.99910000
Math: With A 0.62100000 and B 0.62000000, A wins 49.99969900
Math: With A 0.61900000 and B 0.62000000, A wins 50.00018900
Math: With A 0.62000000 and B 0.61900000, A wins 49.99981100
Math: With A 0.61800000 and B 0.61900000, A wins 50.00007710
Math: With A 0.61900000 and B 0.61800000, A wins 49.99992290
Math: With A 0.61700000 and B 0.61800000, A wins 49.99996530
Math: With A 0.61810000 and B 0.61800000, A wins 49.99999957
Math: With A 0.61790000 and B 0.61800000, A wins 49.99999931
Math: With A 0.61801000 and B 0.61800000, A wins 50.00000003
Math: With A 0.61802000 and B 0.61801000, A wins 50.00000002
Math: With A 0.61803000 and B 0.61802000, A wins 50.00000001
Math: With A 0.61804000 and B 0.61803000, A wins 50.00000000
Math: With A 0.61802000 and B 0.61803000, A wins 49.99999999
Math: With A 0.61803100 and B 0.61803000, A wins 50.00000000
Math: With A 0.61803200 and B 0.61803100, A wins 50.00000000
Math: With A 0.61803300 and B 0.61803200, A wins 50.00000000
Math: With A 0.61803400 and B 0.61803300, A wins 50.00000000
Math: With A 0.61803500 and B 0.61803400, A wins 50.00000000
Math: With A 0.61803300 and B 0.61803400, A wins 50.00000000
Math: With A 0.61803410 and B 0.61803400, A wins 50.00000000
Math: With A 0.61803390 and B 0.61803400, A wins 50.00000000
Math: With A 0.61803401 and B 0.61803400, A wins 50.00000000
Math: With A 0.61803399 and B 0.61803400, A wins 50.00000000
Math: With A 0.61803400 and B 0.61803399, A wins 50.00000000
Math: With A 0.61803398 and B 0.61803399, A wins 50.00000000
Math: With A 0.61803399 and B 0.61803399, A wins 50.00000000
Math: With A 0.61803399 and B 0.61803399, A wins 50.00000000
Nash rejection rate = 0.61803399

Comparing Nash rejection rate vs. 50% via math and Monte Carlo:
Math: With A 0.61803399 and B 0.50000000, A wins 50.43052317
MC: With A 0.61803399 and B 0.50000000, A wins 50.43086405%.

Finally, after finding the Nash rejection rate, I called AWinPct with it and 0.5, and compared that number against calling MCGameShow with the same thresholds and one million trials. The results were very close, so I felt pretty good about the AWinPct function, but not 100% sure; there were all sorts of ways I could have done something wrong. But then, as I was looking at it, I had a terrible thought: what is the Golden Ratio? The answer is… 1.61803399. So my proposed answer is the fractional part of the Golden Ratio? Seems legit.
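
One more quick check that AWinPct makes easy: nudge one player’s threshold a little away from (√5-1)/2 in either direction and confirm that the nudger never climbs above 50%. A minimal sketch:

#include <cmath>
#include <cstdio>

// Verify that small deviations from (sqrt(5) - 1) / 2 never help the deviator.
void CheckEquilibrium()
{
    const double nash = (sqrt(5.0) - 1.0) / 2.0; // ~0.61803399
    const double nudges[] = { 0.01, 0.001, 0.0001, 0.00001 };
    for(double d : nudges)
    {
        // If this really is the equilibrium, both results should come back at or below 0.5.
        printf("up   by " PRECISION ": " PRECISION "\n", d, AWinPct(nash + d, nash));
        printf("down by " PRECISION ": " PRECISION "\n", d, AWinPct(nash - d, nash));
    }
}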

Here’s the link to the source code.

Thanks for reading!


Author: Wiesman

Husband, father, video game developer, liberal, and perpetual Underdog.

11 thoughts on “The 538 Riddler: Hot New Game Show”

  1. Great to see someone come to the same answer through a different route!
    Here was my thinking:
    The average expected outcome “y” for every cut-off strategy “x” can be calculated as a weighted average of the kept first numbers and 0.5 (the average of the second number). The average of the kept first numbers can be expressed as (1+x)/2. So, for example, if your cut-off is 0.5, the average of the numbers above that cut-off is 0.75.
    Now, the frequency that you will get a number above your chosen cutoff is 1-x. So if you choose a high cut-off, say 0.9, you will only get to keep your first number 0.1 of the time.
    So, the full equation is: y = ((1+x)/2)*(1-x)+0.5x
    Graphing this equation tells you that the best average y will be obtained with a cut-off strategy of 0.5, yielding an average y of 0.625.
    However, if player A gets a first number of, say, 0.55, he knows that, on average, player B will get 0.625 by playing a 0.5 cut-off strategy, so player A is probably going to lose with 0.55.
    Player A might be tempted to risk everything and go for a second number. He is likely to do worse on the second number, but he’s probably going to lose with his first number anyway, right?
    But how far should he deviate from the optimal strategy? Well, if you look at the graph of the y = ((1+x)/2)*(1-x)+0.5x equation, the average expected return decreases slowly at first as you increase the cutoff from 0.5.
    If you increase your cut-off all the way to 0.625, for example, your expected return falls to 0.6172.
    The ideal strategy is to choose the highest cut-off that does not result in a lower expected result. So, when y=x.
    Solving this equation x = ((1+x)/2)*(1-x)+0.5x yields the answer 0.61803398874989484820458683436564. This is more elegantly expressed as: (√5-1)/2

    1. What do you mean when you say: the highest cutoff that does not result in a lower expected value? It seems to me that any deviation from the maximum (where x = 0.5) would result in a lower expected value? Thanks!

      1. Good question. Yes, 0.5 is the cutoff value that maximizes the average score of the player at 0.625; however, what we are looking for is the value that maximizes the winning percentage over the other player. And that is (phi – 1), or (√5-1)/2, or 0.61803399.

        Interestingly enough (for me as a poker enthusiast), Ollie at 538 found this forum post by poker pro David Sklansky that illustrates the same principle.

        http://forumserver.twoplustwo.com/showpost.php?p=49528117&postcount=21

        Thanks for the question!

      2. The equation to calculate the expected outcome is y = ((1+x)/2)*(1-x)+0.5x
        What I mean by ‘highest cutoff that does not result in a lower expected value’ omits the words ‘than the cutoff’: it’s the highest cutoff ‘x’ that does not result in an outcome ‘y’ lower than x.
        So, for example if we set our cutoff x= 0.5, then y = .625. As x is increased, y decreases, until they equal each other at 0.618… Any higher cutoff x results in a lower y.

      3. Can you explain why you still use the expected return calculation that includes an assumption for your first number even when you already know you received 0.55?

  2. I had a hard time getting my ‘instinct’ to believe the math on this one, so here is a way to think about it. Let’s say you know the basic equation for how to calculate the expected y outcome [y = ((1+x)/2)*(1-x)+0.5x]. You know that the best strategy for getting a higher average outcome is to use a cutoff of 0.5, but you’ve just been given 0.55 as your first number. You know that the ‘best’ strategy says you should accept 0.55, but you know it’s a lousy number, below your expected average of 0.625. So, you imagine what your average outcome would be if your cutoff was, say, 0.56; you’d find your average outcome was still much higher than 0.55. Armed with this knowledge, you think ‘if I had adopted a strategy of 0.56 cutoff, I’d still average better than 0.55, I better take a second number’.
    If you do this at progressively higher cutoff points, you eventually get, say, a 0.619, and you go through the same thought process: ‘if my cutoff was 0.62, what would my expected outcome be?’ In this case the answer is lower than 0.619: you’re already getting above average for a 0.62 cutoff, so accept the first number! Working back, you see that you should discard any first number where, if that number were used as the cutoff point, your average outcome would be higher.
    A cutoff of 0.618… yields an average of 0.618…, so that is the ideal cutoff.
