What are the chances that I will write two blog posts only a week apart? Well, last week’s post was a lot of fun (for me) so I decided to do another one. In that post, I wrote some code to answer FiveThirtyEight.com’s riddle about a sadistic car salesman. As I wrote last week, I wasn’t entirely sure my solution was correct, but it turned out that it was, and so I got a shout-out in this week’s Riddler (at the bottom). There were also some comments from people who had independently come to the same answer that I did, and I really enjoyed reading them, so thanks for those.

This week’s Riddler concerns a basketball player with a remarkable neurosis:

A basketball player is in the gym practicing free throws. He makes his first shot, then misses his second. This player tends to get inside his own head a little bit, so this isn’t good news. Specifically, the probability he hits any subsequent shot is equal to the overall percentage of shots that he’s made thus far. (His neuroses are very exacting.) His coach, who knows his psychological tendency and saw the first two shots, leaves the gym and doesn’t see the next 96 shots. The coach returns, and sees the player make shot No. 99. What is the probability, from the coach’s point of view, that he makes shot No. 100?

Okay, two things right off the bat. First, according to the rule specified, any time this player made a single shot, that should determine ALL future shots. He would either make 100% or miss 100%, and it wouldn’t be possible to make the first one and miss the second one. But that would be pretty boring, so let’s ignore that.

Second, after reading the problem my immediate reaction was, “Oh, it’s 2/3, or 66.7%. That seems obvious. It even says, ‘from the coach’s point of view’ so that’s what it is. That’s too easy; it must be wrong.”

After thinking about it some more, I decided that it might be more likely that the answer was something like 50/99, or 50.5%. So I wrote a Monte Carlo program to simulate this neurotic hoopster and ran it. And the answer is… 2/3. Or at least that’s what I think it is. Here’s the code:

    // NeuroticShooter.cpp : Defines the entry point
    // for the console application.
    #include "stdafx.h"
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    enum EShotResult
    {
        NS_Miss,
        NS_Made,
        NS_Invalid,
    };

    int SimShot(int &shots, int &made)
    {
        // take a random number between 0 and shots so far (eg 2)
        // if less than the number we have currently made (eg 1)
        // then we made it
        int shotValue = rand() % shots;
        shots++;
        if(shotValue < made)
        {
            made++;
            return NS_Made;
        }
        return NS_Miss;
    }

    int MakesLastShot(int thirdSeen, int lastShot)
    {
        // we start with the assumption that the first two
        // shots were taken and one was made
        int shots = 2;
        int made = 1;

        // now we simulate the unseen shots
        while(shots < thirdSeen)
        {
            SimShot(shots, made);
        }

    #ifdef THE_WRONG_WAY
        // This code might seem correct but will not correctly
        // "prune" the tree of possibilities
        shots++;
        made++;
    #else
        // now ensure that we make the "seen" shot (eg 99)
        if(SimShot(shots, made) != NS_Made)
            return NS_Invalid;
    #endif

        // simulate any additional unseen shots
        // (this will be 0 in given problem)
        while(shots < lastShot)
        {
            SimShot(shots, made);
        }

        // now simulate final shot and return its success
        return SimShot(shots, made);
    }

    void MC_Trial(int thirdSeen, int lastShot)
    {
        int trials = 10000;
        int made = 0;
        int validTrials = 0;
        for(int i = 0; i < trials; i++)
        {
            switch(MakesLastShot(thirdSeen, lastShot))
            {
            case NS_Invalid:
                break;
            case NS_Made:
                made++;
                // fall through!
            case NS_Miss:
                validTrials++;
            }
        }
        printf("Coach sees: Make, Miss, [%d unseen], Make, [%d unseen].\n Player then makes last shot %.3f%% of time.\n\n"
            , thirdSeen - 2, lastShot - thirdSeen - 1,
            (double)made * 100.0 / (double)validTrials);
    }

    int _tmain(int argc, _TCHAR* argv[])
    {
        srand((unsigned int)time(NULL));
        // shot indices are 0-based, so "2" = third shot,
        // "98" = 99th shot, "99" = 100th shot
        MC_Trial(98, 99);
        return 0;
    }

And here’s the output:

    Coach sees: Make, Miss, [96 unseen], Make, [0 unseen].
     Player then makes last shot 66.379% of time.

So, yeah, 2/3 (give or take some expected Monte Carlo variance). Now if you look at that code you can see a #ifdef compiler directive that makes a BIG difference in what we get as an answer.

    #ifdef THE_WRONG_WAY
        // This code might seem correct but will not correctly "prune" the
        // tree of possibilities
        shots++;
        made++;
    #else
        // now ensure that we make the "seen" shot (eg 99)
        if(SimShot(shots, made) != NS_Made)
            return NS_Invalid;
    #endif

If we simulate 98 free throws, then “give credit” for a made shot, then calculate the odds of making that last free throw, then the answer is… 50.5%. But that’s not right. At least, I don’t think it is.

Instead, we need to simulate the 99th free throw and throw out the entire series if it is missed. We only count trials where the 99th free throw is made “naturally.” Why? Well, because we are trying to simulate what will happen by following these rules. And the universe in which the player makes the 99th free throw is different than the universe where he misses it, and our simulation needs to exist in only the universe where he makes it.

Think of it this way. Each shot that the player takes presents a fork in a big tree of nodes representing the multiverse of possibilities. And we need to prune all the universes where that 99th free throw was missed. That’s why our function can return one of three values instead of just two. Some of the simulations are marked as invalid, so we don’t count them in our total number of simulations. And when we only count the simulations where our player “naturally” makes the 99th free throw, he goes on to make the 100th free throw 2/3 of the time.
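The same comparison can be made without random numbers. The sketch below is not part of the program above; it is a minimal exact calculation, assuming 1-based shot numbers (so the seen shot is 99 here, not 98 as in MC_Trial) and using hypothetical helper names (MadeDistribution, PrunedOdds, CreditedOdds). It builds the exact distribution of made shots before the seen shot, then compares conditioning on a "natural" make with simply giving credit for one.

```cpp
#include <cassert>
#include <vector>

// Exact distribution of made shots after `shots` attempts, starting from
// one make in two shots. dist[k] = probability of exactly k makes.
std::vector<double> MadeDistribution(int shots)
{
    assert(shots >= 2);
    std::vector<double> dist(shots + 1, 0.0);
    dist[1] = 1.0; // after two shots: one make, one miss
    for (int taken = 2; taken < shots; taken++)
    {
        // walk downward so dist[k - 1] still holds the previous round
        for (int k = taken; k >= 1; k--)
        {
            double viaMiss = dist[k] * (taken - k) / taken;
            double viaMake = dist[k - 1] * (k - 1) / taken;
            dist[k] = viaMiss + viaMake;
        }
    }
    return dist;
}

// Pruned view: P(make seenShot and the next one) / P(make seenShot).
double PrunedOdds(int seenShot)
{
    int before = seenShot - 1; // shots taken before the seen one
    std::vector<double> dist = MadeDistribution(before);
    double makeSeen = 0.0;
    double makeBoth = 0.0;
    for (int k = 0; k <= before; k++)
    {
        double pSeen = dist[k] * k / before; // reach k makes, then make
        makeSeen += pSeen;
        makeBoth += pSeen * (k + 1) / (before + 1);
    }
    return makeBoth / makeSeen;
}

// "Give credit" view: add a make to every branch without weighting it.
double CreditedOdds(int seenShot)
{
    int before = seenShot - 1;
    std::vector<double> dist = MadeDistribution(before);
    double odds = 0.0;
    for (int k = 0; k <= before; k++)
    {
        odds += dist[k] * (k + 1) / (before + 1);
    }
    return odds;
}
```

Under these assumptions, PrunedOdds(99) works out to 2/3 and CreditedOdds(99) to 50/99 (about 50.5%), which lines up with the two Monte Carlo results.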

Is that the correct way to look at this? I think so, but maybe we’ll find out next Friday that I was wrong.

There are some other interesting (for certain nerdy definitions of “interesting”) things about this riddle. First, if my method is the correct one, it doesn’t matter at what point the coach walks back into the gym to observe the third shot. For example, if the coach sees two shots, leaves for 48 shots, comes back to observe a made shot, then leaves for another 48 unseen shots, the chances of the player hitting that 100th shot will still be 2/3.

This can be illustrated by calling the MC_Trial function with different values:

    MC_Trial(50, 99);
    MC_Trial(2, 99);

Output:

    Coach sees: Make, Miss, [48 unseen], Make, [48 unseen].
     Player then makes last shot 68.030% of time.

    Coach sees: Make, Miss, [0 unseen], Make, [96 unseen].
     Player then makes last shot 66.192% of time.

(Again, the answer is 66.7% with some Monte Carlo variance.)

Also, because of the nature of this particular player’s neurosis, when you get to the end of the 100 shots, the chances for any particular “node” on the tree, or rather the chances that the player will finish with any particular total between 1 made shot and 99, are exactly equal. It’s pretty counter-intuitive to think that a player has an equal chance to make 1/100 and 99/100 shots, but once again this is a remarkable player.

I wrote a function to verify this. It’s a decent example of dynamic programming, where the results for the next round are built using data from previous rounds. Here’s the code:

    void PerformanceOdds()
    {
        int shots = 2;
        // after 2 shots, 0% chance of 0, 100% chance of 1, 0% chance of 2
        double chanceN[101] = {0, 1, 0};

        for(; shots < 100; shots++)
        {
            for(int n = shots; n >= 1; n--)
            {
                // the chance of getting to n made shots by missing from prev state
                double missChance = chanceN[n] * (double)(shots - n) / (double)shots;
                // the chance of getting to n made shots by making it from prev state
                double hitChance = chanceN[n - 1] * (double)(n - 1) / (double)shots;
                chanceN[n] = missChance + hitChance;
            }
        }

        for(int i = 0; i <= 100; i++)
        {
            printf("Chance of shooting %d/%d = %.3f%%\n", i, 100, 100.0 * chanceN[i]);
        }
    }

And the output:

    Chance of shooting 0/100 = 0.000%
    Chance of shooting 1/100 = 1.010%
    Chance of shooting 2/100 = 1.010%
    Chance of shooting 3/100 = 1.010%
    Chance of shooting 4/100 = 1.010%
    Chance of shooting 5/100 = 1.010%
    ….
    Chance of shooting 95/100 = 1.010%
    Chance of shooting 96/100 = 1.010%
    Chance of shooting 97/100 = 1.010%
    Chance of shooting 98/100 = 1.010%
    Chance of shooting 99/100 = 1.010%
    Chance of shooting 100/100 = 0.000%
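The flat 1.010% is 1/99, and a short induction suggests it is not a coincidence. As a sketch, write p_n(k) for the probability of exactly k makes after n shots (notation introduced here, not taken from the code). The first line is the same recurrence that PerformanceOdds applies; the rest assumes the distribution is already uniform after n shots, which is true at n = 2 because p_2(1) = 1.

```latex
\begin{align*}
p_{n+1}(k) &= p_n(k)\,\frac{n-k}{n} + p_n(k-1)\,\frac{k-1}{n} \\
\text{if } p_n(j) &= \frac{1}{n-1} \text{ for } 1 \le j \le n-1, \text{ then} \\
p_{n+1}(k) &= \frac{1}{n-1}\cdot\frac{(n-k)+(k-1)}{n} = \frac{1}{n},
\qquad 1 \le k \le n.
\end{align*}
```

At the edges (k = 1 and k = n) one of the two terms carries a zero factor, so the same formula still holds. Setting n + 1 = 100 gives 1/99 for every total from 1 to 99.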

Finally, I tweeted this to @ollie at 538:

What I meant by that is that in some ways, what we have simulated here is a pollster taking a random sample of a population. Before taking the poll, each outcome is equally likely, but once we take that sample, we expect that future responses (and actual opinion) will be consistent with that sample, within an expected margin of error. If our sample is truly random, that is what we can expect, but if not, well, our results might show 50.5% when the answer is really 66.7%. Since polls, and samples, and people’s expectations of (and reaction to) them are of great concern to FiveThirtyEight, I thought maybe this week’s Riddler might be a bit pointed.

Then again, I could be wrong!

NOTE: ~~WordPress really sucks for displaying formatted C++ code. I’m trying to figure out how to get it to display properly but haven’t had much luck. If anyone has any tips, I’d appreciate it.~~ I think I figured it out. Still not thrilled with it, but at least it’s readable now.

Really nice, well done. I was very surprised when I kept coming up with 2/3, was convinced it must be higher, instinctively thought there would be far more universes with the 99th shot made where the odds were much higher, but no matter how I tweaked the numbers, it was 2/3 all the way. Your code is beyond me, but the thought process is sound.

Thanks!

You’re right. I’ve been trying to figure out how to make it less than 2/3, and I’m failing miserably. I’ll be gobsmacked if we’re wrong.

I’ll be surprised if we’re wrong too, but I know that if I keep doing these riddles there will eventually be a time where I am sure I am correct, but wrong, so I’m not making any definitive statements. Still… I’d be surprised about this one.

I think because this one is, dare I say, obvious, it makes you want to second guess yourself. We can only base our answer on the numbers given. The coach has witnessed him make 2 of 3. Everything else is a smokescreen.

Not quite a smokescreen. See my answer below. Lemma 2 would be false if, for example, the probabilities were reversed (i.e., his chance of *missing* was the percentage of *made* shots). (Under that assumption, we might still say that we can’t give an answer any better than 2/3 (or 1/3) but it differs from this case, in which (under the Bayesian assumption) we can confidently say 2/3). There’s something legitimately interesting about the problem setup 🙂

Oops, forgot to sign up for comment emails.

The problem right off the bat is one which you identified: if he made his first shot, then he should not miss his second shot.

The second problem is “from the coach’s perspective”. The coach is not neurotic, the player is. What the coach sees doesn’t get into the player’s head; it’s what the player “sees”/“knows” that ostensibly gets in his head (notwithstanding the logical error already pointed out).

The coach has basically no information: he has seen 2/3 AND the logical inconsistency. Where one has no information with which to form an opinion, don’t Bayesians ( the 538 ideological bias) require assigning a probability of 50%?

I note that the submission form has been removed presumably because the riddle as presented is less a puzzle than an enigma.

Ignore all the typos

On Ollie’s twitter he says the shooter doesn’t become neurotic until after the 2nd shot. So with that info, the probability of him making the 3rd shot is a coin flip. If he misses the 3rd, the 4th shot has a 33% chance. If he makes the 3rd the 4th now has a 66%. If he makes the 4th(66% likely) based on what we know the 5th has a 75% chance..right?

I think the submission form has been removed because the deadline for submitting an answer was midnight on Sunday.

I took the approach of making it a smaller tree by assuming the coach only walked out for a single shot, then came back. Next I did the same but for the coach walking out for 2 shots, etc. and saw that it never changed the final outcome.
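The small-tree approach described here can be sketched as a brute-force walk over every branch. This is a hedged illustration rather than anyone's actual code: Walk and TreeOdds are hypothetical names, shots are numbered from 1, and the walk is exponential in the number of shots, so it only suits small trees.

```cpp
#include <cassert>

// Visits every branch of the shot tree. `shots` and `made` describe the
// state so far; `pathProb` is the probability of having reached it.
// Branches where the seen shot is missed are pruned.
void Walk(int shots, int made, double pathProb, int seenShot, int lastShot,
          double &seenMade, double &bothMade)
{
    int next = shots + 1;
    double pMake = (double)made / (double)shots;
    if (next == lastShot)
    {
        // every surviving path made the seen shot
        seenMade += pathProb;
        bothMade += pathProb * pMake;
        return;
    }
    // branch where the next shot goes in
    Walk(next, made + 1, pathProb * pMake, seenShot, lastShot,
         seenMade, bothMade);
    // branch where it misses, unless the coach saw this shot go in
    if (next != seenShot)
    {
        Walk(next, made, pathProb * (1.0 - pMake), seenShot, lastShot,
             seenMade, bothMade);
    }
}

// Probability of making shot lastShot, given shot seenShot was made and
// the first two shots were a make and a miss.
double TreeOdds(int seenShot, int lastShot)
{
    assert(seenShot >= 3 && seenShot < lastShot);
    double seenMade = 0.0;
    double bothMade = 0.0;
    Walk(2, 1, 1.0, seenShot, lastShot, seenMade, bothMade);
    return bothMade / seenMade;
}
```

For example, TreeOdds(4, 5) is the case where the coach steps out for a single shot, and TreeOdds(5, 6) the case where he misses two; both come out to 2/3.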

Exactly. If you sample any 3 shots, and he makes 2 out of 3, that should determine the odds of making the next shot.

From the coach’s perspective. There are obviously other people who came up with other percentages. I’m just wondering how they got there. I started messing around with how, but life got in the way.

There’s a simple proof that uses the following lemmas, each of which isn’t so hard to prove:

1) The probability of the player throwing a particular sequence of hits and misses depends only on the number of hits (/misses) and not their order. Same for the probability of making the next shot.

2) If the probability of making the next shot is P, then the expected probability of making a shot after a subsequent series of throws with unknown values is still P.

Taken together, this means that the coach has exactly as much information as if he’d seen the player throw hit-miss-hit, i.e., an expected value of 2/3.
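Lemma 2 can be checked in one line for a single unseen shot, and repeating the step covers any number of them. A sketch, assuming m makes in n shots so far (symbols introduced here), so that P = m/n, with P' the probability of making a shot after the unseen one:

```latex
\mathbb{E}[P'] = P\cdot\frac{m+1}{n+1} + (1-P)\cdot\frac{m}{n+1}
             = \frac{m+P}{n+1}
             = \frac{m + m/n}{n+1}
             = \frac{m}{n} = P
```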