## The 538 Riddler: Air Travel Nightmare

So last week’s Riddler was a good news/bad news situation. The bad news is that I was wrong, although I did illustrate the correct answer in a follow-up post. The good news is that I still managed to get a shout-out from Ollie in this week’s Riddler, even earning the honorary title of FotR, which I hope means Friend of the Riddler.

Anyway, this week’s Riddler concerns the worst person alive and a full airplane.

There’s an airplane with 100 seats, and there are 100 ticketed passengers each with an assigned seat. They line up to board in some random order. However, the first person to board is the worst person alive, and just sits in a random seat, without even looking at his boarding pass. Each subsequent passenger sits in his or her own assigned seat if it’s empty, but sits in a random open seat if the assigned seat is occupied. What is the probability that you, the hundredth passenger to board, finds your seat unoccupied?

So the answer is 1/2, half, 50%, however you want to say it. And this is true for the last passenger no matter how many passengers are on a (full) plane. I actually worked this out before writing any code (but don’t worry, I still wrote the code!) It’s trivial to show for the case where there are only two passengers: there are two seats, and the person in front of you is the worst person alive. He either takes his seat, or yours, randomly. Fifty percent, done.

For three people (1. Worst Person Alive, 2. Rando Calrissian, and 3. You) and three seats, we can still do it in our heads. WPA takes one of three seats, so right away you have a 1/3 chance of not getting your seat. But there is also a 1/3 chance that he takes Rando’s seat, in which case Rando will randomly take one of the two remaining seats, yours or WPA’s assigned seat. One-third times one-half is 1/6, so there is a 1/6 chance you will find Rando sitting in your seat, and a 1/3 chance WPA will be sitting there. So 1/3 + 1/6 = 1/2. Boom, done.

Once I saw that the probability remained at 50% for three passengers, I suspected that it would be 50% for any number of passengers. I wrote a Monte Carlo to show that.

```// AirplaneSeating.cpp : Defines the entry point for the console application.
// www.somedisagree.com, Jon Wiesman

#include "stdafx.h"
#include <vector>

#define SEAT_COUNT 100
#define EVIL_COUNT 1
#define AVAILABLE -1

class Plane
{
std::vector<int>    m_seats;
int                 m_available;

public:
Plane(int seats)
{
m_seats.resize(seats);
for(int i = 0; i < seats; i++)
m_seats[i] = AVAILABLE;
m_available = seats;
}

void    ClaimSeat(int seat, int passenger)
{
m_seats[seat] = passenger;
m_available--;
}

int     FindUnclaimedSeat() const
{
if(m_available == 0)
return -1;
int nth = rand() % m_available;

int empty = 0;
for(size_t i = 0; i < m_seats.size(); i++)
{
if(IsSeatAvailable(i))
{
empty++;
if(empty > nth)
return i;
}
}
return -1;
}

bool    IsSeatAvailable(size_t seat) const
{
return (seat >= 0 && seat < m_seats.size() && m_seats[seat] == AVAILABLE);
}

void    GetSeat(int passenger, bool evil)
{
if(evil || !IsSeatAvailable(passenger))
{
int seat = FindUnclaimedSeat();
ClaimSeat(seat, passenger);
}
else
{
ClaimSeat(passenger, passenger);
}
}

void    SimulateBoarding(size_t evilCount)
{
for(size_t i = 0; i < m_seats.size(); i++)
{
GetSeat(i, (i < evilCount));
}
}

void    CompileStats(std::vector<int> &stats, int &wrongSeats)
{
for(size_t i = 0; i < m_seats.size(); i++)
{
if(m_seats[i] == i)
stats[i]++;
else
wrongSeats++;
}
}
};

int _tmain(int argc, _TCHAR* argv[])
{
std::vector<int> stats;
int wrongSeats = 0;
stats.resize(SEAT_COUNT);

int trials = 100000;

for(int i = 0; i < trials; i++)
{
Plane plane(SEAT_COUNT);

plane.SimulateBoarding(EVIL_COUNT);
plane.CompileStats(stats, wrongSeats);
}

printf("After %d trials with %d evil passengers:\n", trials, EVIL_COUNT);
for(int i = 0; i < SEAT_COUNT; i++)
{
printf("Passenger %d seated properly %.2f%% of time\n", i, 100.0 * (double)stats[i]/(double)trials);
}
printf("Average of %.2f wrongfully seated passengers\n", (double)wrongSeats / (double)trials);

return 0;
}
```

This code (download) is pretty simple. The Plane class keeps an array of seats and who (which passenger) is seated in each seat. It also keeps the number of empty seats at any time to make it easier to quickly choose a random seat. (For the purpose of this simulation it was convenient to say that the passenger index and seat assignment are the same.)

The constructor initializes the seat array to all available and sets the available count to the seat count. SimulateBoarding tells each passenger to find a seat. Finding a seat is easy. If you’re not evil and your seat is available, sit down. Otherwise, find a random available seat. Either way, call ClaimSeat to sit down and decrease the available member variable. A CompileStats function will increment the number of correct passenger-seat assignments and keep a running count of wrong ones. Finally we run this simulation 100,000 times and print out the results.

```After 100000 trials with 1 evil passengers:
Passenger 0 seated properly 1.00% of time
Passenger 1 seated properly 99.03% of time
Passenger 2 seated properly 99.00% of time
Passenger 3 seated properly 99.00% of time
Passenger 4 seated properly 99.01% of time
Passenger 5 seated properly 98.93% of time
Passenger 6 seated properly 98.97% of time
Passenger 7 seated properly 98.95% of time
Passenger 8 seated properly 98.92% of time
Passenger 9 seated properly 98.98% of time
Passenger 10 seated properly 98.87% of time
.....
Passenger 90 seated properly 90.86% of time
Passenger 91 seated properly 90.03% of time
Passenger 92 seated properly 88.83% of time
Passenger 93 seated properly 87.42% of time
Passenger 94 seated properly 85.39% of time
Passenger 95 seated properly 83.11% of time
Passenger 96 seated properly 79.98% of time
Passenger 97 seated properly 74.74% of time
Passenger 98 seated properly 66.45% of time
Passenger 99 seated properly 49.95% of time
Average of 5.19 wrongfully seated passengers```

So I ran this code really just to confirm that my guess of 50% would be correct. However, I saw something which kind of jogged my memory. The average number of wrongfully seated passengers out of 100 was 5.19. This was the average number of traffic jams with 100 cars from two weeks ago! (The exclamation point at the end of the last sentence might be the nerdiest exclamation point ever.) It turns out that the number of wrongfully seated passengers follows a Harmonic Series.

What this means is that the odds of anyone getting a wrong seat follows this pattern, starting from the last passenger to board: 1/2 + 1/3 + 1/4 + … You can see from the output that the chances of each passenger not getting their seat is indeed following that pattern, with a little Monte Carlo variance thrown in.

“But wait,” you say, probably not, but let’s pretend you do, “a Harmonic Series starts with one and you don’t have a one in there.” Oh but I do. But it comes at the end. Remember that in our simulation, Worst Person Alive boards first and therefore has a 99/100 chance of sitting in the wrong seat. The next person to board, let’s call him Rando Paul, has a 1/100 chance of finding WPA in  his seat. 99/100 + 1/100 = 1. The next person then has a 1/99 chance and on down to 1/2 for the last passenger. So the average number of wrongly seated passengers for N passengers with the Worst Person Alive seating first boils down to 1 + 1/2 + 1/3 + … + 1 / (N – 1).

Traffic jams and airline travel: unexpected sources of harmony. Thanks for reading!

## Follow up to Duck vs. Dog

Okay, well, my luck has officially run out. I was definitely, most certainly wrong wrong wrong about how much faster the dog could travel and still allow the duck to escape. Thanks once again to commenters Alex Mahdavi and Lego Haryanto who found the right answer. Alex pointed me to this DataGenetics blog post from a few years ago that answered the question with a monster and a rower in a boat. As I suspected the answer was more mathy than my simulation could handle, although a lot of it was algebra and trig, with a bit of calculus.

The nice thing was that once I understood the solution, it was really easy to rewrite the SmartDuckStrategy to use the “J” strategy outlined in the post. The result is exactly what was predicted by Datagenetics. Here’s a GIF:

Yay! It’s nice when things work out, even if I didn’t solve this one (or get close) myself. The new code is a little simplistic in that it assumes the dog will start at a position of 0 degrees, but here it is:

```DPoint SmartDuckStrategy::GetDestinationVector(Pond *pond)
{
if(pond->GetBeeline())
return pond->GetLastDuckVector();

DPoint focus(0, -1 / (pond->GetDogSpeed() * 2.0));

DPoint duckPos = pond->GetDuckPos();

DPoint delta = duckPos - focus;

double angleFromFocus = atan2(delta.y, delta.x);

if(angleFromFocus > -PI / 2.0 && angleFromFocus < 0)
{
pond->SetBeeline();
return DPoint(1, 0);
}
angleFromFocus += PI / 2.0;

DPoint vec(cos(angleFromFocus), sin(angleFromFocus));

return vec;
}
```

So I take some comfort that my underlying simulation was solid enough that just changing the behavior of the duck was sufficient to illustrate the right answer. It’s not much, but it’s all I have!

Thanks again to Alex and Lego for posting comments. It means a lot.

## The 538 Riddler: A Dog and a Duck in Purgatory

UPDATED 2/15 2:09 PM: Gonna save you the suspense on this one, folks. I was wrong, very wrong. See the follow up here.

Things are turning a bit dark at the 538 Riddler as this week Ollie imagines a frantic dog and a beleaguered duck trapped in a never-ending Kafkaesque chase. My answer for last week’s puzzle was correct, but as I said later, it was much more complicated than it needed to be.

This week, however, I don’t think I will even get a correct-but-overly-complex answer. I’m positive that my answer is wrong.

Here’s the scenario for these poor creatures:

There is a duck paddling in a perfectly circular pond, and a Nova Scotia duck tolling retriever prowling on the pond’s banks. The retriever very much wants to catch the duck. The dog can’t swim, and this particular duck can’t take flight from water. But the duck does want to fly away, which means it very much wants to make it safely to land without meeting the dog there. Assume the dog and the duck are both very smart and capable of acting perfectly in their best interests — they’re both expert strategists and tacticians. Say the duck starts in the center of the pond, and the dog somewhere right on the pond’s edge. To ensure that the duck could never escape safely, how many times faster would the dog have to be able to run relative to how fast the duck can swim? (Assume the dog is a good boy, yes he is, and thus very patient. The duck can’t simply wait him out. Also assume both animals can change direction without losing speed, and that the duck can take flight safely immediately after reaching shore if the dog isn’t there.)

Okay, so obviously there’s a mathy solution to this problem, but I don’t know it. I wrote a program to simulate the dog and duck and gave the duck some different strategies for getting to the shore. One thing that is obvious: the dog will never catch the duck. Either the duck will escape because the dog is too slow, or the dog will circle the pond forever as the duck swims back in forth in vain. Geez, Ollie, next to this, last week’s nightmare traffic problem is looking downright cheerful.

Designing the simulation was pretty fun, even if I don’t think it led me to the right solution. I basically came up with four different strategies that the duck might take to escape the pond. I call these Dumb, Cowardly, Clever, and Smart. Each strategy has a threshold at which it stops working, which is question being asked.

The Dumb strategy simply involves moving due west and hoping the dog is too slow. It will succeed as long as the dog is moving slower than PI * the duck’s speed.

The Cowardly strategy just tries to avoid the dog at all costs, changing the duck’s direction to be away from the dog. Interestingly, the Cowardly strategy does worse than the dumb strategy for the duck! The dog can “scare” the duck into remaining in the pond by getting close enough to scare it back towards the center of the pond, even when moving slower (0.9483) than PI * the duck’s speed. All ducks die, but a cowardly duck is doomed to swim in a pond forever or something.

The Clever strategy works like the Cowardly strategy except that the duck is constantly calculating how long it will take to reach the shore in a straight line, and whether the dog can make it there first. If the duck can make it to the edge first, it makes a beeline for the shore. The Clever strategy requires the dog to travel 1.2093 * PI * the duck’s speed to prevent takeoff.

Finally, my last strategy was the Smart strategy. This is what I submitted for an answer, but I will be shocked if it is right. The Smart strategy acts like the Clever strategy except the duck will scan a range of spots on the shore and if ANY of them can be reached before the dog, it will beeline. It represents only a small improvement over Clever, requiring the dog to travel 1.2552 * PI * the duck’s speed to trap the fowl.

A couple of things to note here. When the duck escapes, the dog is often so close that it looks like they are overlapping, but this is an abstract duck with no size, only a location, so unless the dog is exactly at the right spot, the duck escapes if it crosses the 1 unit border of the pond. Also, my strategy for the dog is to just go to the point on the edge of the pond that is closest to the duck and wait like a good boy, who’s a good boy, you’re a good boy. I’m not sure if that’s optimal but any movement from that spot seems like a mistake until the duck changes course or crosses the center point. This is one are where I think I am probably wrong.

The other area where I could be am probably wrong is in the way that I have the Smart duck scan the shore for a likely escape route. I just choose a range of shoreline and increment test locations in discrete chunks looking for a candidate. At the very least, a mathy solution would do that continuously, not discretely.

I wrote this app using MFC and Gdiplus so I’m not going to include the whole project here. I’ll just post the pond.h and pond.cpp files, which is where all the calculations and simulation are being done. Everything else is just presentation.

First off, I assume the pond has a radius of 1 unit with its center at 0,0, and that the duck moves at a rate of 1 unit/second. I store an x- and y-variable for the duck and start it at the center of the pond.

I could have used x- and y-variables for the dog, but since it always moves along the edge of the circular pond, I instead store its position as a single real value from -PI to PI. (It really should be 0 to TAU, because PI is wrong, no really, but whatevs.) I start the dog at 0, meaning the East edge of the pond.

The dog’s speed is expressed in terms of the duck’s speed (1/sec) * TAU/2 (okay, fine, PI.) So if the dog moves at PI * the duck’s speed, he can make a half orbit of the pond by the time the duck moves from the center to the edge. Using these basic units makes the code very simple since the time required for the duck to reach the edge is equal to 1 minus its distance from the center of the pond. The time required for the dog to reach any point of the pond’s edge is simply the angle to that point minus the dog’s position divided by the dog’s speed.

I also created a simple DPoint class with some convenient operators to make things a little easier.

```// Pond.h

#pragma once
#include <math.h>

#define PI 3.1415926535897932384626433832
#define TAU (PI * 2)

enum EDuck {
DumbDuck,
CowardlyDuck,
CleverDuck,
SmartDuck,
};

struct DPoint
{
double  x;
double  y;

void   Set(double _x, double _y) {x = _x; y = _y;}
double Magnitude() const
{
return sqrt(x * x + y * y);
}
};
static DPoint operator -(const DPoint &l, const DPoint &r)
{
DPoint pt;
pt.x = l.x - r.x;
pt.y = l.y - r.y;
return pt;
}
static DPoint operator +(const DPoint &l, const DPoint &r)
{
DPoint pt;
pt.x = l.x + r.x;
pt.y = l.y + r.y;
return pt;
}
static DPoint operator *(const DPoint &l, const DPoint &r)
{
DPoint pt;
pt.x = l.x * r.x;
pt.y = l.y * r.y;
return pt;
}
static DPoint operator *(const DPoint &l, double scalar)
{
DPoint pt;
pt.x = l.x * scalar;
pt.y = l.y * scalar;
return pt;
}

class Pond;

class DuckStrategy
{
public:
virtual DPoint  GetDestinationVector(Pond *pond) = 0;
};

class DumbDuckStrategy : public DuckStrategy
{
public:
virtual DPoint  GetDestinationVector(Pond *)
{
DPoint pt;
pt.x = -1.0;
pt.y = 0;
return pt;
}
};

class CowardlyDuckStrategy : public DuckStrategy
{
public:
virtual DPoint  GetDestinationVector(Pond *pond);
};

class CleverDuckStrategy : public CowardlyDuckStrategy
{
public:
virtual DPoint  GetDestinationVector(Pond *pond);
};

class SmartDuckStrategy : public CowardlyDuckStrategy
{
public:
virtual DPoint  GetDestinationVector(Pond *pond);
};

class Pond
{
double  m_dogPos;   // in radians, 0-tau
DPoint  m_duckPos;
DPoint  m_lastDuckVector;
EDuck   m_behavior;
bool    m_escaped;
bool    m_beeline;
double  m_elapsed;
double  m_beelineTime;

CowardlyDuckStrategy    m_cowardly;
DuckStrategy    *m_strategy;

public:
Pond() : m_dogPos(0)
{
SetStrategy(&m_cowardly);
Restart(PI);
m_escaped = true;   // so we have to hit restart to start
}

void    Restart(double dogSpeed)
{
m_beeline = false;
m_elapsed = 0;
m_beelineTime = 0;
m_dogPos = 0;
m_dogSpeed = dogSpeed;
m_escaped = false;
m_duckPos.Set(0, 0);
m_lastDuckVector.Set(0, 0);
}

void    SetStrategy(DuckStrategy *strat)
{
m_strategy = strat;
}

void    SetBeeline() {m_beeline = true; m_beelineTime = m_elapsed;}
bool    GetBeeline() const {return m_beeline;}
double  GetElapsed() const {return m_elapsed;}
DPoint  GetDuckPos() const {return m_duckPos;}
double  GetDogPolar() const {return m_dogPos;}
double  GetDogSpeed() const {return m_dogSpeed;}
DPoint  GetLastDuckVector() const {return m_lastDuckVector;}
DPoint  GetDogPos() const
{
DPoint pos;
pos.x = cos(m_dogPos);
pos.y = sin(m_dogPos);
return pos;
}
double  GetDogTimeToDest(double dest) const;
double  GetDuckTimeToDest(double dest) const;

void    Process(double sec)
{
if(m_escaped || m_strategy == NULL)
return;

// get duck vector
DPoint duckVec = m_strategy->GetDestinationVector(this);
m_lastDuckVector = duckVec;

// get dog destination
double mag = m_duckPos.Magnitude();
double dogDest = PI;
if(mag != 0.0)
{
double s = m_duckPos.y / mag;
double c = m_duckPos.x / mag;
dogDest = atan2(s, c);
}

// update duck position
duckVec = duckVec * sec;
m_duckPos = duckVec + m_duckPos;
mag = m_duckPos.Magnitude();
if(mag >= 1.0)
{
double overage = mag - 1.0;
m_duckPos = m_duckPos * (1 / mag);
sec -= overage;
m_escaped = true;
}
m_elapsed += sec;
if(mag < sec)
{
m_duckPos.Set(0, 0);
}

// update dog position
double dogDelta = dogDest - m_dogPos;
if(dogDelta < -TAU)
dogDelta += TAU;
else if(dogDelta > TAU)
dogDelta -= TAU;

double dogProgress = m_dogSpeed * sec;
if(dogProgress > abs(dogDelta))
{
m_dogPos = dogDest;
}
else
{
m_dogPos += dogProgress;
if(m_dogPos > PI)
m_dogPos -= TAU;
else if(m_dogPos < -PI)
m_dogPos += TAU;
}
}
};
```

pond.h

```// Pond.cpp

#include "stdafx.h"
#include "Pond.h"

DPoint CowardlyDuckStrategy::GetDestinationVector(Pond *pond)
{
DPoint dogPos = pond->GetDogPos();
DPoint duckPos = pond->GetDuckPos();

DPoint pt = duckPos - dogPos;

double mag = pt.Magnitude();
if(mag == 0)
{
pt.x = -1.0;
pt.y = 0;
}
else
{
pt.x /= mag;
pt.y /= mag;
}
return pt;
}

DPoint CleverDuckStrategy::GetDestinationVector(Pond *pond)
{
DPoint curVec = pond->GetDuckPos();
double mag = curVec.Magnitude();

if(mag == 0)
return CowardlyDuckStrategy::GetDestinationVector(pond);

double toGoal = 1.0 - mag;

double dogPos = pond->GetDogPolar();
double s = curVec.y / mag;
double c = curVec.x / mag;
double dogDest = atan2(s, c);

double dogTime = pond->GetDogTimeToDest(dogDest);

if(dogTime > toGoal)
{
pond->SetBeeline();
curVec = curVec * (1 / mag);
return curVec;
}

return CowardlyDuckStrategy::GetDestinationVector(pond);
}

DPoint SmartDuckStrategy::GetDestinationVector(Pond *pond)
{
DPoint curVec = pond->GetDuckPos();
double mag = curVec.Magnitude();

if(pond->GetBeeline())
{
return pond->GetLastDuckVector();
}

if(mag == 0)
return CowardlyDuckStrategy::GetDestinationVector(pond);

double s = curVec.y / mag;
double c = curVec.x / mag;
double dest = atan2(s, c);

double start = dest;
double end = dest + PI / 8.0;
double inc = (end - start) / 100;

for(double test = start; test <= end; test += inc)
{
double duckTime = pond->GetDuckTimeToDest(test);
double dogTime = pond->GetDogTimeToDest(test);
if(duckTime < dogTime)
{
DPoint pt;
pt.Set(cos(test), sin(test));

pt = pt - curVec;

pt = pt * (1 / pt.Magnitude());
double one = pt.Magnitude();
pond->SetBeeline();
return pt;
}
}

return CowardlyDuckStrategy::GetDestinationVector(pond);
}

double Pond::GetDuckTimeToDest(double dest) const
{
DPoint pt;
pt.x = cos(dest);
pt.y = sin(dest);

pt = pt - m_duckPos;
return pt.Magnitude();
}

double Pond::GetDogTimeToDest(double dogDest) const
{
double dogPos = GetDogPolar();

double dogDelta = dogDest - dogPos;
if(dogDelta < -PI)
dogDelta += TAU;
else if(dogDelta > PI)
dogDelta -= TAU;

double degrees = dogDelta * 360.0/PI;

dogDelta = abs(dogDelta);
ASSERT(dogDelta <= PI);

double dogTime = dogDelta / GetDogSpeed();
return dogTime;
}
```

pond.cpp

I chose to wrap most of the code for my pond in a single object that owns the duck and the dog. I could have chosen to make separate classes for the duck and dog and maybe have multiple ducks and multiple dogs each taking different approaches but then I thought, what if I didn’t instead.

As I said, I don’t think I got this one right, so I’ll be really interested to see what Ollie posts as the answer next Friday. Thanks for reading!

## Traffic Jam Harmony

So this week’s Riddler caused me a bit of trouble, even though I got the a right answer pretty quickly. It’s not that I was wrong; it’s that my answer was so convoluted when there is such a better way to say it. Here’s what I wrote:

Average number of packs A(n) is equal to: Num(n) / Denom(n) where:
Num(1) = 1, Denom(1) = 1.
Num(n) = Num(n – 1) * n + Denom(n – 1).
Denom(n) = Denom(n – 1) * n.  (this is n-factorial).

But here’s the much better way to say that:

Average number of packs for n cars = 1 + 1/2 + 1/3 + 1/4 + 1/5 … + 1/n.

This is, as pointed out by Alex Mahdavi, a Harmonic Series. Commenter Abe then offered  a pretty good explanation for why this is the right answer.

Start adding cars slowest to fastest.

For car k there are k places to place the new car, only one of which will add a new group: adding at the front. Otherwise you’re putting it behind a slower car and it will join that car’s group.

So:

Groups(n) = Groups(n – 1) * (n – 1)/n + (Groups(n -1) + 1)/n — the number of groups from before times the chance it isn’t at the fron, and one more than last time, times the chances it is at the front.

This reduces to:

Groups(n) = Groups(n-1) + 1/n, which is the harmonic series.

Thanks Abe! That does make it easier. If I know how many packs I have with N cars, and I add one more car, then there’s N+1 possible speeds for that car, and a 1/N+1 chance that that car will form a new pack. Makes sense.

Anyway, all that code I wrote before is really unnecessary now. You can find the answer very quickly:

```void AverageJams(int carCount)
{
double avg = 0;

for(int i = 1; i <= carCount; i++)
{
avg += 1.0 / (double)i;
}
double perN = avg / (double)carCount;
printf("Average jams for %d cars = %.6f or %.6fn\n", carCount, avg, perN);
}
```

Note: this would be a good candidate for memoization, but I didn’t bother.

And some output:

```Average jams for 1 cars = 1.000000 or 1.000000n
Average jams for 2 cars = 1.500000 or 0.750000n
Average jams for 3 cars = 1.833333 or 0.611111n
Average jams for 4 cars = 2.083333 or 0.520833n
Average jams for 5 cars = 2.283333 or 0.456667n
Average jams for 6 cars = 2.450000 or 0.408333n
Average jams for 7 cars = 2.592857 or 0.370408n
Average jams for 8 cars = 2.717857 or 0.339732n
Average jams for 9 cars = 2.828968 or 0.314330n
Average jams for 10 cars = 2.928968 or 0.292897n
Average jams for 11 cars = 3.019877 or 0.274534n
Average jams for 12 cars = 3.103211 or 0.258601n
Average jams for 13 cars = 3.180134 or 0.244626n
Average jams for 14 cars = 3.251562 or 0.232254n
Average jams for 15 cars = 3.318229 or 0.221215n
Average jams for 16 cars = 3.380729 or 0.211296n
Average jams for 17 cars = 3.439553 or 0.202327n
Average jams for 18 cars = 3.495108 or 0.194173n
Average jams for 19 cars = 3.547740 or 0.186723n
Average jams for 20 cars = 3.597740 or 0.179887n
Average jams for 21 cars = 3.645359 or 0.173589n
Average jams for 22 cars = 3.690813 or 0.167764n
Average jams for 23 cars = 3.734292 or 0.162361n
Average jams for 24 cars = 3.775958 or 0.157332n
Average jams for 100 cars = 5.187378 or 0.051874n
Average jams for 10000 cars = 9.787606 or 0.000979n
Average jams for 1000000 cars = 14.392727 or 0.000014n
```

So, I was technically correct with my long, terrible explanation, but the better answer was very simple. The original code I wrote to find the answer was helpful however. Without it, I probably wouldn’t have ever gotten to even the answer I got, much less the harmonic series.

## The 538 Riddler: Traffic Jam

Hello reader(s)! Well, I’m 4-for-4 on the Riddlers so far and I’m having a lot of fun writing the code and talking about my approach, and also reading about the approaches that others have taken. I hope to keep doing this.

This week’s Riddler concerns my daily commute to work on most mornings, seemingly. Here it is:

There is a very long, straight highway with some number of cars (N) placed somewhere along it, randomly. The highway is only one lane, so the cars can’t pass each other. Each car is going in the same direction, and each driver has a distinct positive speed at which she prefers to travel. Each preferred speed is chosen at random. Each driver travels at her preferred speed unless she gets stuck behind a slower car, in which case she remains stuck behind the slower car. On average, how many groups of cars will eventually form? (A group is one or more cars travelling at the same speed.)

For example, if the car in the very front happens to be slowest, there will be exactly one group — everybody will eventually pile up behind the slowpoke. If the cars happen to end up in order, fastest to slowest, there will be Ngroups — no car ever gets stuck behind a slower car.

So my first approach to this was just to get some ballpark answers so I would know what kind of real answers to expect. I think a real mathematician would have taken a very different route to get to this answer, but I’m a programmer, so the first thing I did was write a Monte Carlo to just simulate what was happening on the road, man.

```int TrafficJam(int carCount)
{
int currentSpeed = 0;
int packs = 0;
int packSize = 0;

for(int i = 0; i < carCount; i++)
{
int speed = 1 + rand();
if((speed < currentSpeed) || (currentSpeed == 0))
{
if(packSize > 0)
{
//printf("%d cars in pack.\n", packSize);
}
packs++;
currentSpeed = speed;
//printf("New pack forming at %d, ", speed);
packSize = 1;
}
else
{
packSize++;
}
}
//printf("%d cars in pack.\n", packSize);
//printf("With %d cars, %d packs formed.\n\n", carCount, packs);
return packs;
}

void TrafficJamMC(int carCount)
{
const int c_nTrials = 10000;
int totalPacks = 0;
for(int i = 0; i < c_nTrials; i++)
{
totalPacks += TrafficJam(carCount);
}

double avgPacks = (double)totalPacks / (double)c_nTrials;
printf("After %d trials, average pack count with %d cars is %.3f or %.3fn\n"
, c_nTrials, carCount, avgPacks, avgPacks / (double)carCount);
}
```

Here’s the output for up to 25 cars:

```After 10000 trials, average pack count with 1 cars is 1.000 or 1.000n
After 10000 trials, average pack count with 2 cars is 1.504 or 0.752n
After 10000 trials, average pack count with 3 cars is 1.826 or 0.609n
After 10000 trials, average pack count with 4 cars is 2.087 or 0.522n
After 10000 trials, average pack count with 5 cars is 2.302 or 0.460n
After 10000 trials, average pack count with 6 cars is 2.462 or 0.410n
After 10000 trials, average pack count with 7 cars is 2.572 or 0.367n
After 10000 trials, average pack count with 8 cars is 2.725 or 0.341n
After 10000 trials, average pack count with 9 cars is 2.824 or 0.314n
After 10000 trials, average pack count with 10 cars is 2.922 or 0.292n
After 10000 trials, average pack count with 11 cars is 3.016 or 0.274n
After 10000 trials, average pack count with 12 cars is 3.108 or 0.259n
After 10000 trials, average pack count with 13 cars is 3.170 or 0.244n
After 10000 trials, average pack count with 14 cars is 3.245 or 0.232n
After 10000 trials, average pack count with 15 cars is 3.314 or 0.221n
After 10000 trials, average pack count with 16 cars is 3.363 or 0.210n
After 10000 trials, average pack count with 17 cars is 3.436 or 0.202n
After 10000 trials, average pack count with 18 cars is 3.489 or 0.194n
After 10000 trials, average pack count with 19 cars is 3.522 or 0.185n
After 10000 trials, average pack count with 20 cars is 3.585 or 0.179n
After 10000 trials, average pack count with 21 cars is 3.643 or 0.173n
After 10000 trials, average pack count with 22 cars is 3.693 or 0.168n
After 10000 trials, average pack count with 23 cars is 3.732 or 0.162n
After 10000 trials, average pack count with 24 cars is 3.759 or 0.157n
After 10000 trials, average pack count with 25 cars is 3.816 or 0.153n
```

So this code works ok and gives us reasonable answers, but it has some obvious flaws. As with most Monte Carlos it will never give exact answers.

Also, simulating speeds might be fun, but it’s not really getting to the heart of the problem. What we really want to do is order all the cars and see how many jams would result. For any n number of cars we know that there are n-factorial variations of ordering.

For 2 cars, there are only 2 orders, for 3 cars there are 6 orders, and for 4 cars there are 24 different orders. They produce packs like this:

2 cars:
0 1 – 1 pack
1 0 – 2 packs

3/2 packs

3 cars:
0 1 2 – 1 pack
0 2 1 – 1 pack
1 0 2 – 2 packs
1 2 0 – 2 packs
2 0 1 – 2 packs
2 1 0 – 3 packs
—-
11/6 packs

4 cars:
0 1 2 3 – 1 pack
0 1 3 2 – 1 pack
0 2 1 3 – 1 pack
0 2 3 1 – 1 pack
0 3 1 2 – 1 pack
0 3 2 1 – 1 pack
1 0 2 3 – 2 packs
1 0 3 2 – 2 packs
1 2 0 3 – 2 packs
1 2 3 0 – 2 packs
1 3 0 2 – 2 packs
1 3 2 0 – 2 packs
2 0 1 3 – 2 packs
2 0 3 1 – 2 packs
2 1 0 3 – 3 packs
2 1 3 0 – 3 packs
2 3 0 1 – 2 packs
2 3 1 0 – 3 packs
3 0 1 2 – 2 packs
3 0 2 1 – 2 packs
3 1 0 2 – 3 packs
3 1 2 0 – 3 packs
3 2 0 1 – 3 packs
3 2 1 0 – 4 packs

50/24 packs

Let’s write some code to run through them all and count the jams. Yay, we get to use next_permutation, one of my favorite std algorithms. Here’s a function that will run through all permutations of n-factorial cars and tell you precisely how many jams you have:

```#define MAX_CARS_FOR_PERMS 10
void TrafficJamPerm(int carCount)
{
if(carCount > MAX_CARS_FOR_PERMS)
return;

int order[MAX_CARS_FOR_PERMS] = {};
for(int i = 0; i < carCount; i++)
order[i] = i;

int totalPacks = 0;
int perms = 0;
do
{
int packs = 0;
int currentSpeed = 0;
for(int i = 0; i < carCount; i++)
{
if(i == 0 || order[i] < currentSpeed)
{
packs++;
currentSpeed = order[i];
}
}
totalPacks += packs;
perms++;
}
while(next_permutation(order, order + carCount));

printf("Using permutations, packs for %d cars = %d / %d = %.3f or %.3fn\n"
, carCount, totalPacks, perms, (double)totalPacks / (double)perms, ((double)totalPacks / (double)perms) / (double)carCount);
}
```

And here’s the output:

```Using permutations, packs for 1 cars = 1 / 1 = 1.000 or 1.000n
Using permutations, packs for 2 cars = 3 / 2 = 1.500 or 0.750n
Using permutations, packs for 3 cars = 11 / 6 = 1.833 or 0.611n
Using permutations, packs for 4 cars = 50 / 24 = 2.083 or 0.521n
Using permutations, packs for 5 cars = 274 / 120 = 2.283 or 0.457n
Using permutations, packs for 6 cars = 1764 / 720 = 2.450 or 0.408n
Using permutations, packs for 7 cars = 13068 / 5040 = 2.593 or 0.370n
Using permutations, packs for 8 cars = 109584 / 40320 = 2.718 or 0.340n
Using permutations, packs for 9 cars = 1026576 / 362880 = 2.829 or 0.314n
Using permutations, packs for 10 cars = 10628640 / 3628800 = 2.929 or 0.293n
```

Ok, cool, but do you see the problem? Yeah, this code only handles up to 10 cars and that’s because factorials get really big, really fast. The first 9 calls to this function return almost immediately but the 10th one starts to chug a little. A 32-bit integer could handle up to 12-factorial and 64 bits could get us to 20-factorial, but it doesn’t matter because we don’t have enough time to run through all those permutations. So this function works and is useful, but… it’s not great.

So there IS a math solution to this problem that doesn’t rely on Monte Carlos or permutations. I found it but I don’t know how to prove it. I’m not really a mathematician, I’m a programmer, so here’s my best explanation:

Average number of packs A(n) is equal to: Num(n) / Denom(n) where:
Num(1) = 1, Denom(1) = 1.
Num(n) = Num(n – 1) * n + Denom(n – 1).
Denom(n) = Denom(n – 1) * n.  (this is n-factorial).

Okay, so I found that because I happened to see the pattern when I was looking at those numerators and denominators. I don’t really know why this is true. I can kind of work it out if I look at the lists of permutations between 2 cars and 3 cars and 4 cars, but I can’t really explain it well.

Anyway, here’s a function to calculate the average number of jams using math. I used a class called InfInt written by Sercan Tutar for this. This allows me to calculate factorials for very large numbers.

```void TrafficJamMath(int carCount)
{
InfInt num = 1;
InfInt denom = 1;

for(int i = 2; i <= carCount; i++)
{
num *= i;
num += denom;
denom *= i;
}

num *= 1000000;
int l = (num / denom).toInt();

double avg = (double)l / 1000000.0;
double perN = avg / (double)carCount;

printf("Using math, number of Traffic Jams for %d cars = %.3f or %.6fn\n", carCount
, avg, perN);
}
```

I ran this function to calculate the average traffic jams for 100, 1000, and 10000 cars.

```Using math, number of Traffic Jams for 100 cars = 5.187 or 0.051874n
Using math, number of Traffic Jams for 1000 cars = 7.485 or 0.007485n
Using math, number of Traffic Jams for 10000 cars = 9.788 or 0.000979n
```

Let me just say that 10000-factorial is a very large number. It’s kind of interesting how we quickly get to 3 traffic jams with 10 cars, but with 10,000 cars we are still at only about 10 very large, frustrating jams. This explains Southern California freeways pretty well, actually.

Finally, crossing my fingers, but I think I found a pretty good solution for displaying source code in this blog. Special thanks to Alexander Kojevnikov for hilite.me. Here’s the complete source code for this post. You will need the InfInt class linked above.

UPDATE: Alex Mahdavi points out that my really complicated equation can just be written as 1 + 1/2 + 1/3 + 1/4 + … + 1/n, also known as a Harmonic Series. Thanks Alex!

## All the Riddler Source Code

So I’ve gotten sick of fighting with WordPress in each post trying to nicely format my C++ code when I post, so I decided to start posting it to textuploader.com and providing links to the formatted and raw files. This will allow you to easily follow the link and copy-pasta the code if you want.

Here are links to previous Riddler posts and source code:

Secret Election: Post. Source.

River Crossing: Post. Source.

Neurotic Shooter: Post. Source.

From now on, I’ll just take a picture of the relevant code for a post and post that, with a link to the raw code. Or something. It’s a work in progress.

## Tweaking the Secret Election code

While waiting to see the results of this week’s Riddler, I started thinking about my solution to the problem. In my last post, I outlined two ways of solving the problem: one by hand on paper, and one using C++ and recursion with memoization.

If you look at the way I solved the puzzle by hand, you might recognize it as a form of dynamic programming (my coworker thinks I’m obsessed with finding ways to use dynamic programming; mea culpa). So I decided to write a version of the SimulateElection function using it instead of recursion. Here it is:

```// initialize winner array with candidates
char winner[CANDIDATE_COUNT+1] = &quot;&quot;;
for(int i = 0; i &lt; CANDIDATE_COUNT; i++)
winner[i] = 'A' + i;

for(int cand = CANDIDATE_COUNT - 1; cand &gt; 0; cand--)
{
// now simulate election for cand vs. winners in each 'row'
for(int i = 0; i &lt; cand; i++)
winner[i] = RawVote(winner[i], winner[cand]);
}

```

Pretty short! The key here is to start our array of winners with the candidates in the order in which their secret elections happen, so “ABCDE”. Then in each round, starting from the last election, we hold a raw election between the current value we are evaluating, and the values in each of the preceding indices. So:

“ABCDE” – Starting values. Now we pit E vs. first four values, A, B, C, and D.

“ABEDE” – Values after round 1, because E defeated C, but lost to A, B, and D. Now pit D vs. values in first 3 spots, A, B, and E.

“DDDDE” – Values after round 2, because D defeated A, B, and E. Now pit 3rd value, D, vs. first two spots, D and D.

“DDDDE” – Values after round 3, Now pit 2nd value, D, vs. 1st value A.

“DDDDE” – We are done. Our first value, D, is the winner.

And this is how it goes when we allow candidate A to select candidate E as a proxy:

“ABCDE” – Starting values.

“AEEEE” – After round 1. Candidate E defeated everyone but A. Rounds 2 and 3 will pit E vs. E. The final round will pit A vs. E again, and A will win.

I think it’s easier to look at the recursive code and understand what is going on conceptually. Sometimes it’s hard to see how dynamic programming is “working” when you examine the code, but that might be my own bias.

I did some performance testing on the dynamic programming vs. the recursion. I also made a version of SimulateElection that used recursion but NOT memoization, meaning that all recursive branches were visited. The complexity for the recursive algorithm is n-factorial with n storage, but using memoization reduces it to n-squared with n-squared storage. Using dynamic programming is still n-squared time, but only n storage. Although dynamic programming and memoization are both n-squared time, there’s a little less overhead in the dynamic programming. Here are some time trial results for all three methods:

```Elapsed ticks 100000 trials using recursion and memoization = 355 ms.
Elapsed ticks 100000 trials using recursion = 476 ms.
Elapsed ticks 100000 trials using dynamic programming = 283 ms.```

All of them are pretty fast because the data is so small. I had to run 100,000 trials just to get decent numbers to compare.

Full source code here.