Elon Musk’s writing about a Tesla battery fire reminded me of some of the math related to trying to estimate the rate of a rare event from a single occurrence of the event (plus many non-event occurrences). In this article we work through some of the ideas.

Elon Musk wrote that the key points about the recent battery fire were: a significant impact from a large piece of debris (which would clearly have also been dangerous to a gasoline-based vehicle) and a “1 fire in over 100 million miles for Tesla” operating record. There are tons of important questions as to what is a proper apples-to-apples comparison of vehicle safety, but what interested me is the minor issue: how biased is evaluating Tesla right after the first occurrence of a rare bad event?

Roughly: evaluating Tesla right after the first report of a rare bad event (called a “failure”) is biased against Tesla. It roughly doubles the perceived rate at which the event occurs. The math (based on the Markov property) is quick. If the bad event has probability p (p small) then the expected number of such events in n trials is n*p. However, the expected number of events in n trials where we picked n such that the n-th trial has the event (n picked after the events happen) is (n-1)*p + 1 (the normal expectation for the first n-1 trials and then the forced 1 for the last trial). If n is such that n*p is near 1 (which is plausible for an observation near the first occurrence) then scoring right after the first failure roughly doubles the perceived failure rate.
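As a quick numeric check of the argument above, here is a minimal sketch. The values p = 1e-8 per mile and n = 100 million one-mile trials are illustrative assumptions picked so that n*p is 1; they are not measured rates, and the function names are made up for this example.

```python
def expected_failures(n, p):
    """Expected failures in n independent trials when n is fixed in advance."""
    return n * p


def expected_failures_stopped_on_failure(n, p):
    """Expected failures when n is chosen so that the n-th trial is a failure."""
    return (n - 1) * p + 1


p = 1e-8          # assumed per-mile failure probability (illustrative only)
n = 100_000_000   # assumed number of one-mile trials (illustrative only)

unconditional = expected_failures(n, p)
conditioned = expected_failures_stopped_on_failure(n, p)
ratio = conditioned / unconditional

print(unconditional, conditioned, ratio)
```

With these assumed numbers the ratio comes out very close to 2, which is the "roughly double" effect described in the text.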

Now the correct way to work with duration in this sort of problem is survival analysis, which treats duration as a continuous variable and doesn’t treat “100 million miles” as 100 million discrete events (the discrete treatment introduces small unnecessary problems in changing scale to “200 million half-miles” and so on). But let’s stay with discrete events for fun. An important issue is: certain relations that are known for true values are only approximations for estimates, so it really matters what you estimate directly and what you infer indirectly.

For example: suppose we try to directly estimate the expected duration to the first event instead of the event rate. Obviously if you know one of these you know the other. But the relationship between estimates of one and estimates of the other is a bit looser. So you really want to set up your experiments to directly estimate the one you care about.

The duration to first failure can be estimated by watching for the first failure, as follows. The probability of seeing the first failure on the k-th observation is exactly (1-p)^(k-1) * p (you see k-1 successes followed by one failure). And if we see the first failure at the k-th step our natural estimate of the duration to first failure is k. The expected value of this sort of estimator summed over all possibilities is:

$$\sum_{k=1}^{\infty} k \, (1-p)^{k-1} \, p = \frac{1}{p}$$

The 1/p expected value of the estimate is exactly what you would hope for.
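The 1/p result can be checked by simulation. The sketch below is only an illustration: p = 0.01, the number of draws, and the seed are arbitrary assumptions, and the helper names are hypothetical. It draws the index of the first failure by inverting the geometric distribution and averages the draws.

```python
import math
import random


def sample_first_failure(p, rng):
    """Draw the trial index k (1, 2, ...) at which the first failure occurs."""
    u = 1.0 - rng.random()  # uniform on (0, 1], which avoids log(0)
    return int(math.log(u) / math.log(1.0 - p)) + 1


def mean_first_failure(p, draws, seed=0):
    """Average the 'estimate k' rule over many simulated experiments."""
    rng = random.Random(seed)
    return sum(sample_first_failure(p, rng) for _ in range(draws)) / draws


p = 0.01  # assumed failure probability, chosen only to keep the run fast
simulated = mean_first_failure(p, draws=200_000)

print(simulated, 1 / p)
```

The simulated average should land near 1/p = 100, up to sampling noise.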

However, suppose we instead use a similar “reasonable sounding” procedure to try to estimate the rate p instead of the duration to failure. We say that if the first failure is seen at the k-th step then the reasonable estimate for p is 1/k. The expected value of this estimate ends up being:

$$\sum_{k=1}^{\infty} \frac{1}{k} \, (1-p)^{k-1} \, p = \frac{-p \ln(p)}{1-p}$$

That is: our estimation procedure’s expected value of estimate is -p*ln(p)/(1-p) instead of the correct (or unbiased) value of p. This is again an over-estimate: using the observed rate as the estimator is net-upward biased (as also shown in the expected number of failures argument).
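To see the size of this bias, the following sketch sums the series numerically and compares it with the closed form -p*ln(p)/(1-p). The test value p = 0.01, the truncation length, and the function names are assumptions made for the example.

```python
import math


def expected_rate_estimate(p, terms=200_000):
    """Truncated sum of (1/k) * (1-p)^(k-1) * p for k = 1..terms."""
    total = 0.0
    weight = p  # probability that the first failure is at k = 1
    for k in range(1, terms + 1):
        total += weight / k
        weight *= 1.0 - p  # probability that the first failure is at k + 1
    return total


def closed_form(p):
    """Expected value of the 1/k estimate: -p*ln(p)/(1-p)."""
    return -p * math.log(p) / (1.0 - p)


def bias_factor(p):
    """Multiplicative over-estimate relative to the true rate p."""
    return -math.log(p) / (1.0 - p)


p = 0.01  # assumed illustrative failure probability
series = expected_rate_estimate(p)
exact = closed_form(p)

print(series, exact)
for q in (1e-2, 1e-4, 1e-8):
    print(q, bias_factor(q))
```

The bias factor is always above 1 for 0 < p < 1 and grows slowly (logarithmically) as p shrinks.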

Also notice that the multiplicative bias term -ln(p)/(1-p) changes as we change the scale of p. If instead of measuring 100 million 1-mile events we measured 200 million half-mile events our number of events bias would go up, but slowly enough that our total miles bias would go down. This is why in survival analysis durations are treated as continuous quantities, so changes in scale don’t change estimates. The simplest survival analysis would assume road debris is a constant hazard (one that doesn’t systematically go up or down with the age of the car), and therefore the survival function of the car would be that of the exponential distribution (note: “survival” means avoiding the failure event which stops the observations, not living or dying).
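A small sketch of the continuous view, under the constant-hazard assumption named above: the survival function is exp(-hazard * duration), and the hazard is estimated as failures per unit of exposure. The function names and the 50 million mile horizon are assumptions for illustration. The point is only that switching units from miles to half-miles leaves the answer unchanged; it does not remove the bias from choosing to measure right after a failure.

```python
import math


def survival(duration, hazard):
    """P(no failure by `duration`) under a constant hazard rate."""
    return math.exp(-hazard * duration)


def hazard_estimate(failures, exposure):
    """Failures per unit of exposure, a common constant-hazard estimate."""
    return failures / exposure


miles = 100_000_000  # exposure figure quoted in the text

# Same data expressed in two different units.
rate_per_mile = hazard_estimate(1, miles)
rate_per_half_mile = hazard_estimate(1, 2 * miles)

# Probability of no failure over 50 million miles, computed in each unit.
s_miles = survival(50_000_000, rate_per_mile)
s_half_miles = survival(100_000_000, rate_per_half_mile)

print(s_miles, s_half_miles)
```

Both calls describe the same physical distance, so they give the same survival probability.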

The lesson is: you introduce biases by deciding when you calculate (right after a failure) and choosing what to estimate. But with some care you can get all of this right.

A very good article on this topic: “Estimating Rates of Rare Events at Multiple Resolutions,” Deepak Agarwal, Andrei Z. Broder, Deepayan Chakrabarti, Dejan Diklic, Vanja Josifovski, and Mayssam Sayyadian, KDD, 2007.