## Monday, 26 October 2015

### Carnes's life span distribution

In a paper by Carnes et. al, a simple parametric, but realistic life span distribution is given, and here I show how you can sample from it. In addition, assuming a demographic equilibrium, the age of individuals will have a particular distribution. I will show what this distribution is, and again how to sample from it. Sampling ages instead of lifespans might be useful for initiating simulations. I model epidemics, and I want my disease free (a.k.a. virgin) population to have the 'right' age distribution.
The life span distribution has hazard $\lambda(a) = e^{u_1 a + v_1} + e^{u_2 a + v_2}\,.$ Typical parameters are given by $$u_1 = 0.1$$, $$v_1 = -10.5$$, $$u_2 = -0.4$$, and $$v_2 = -8$$, so that infants have a slightly increased hazard of dying, and after the age of 60, the hazard rapidly starts to grow, until it becomes exceedingly large around $$a = 100$$.
The survival function $$S(a) = {\rm Pr}(A > a)$$, where $$A$$ is the random variable denoting 'age at death' is given by $$S(a) = e^{-\Lambda(a)}$$, with $$\Lambda(a) := \int_0^a \lambda(a')da'$$ denoting the cumulative hazard. The cumulative hazard $$\Lambda$$ is easily calculated: $\Lambda(a) = \frac{e^{v_1}}{u_1}(e^{u_1 a}-1) + \frac{e^{v_2}}{u_2}(e^{u_2 a}-1)\,,$ but its inverse, or the inverse of the survival function is more difficult to calculate.
We need the inverse of $$S$$, because sampling random deviates typically involves uniformly sampling a number $$u\in [0,1)$$. The number $$S^{-1}(u)$$ is then the desired deviate.

In a future post, I will show how to use the GNU Scientific Library (GSL) to sample deviates from $$A$$.

Suppose that the birth rate $$b$$ in our population is constant. A PDE describing the population is given by $\frac{\partial N(a,t)}{\partial t} + \frac{\partial N(a,t)}{\partial a} = -\lambda(a)N(a,t)\,,$ where $$N(a,t)$$ is the number (density) of individuals of age $$a$$, alive at time $$t$$. The boundary condition (describing birth) is given by $N(0,t) = b$ When we assume that the population is in a demographic equilibrium, the time derivative with respect to $$t$$ vanishes, and we get an ODE for the age distribution: $\frac{\partial N(a)}{\partial a} = -\lambda(a) N(a)\,,\quad N(0) = b\,,$ where we omitted the variable $$t$$. This equation can be solved: $\frac{1}{N}\frac{\partial N}{\partial a} = \frac{\partial \log(N)}{\partial a} = -\lambda(a) \implies N(a) = c \cdot e^{-\Lambda(a)}$ for some constant $$c$$. Since $$b = N(0) = c \cdot e^{-\Lambda(0)} = c$$, we have $N(a) = b\cdot e^{-\Lambda(a)}\,.$ Hence, we now know the PDF of the age distribution (up to a constant). Unfortunately, we can't get a closed form formula for the CDF, let alone invert it. Therefore, when we want to sample, we need another trick. I've used a method from Numerical Recipies in C. It involves finding a dominating function of the PDF, with an easy, and easily invertible, primitive.
Let's just assume that $$b = 1$$, so that the objective PDF is $$N(a) = e^{-\Lambda(a)}$$. Please notice that $$N$$ is not a proper PDF, since, in general, the expectation won't be $$1$$. We need to find a simple, dominating function for $$N$$. A stepwise defined function might be a good choice, since the hazard is practically zero when the age is below $$50$$, and then increases rapidly. We first find a comparison cumulative hazard $$\tilde{\Lambda}$$ that is dominated by the actual cumulative hazard $$\Lambda$$. Many choices are possible, but one can take for instance $\tilde{\Lambda}(a) = \left\{ \begin{array}{ll} 0 & \mbox{if } a < a_0 \\ \lambda(a^{\ast})\cdot (a-a_0) & \mbox{otherwise} \end{array}\right.$ where $a_0 = a^{\ast} - \frac{\Lambda(a^{\ast})}{\lambda(a^{\ast})}\,.$ The constant $$a_0$$ is chosen such that the cumulative hazards $$\Lambda$$ and $$\tilde{\Lambda}$$ are tangent at $$a^{\ast}$$.
Since $$\Lambda$$ dominates $$\tilde{\Lambda}$$, the survival function $$\tilde{S}$$ defined by $$\tilde{S}(a) = e^{-\tilde{\Lambda}(a)}$$ dominates $$S$$. It is easy to find the inverse of $$a\mapsto\int_0^a \tilde{S}(a')da'$$, and hence we can easily sample random deviates from the age distribution corresponding to $$\tilde{\Lambda}$$. In order to sample from the desired age distribution, one can use a rejection method: (i) sample an age $$a$$ from the easy age distribution. (ii) compute the ratio $$r = S(a)/\tilde{S}(a) \leq 1$$. (iii) sample a deviate $$u \sim \mbox{Uniform}(0,1)$$. (iv) accept the age $$a$$ when $$u \leq r$$, and reject $$a$$ otherwise. Repeat these steps until an age $$a$$ was accepted.

The only thing we still need to do, is to find a good value for $$a^{\ast}$$. To make the sampler as efficient as possible, we want to minimize the probability that we have to reject the initially sampled age $$a$$ from $$S$$. This boils down to minimizing $\int_0^{\infty} \tilde{S}(a)da = a_0 + \frac{1}{\lambda(a^{\ast})} = a^{\ast} + \frac{1 - \Lambda(a^{\ast})}{\lambda(a^{\ast})}\,.$ The derivative of $$a^{\ast} \mapsto \int_0^{\infty} \tilde{S}(a)da$$ equals $\frac{(\Lambda(a^{\ast}) - 1)\lambda'(a^{\ast})}{\lambda(a^{\ast})^2}$ and thus, we find an extreme value for $$\int_0^{\infty} \tilde{S}(a)da$$ when $$\Lambda(a^{\ast}) = 1$$ or when $$\lambda'(a^{\ast}) = \frac{d\lambda}{da^{\ast}} = 0$$. The second condition can only correspond to a very small $$a^{\ast}$$, and therefore will not minimize $$\int_0^{\infty} \tilde{S}(a)da$$. Hence, we have to solve $$\Lambda(a^{\ast}) = 1$$. When we ignore the second term of $$\Lambda$$, we find that: $a^{\ast} = \log(1 + u_1 \exp(-v_1))/u_1$