Chapter 3, Session 3: The Fay-Herriot Model for Poverty Estimation
The Fay-Herriot model, proposed by Fay and Herriot (1979), is an area-level statistical model and the most widely used one in small area estimation. Area-level models dominate the methodology because individual-level information is often unavailable; instead, data are typically held at the area level together with associated auxiliary information. This linear mixed model was the first to include random effects at the area level, meaning that the information fed into the model corresponds to aggregated units such as departments, regions, provinces, or municipalities, and the resulting estimates refer to these aggregations or subpopulations.
Now, the Fay-Herriot model relates the area indicators \(\theta_d\), \(d = 1, \ldots, D\), to a vector of \(p\) covariates \(\boldsymbol{x}_d\) through the linking model \(\theta_d = \boldsymbol{x}^{T}_{d}\boldsymbol{\beta} + u_d\), where \(u_d\) is the area-specific random effect, distributed as \(u_{d} \stackrel{ind}{\sim}\left(0,\sigma_{u}^{2}\right)\).
However, the true values of the indicators \(\theta_d\) are not observable, so the direct estimator \(\hat{\theta}^{DIR}_d\) is used in their place, which introduces sampling error. This estimator is design-unbiased, i.e., \[ \hat{\theta}_d^{DIR} = \theta_d + e_d \]
The model is then fitted using the sampling error term \(e_d\), where \(e_{d} \stackrel{ind}{\sim} \left(0,\sigma^2_{e_d}\right)\), and the variances \(\sigma^2_{e_d}\) are estimated using survey microdata. The FH model is rewritten as
\[ \hat{\theta}^{DIR}_{d} = \boldsymbol{x}^{T}_{d}\boldsymbol{\beta} + u_d + e_d. \]
The best linear unbiased predictor (BLUP) under the FH model is given by
\[ \tilde{\theta}_{d}^{FH} = \boldsymbol{x}^{T}_{d}\tilde{\boldsymbol{\beta}}+\tilde{u}_{d}, \]
where \(\tilde{u}_d = \gamma_d\left(\hat{\theta}^{DIR}_{d} - \boldsymbol{x}^{T}_{d}\tilde{\boldsymbol{\beta}} \right)\) and \(\gamma_d=\frac{\sigma^2_u}{\sigma^2_u + \sigma^2_{e_d}}\).
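The BLUP above can be computed directly once the variance components are available. The following sketch simulates a small toy dataset (all numbers are hypothetical), estimates \(\boldsymbol{\beta}\) by GLS with weights \(1/(\sigma^2_u + \sigma^2_{e_d})\), and applies the shrinkage factor \(\gamma_d\); it is an illustration of the formulas, not a production estimator (in practice \(\sigma^2_u\) would itself be estimated, e.g. by REML).

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy setup: D areas, p = 2 covariates (hypothetical data, for illustration)
D = 20
X = np.column_stack([np.ones(D), rng.uniform(0, 1, D)])
beta_true = np.array([0.3, 0.2])
sigma2_u = 0.01                           # area-level variance component
sigma2_e = rng.uniform(0.005, 0.05, D)    # known sampling variances

u = rng.normal(0, np.sqrt(sigma2_u), D)
theta_dir = X @ beta_true + u + rng.normal(0, np.sqrt(sigma2_e))

# GLS estimate of beta with weights 1 / (sigma2_u + sigma2_e_d)
w = 1.0 / (sigma2_u + sigma2_e)
beta_tilde = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * theta_dir))

# Shrinkage factors and the Fay-Herriot BLUP
gamma = sigma2_u / (sigma2_u + sigma2_e)
theta_fh = gamma * theta_dir + (1 - gamma) * (X @ beta_tilde)
```

Note that `theta_fh` is algebraically identical to \(\boldsymbol{x}^{T}_{d}\tilde{\boldsymbol{\beta}} + \gamma_d(\hat{\theta}^{DIR}_{d} - \boldsymbol{x}^{T}_{d}\tilde{\boldsymbol{\beta}})\), the form given above.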
Area Model for Poverty Estimation
Let \(P_d\) be the probability of finding a person in a state of poverty in the \(d\)-th domain of the population. Then, the direct estimator of \(P_d\) can be written as:
\[ \hat{P}^{DIR}_{d} = P_d + e_d \]
Now, \(P_d\) can be modeled as follows:
\[ P_d = \boldsymbol{x}^{T}_{d}\boldsymbol{\beta} + u_d \]
Rewriting \(\hat{P}^{DIR}_{d}\) in terms of the two previous equations, we have:
\[ \hat{P}^{DIR}_{d} = \boldsymbol{x}^{T}_{d}\boldsymbol{\beta} + u_d + e_d \]
It is then possible to assume that \(\hat{P}^{DIR}_d \sim N(\boldsymbol{x}^{T}_{d}\boldsymbol \beta, \sigma_u^2 +\sigma_{e_d}^2)\), \(\hat{P}^{DIR}_d \mid u_d \sim N(\boldsymbol{x}^{T}_{d}\boldsymbol \beta + u_d,\sigma_{e_d}^2)\), and \(u_d \sim N(0, \sigma^2_u)\).
Next, prior distributions are assumed for \(\boldsymbol{\beta}\) and \(\sigma^2_u\):
\[ \begin{align*} \beta_p & \sim N(0, 10000)\\ \sigma^2_u &\sim IG(0.0001, 0.0001) \end{align*} \]
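Under these priors the model can be fitted by MCMC. As one possible illustration (the source does not prescribe a particular sampler), here is a minimal Gibbs sampler sketch that uses the conjugate full conditionals implied by the normal likelihood, the \(N(0, 10000)\) prior on each \(\beta_p\), and the \(IG(0.0001, 0.0001)\) prior on \(\sigma^2_u\); the function name and tuning constants are my own choices.

```python
import numpy as np

def gibbs_fh(p_dir, X, sigma2_e, n_iter=2000, burn=500, seed=0):
    """Illustrative Gibbs sampler for the Bayesian Fay-Herriot model.

    Priors: beta_p ~ N(0, 10000), sigma2_u ~ IG(0.0001, 0.0001);
    the sampling variances sigma2_e are treated as known.
    """
    rng = np.random.default_rng(seed)
    D, p = X.shape
    u, sigma2_u = np.zeros(D), 0.01       # arbitrary starting values
    a0 = b0 = 1e-4                        # IG hyperparameters
    V0 = 1e4                              # prior variance of each beta_p
    draws = []
    for it in range(n_iter):
        # beta | u: Gaussian with precision X' De^{-1} X + I / V0
        prec = X.T @ (X / sigma2_e[:, None]) + np.eye(p) / V0
        cov = np.linalg.inv(prec)
        mean = cov @ (X.T @ ((p_dir - u) / sigma2_e))
        beta = rng.multivariate_normal(mean, cov)
        # u_d | beta, sigma2_u: shrinkage form with variance gamma_d * sigma2_e_d
        gamma = sigma2_u / (sigma2_u + sigma2_e)
        u = rng.normal(gamma * (p_dir - X @ beta), np.sqrt(gamma * sigma2_e))
        # sigma2_u | u: inverse gamma (drawn as 1 / Gamma)
        sigma2_u = 1.0 / rng.gamma(a0 + D / 2, 1.0 / (b0 + 0.5 * u @ u))
        if it >= burn:
            draws.append(X @ beta + u)    # posterior draw of P_d
    return np.mean(draws, axis=0)         # posterior mean estimate of P_d
```

The returned posterior mean approximates the Bayesian estimator \(\tilde{P}_d = E\left(P_d\mid\hat{P}_d^{DIR}\right)\) discussed next.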
Therefore, the Bayesian estimator for \(P_d\) is given as \(\tilde{P}_d = E\left(P_d\mid\hat{P}_d^{DIR}\right)\).
Optimal Predictor of \(P_d\)
The optimal predictor of \(P_d\) is given by:
\[E(P_d | \hat{P}^{DIR}_d) = \gamma_d\hat{P}^{DIR}_d + (1-\gamma_d)\boldsymbol{x}^{T}_{d}\boldsymbol \beta\]
where \(\gamma_d = \frac{\sigma_u^2}{\sigma_u^2 +\sigma_{e_d}^2}\).
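To make the shrinkage concrete, here is a small numeric illustration (all values are hypothetical): as the sampling variance \(\sigma^2_{e_d}\) grows relative to \(\sigma^2_u\), \(\gamma_d\) falls and the predictor moves away from the direct estimate toward the regression part.

```python
import numpy as np

sigma2_u = 0.02                            # hypothetical model variance
sigma2_e = np.array([0.001, 0.02, 0.2])    # three areas: precise -> noisy survey
gamma = sigma2_u / (sigma2_u + sigma2_e)

p_dir, synthetic = 0.40, 0.25              # direct estimate vs. x'beta (made up)
p_opt = gamma * p_dir + (1 - gamma) * synthetic
# The noisier the direct estimate, the closer p_opt sits to x'beta
```

When \(\sigma^2_{e_d} = \sigma^2_u\) the weight is exactly \(1/2\); as \(\sigma^2_{e_d} \to 0\) the predictor reduces to the direct estimator, and as \(\sigma^2_{e_d} \to \infty\) it reduces to the synthetic estimator.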
Recall that \(\hat{P}^{DIR}_d \sim N(\boldsymbol{x}^{T}_{d}\boldsymbol \beta, \sigma_u^2 +\sigma_{e_d}^2)\), \(\hat{P}^{DIR}_d \mid u_d \sim N(\boldsymbol{x}^{T}_{d}\boldsymbol \beta + u_d,\sigma_{e_d}^2)\), and \(u_d \sim N(0, \sigma^2_u)\).
Therefore, the optimal predictor is a weighted combination of the direct estimator \(\hat{P}^{DIR}_d\) and the synthetic (regression) component \(\boldsymbol{x}^{T}_{d}\boldsymbol \beta\), with weights given by the ratio of the variance components: the more precise the direct estimate, the larger its weight \(\gamma_d\). To derive this, complete the square in the posterior density of \(u_d\):
\[ \begin{align*} f(u_d\mid \hat{P}^{DIR}_d) \propto f(\hat{P}^{DIR}_d \mid u_d)f(u_d) & = \frac{1}{\sigma_{e_d}\sqrt{2\pi}}\exp\left\{-\frac{1}{2\sigma^2_{e_d}}\left(\hat{P}^{DIR}_d-\boldsymbol{x}^{T}_{d}\boldsymbol \beta - u_d\right)^2\right\} \frac{1}{\sigma_u\sqrt{2\pi}}\exp\left\{- \frac{1}{2\sigma^2_u}u_d^2\right\}\\ & \propto \exp\left\{-\frac{u_d^2 - 2u_d(\hat{P}^{DIR}_d-\boldsymbol{x}^{T}_{d}\boldsymbol \beta)}{2\sigma^2_{e_d}} - \frac{u_d^2}{2\sigma^2_u}\right\} \\ & = \exp\left\{-\frac{1}{2}\left[\left(\frac{1}{\sigma^2_{e_d}} + \frac{1}{\sigma^2_u}\right)u_d^2 - 2\frac{\hat{P}^{DIR}_d-\boldsymbol{x}^{T}_{d}\boldsymbol \beta}{\sigma_{e_d}^2}u_d\right] \right\} \\ & = \exp \left\{ -\frac{1}{2\frac{\sigma_u^2\sigma_{e_d}^2}{\sigma_u^2 +\sigma_{e_d}^2}}\left[u_d^2 - 2\frac{\sigma_u^2}{\sigma_u^2 +\sigma_{e_d}^2}(\hat{P}^{DIR}_d-\boldsymbol{x}^{T}_{d}\boldsymbol \beta)u_d \right] \right\} \\ & \propto \exp \left\{ -\frac{1}{2\frac{\sigma_u^2\sigma_{e_d}^2}{\sigma_u^2 +\sigma_{e_d}^2}}\left[u_d - \frac{\sigma_u^2}{\sigma_u^2 +\sigma_{e_d}^2}(\hat{P}^{DIR}_d-\boldsymbol{x}^{T}_{d}\boldsymbol \beta)\right]^2 \right\} \\ & \propto N\left(E(u_d\mid\hat{P}^{DIR}_d), \text{Var}(u_d\mid\hat{P}^{DIR}_d)\right) \end{align*} \]
with \(E(u_d\mid\hat{P}^{DIR}_d) = \frac{\sigma_u^2}{\sigma_u^2 +\sigma_{e_d}^2}(\hat{P}^{DIR}_d-\boldsymbol{x}^{T}_{d}\boldsymbol \beta)\) and \(\text{Var}(u_d\mid\hat{P}^{DIR}_d) = \frac{\sigma_u^2\sigma_{e_d}^2}{\sigma_u^2 +\sigma_{e_d}^2}\). Therefore,
\[ \begin{align*} E(P_d \mid \hat{P}^{DIR}_d) = \boldsymbol{x}^{T}_{d}\boldsymbol \beta + E(u_d\mid\hat{P}^{DIR}_d) & = \boldsymbol{x}^{T}_{d}\boldsymbol \beta + \frac{\sigma_u^2}{\sigma_u^2 +\sigma_{e_d}^2}(\hat{P}^{DIR}_d-\boldsymbol{x}^{T}_{d}\boldsymbol \beta) \\ & = \frac{\sigma_u^2}{\sigma_u^2 +\sigma_{e_d}^2}\hat{P}^{DIR}_d + \frac{\sigma_{e_d}^2}{\sigma_u^2 +\sigma_{e_d}^2}\boldsymbol{x}^{T}_{d}\boldsymbol \beta \\ & = \gamma_d\hat{P}^{DIR}_d + (1-\gamma_d)\boldsymbol{x}^{T}_{d}\boldsymbol \beta \end{align*} \]
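As a sanity check on this derivation, the conditional mean \(E(u_d\mid\hat{P}^{DIR}_d)\) can also be verified by simulation: draw many \((u_d, \hat{P}^{DIR}_d)\) pairs from the model, keep the draws whose direct estimate lands near a fixed value, and compare the average of \(u_d\) in that slice with \(\gamma_d(\hat{P}^{DIR}_d - \boldsymbol{x}^{T}_{d}\boldsymbol\beta)\). All parameter values below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma2_u, sigma2_e, xb = 0.02, 0.01, 0.30   # arbitrary parameter values
n = 2_000_000

u = rng.normal(0.0, np.sqrt(sigma2_u), n)
p_dir = xb + u + rng.normal(0.0, np.sqrt(sigma2_e), n)

# Condition on the direct estimator landing near y0 and average u there
y0 = 0.40
slice_mean = u[np.abs(p_dir - y0) < 0.002].mean()

gamma = sigma2_u / (sigma2_u + sigma2_e)
theory = gamma * (y0 - xb)   # gamma_d * (p_dir - x'beta) from the derivation
# slice_mean and theory agree up to Monte Carlo error
```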