Confidence intervals for raw residuals
Checking the normality and equal variance of standardized residuals is one of the most common routines when fitting linear models. Although some statisticians argue these are the least important assumptions to worry about unless the focus is on prediction, readily available software prompts anyone to produce all kinds of graphical inspections and null hypothesis tests.
Consider fitting the model \(\mathbf{y} = \mathbf{X} \boldsymbol{\beta} + \boldsymbol{\epsilon}\) to \(n\) observations with \(p-1\) predictors. Assuming the error terms follow a multivariate normal distribution with mean vector \(\mathbf{0}\) and covariance matrix \(\sigma^2 \mathbf{I}\), we already know the covariance matrix of the residual vector \(\mathbf{e}\) is \(\sigma^2(\mathbf{I} - \mathbf{H})\), where \(\mathbf{H}\) is the influence (hat) matrix. If \(\sigma^2\) were known, confidence intervals could be constructed directly because \(\mathbf{e}\) would also follow a multivariate normal distribution. This situation is utopian, however, because we actually need to estimate the error variance. In that case, we might think that the ith studentized residual, \(r_i = \frac{e_i}{\widehat\sigma \sqrt{1 - h_{ii}}}\), follows a t-distribution with \(n-p-1\) degrees of freedom, but \(e_i\) and \(\widehat{\sigma}\) are not independent. At this point, we may appeal to asymptotic approximations and simply use the t-distribution, or compute the externally studentized residuals, which employ the error variance estimate that excludes the ith case. If we nevertheless want to use the whole data, Ellenberg (1973) derived the joint distribution of a kind of standardized residuals. We will follow his derivation to obtain instead the joint density of the internally studentized residuals, and will supply R code to plot confidence intervals for raw residuals. Because the improvement over the externally studentized residuals may not be huge, this derivation should be taken as a matter of curiosity rather than of practical value.
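For reference, base R already computes both flavors of studentized residuals for an lm fit; here is a minimal sketch with made-up data (not part of the derivation that follows):
# Internally vs externally studentized residuals in base R (illustrative sketch).
set.seed(1)
x <- rnorm(30)
y <- 1 + 0.5 * x + rnorm(30)
fit <- lm(y ~ x)
head(rstandard(fit))  # internally studentized: sigma estimated from all cases
head(rstudent(fit))   # externally studentized: sigma estimated excluding case i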
Let \(\mathbf{M} = \mathbf{I} - \mathbf{H}\) so that \(\mathbf{e} = \mathbf{M} \boldsymbol{\epsilon}\). Consider a \(k\)-subset of residuals. The sum of squared residuals obtained by removing that subset from the regression is \[ S_k^2 = S^2 - \mathbf{e}_k^\top \mathbf{M}_{k\times k}^{-1} \mathbf{e}_k, \] where \(S^2\) is the sum of squared residuals from the model fitted to the full data, \(\mathbf{e}_k\) is the vector containing the subset of \(k\) residuals and \(\mathbf{M}_{k \times k}\) is the corresponding \(k \times k\) submatrix of \(\mathbf{M}\). We can rewrite it as \[ \begin{aligned} S_k^2 &= \boldsymbol{\epsilon}^\top \mathbf{M} \boldsymbol{\epsilon} - \boldsymbol{\epsilon}^\top (\mathbf{MM^*M}) \boldsymbol{\epsilon} \\ &= \boldsymbol{\epsilon}^\top (\mathbf{M} - \mathbf{MM^*M}) \boldsymbol{\epsilon}, \end{aligned} \] where \(\mathbf{M^*}\) is the \(n \times n\) matrix that contains \(\mathbf{M}_{k \times k}^{-1}\) in the block corresponding to the \(k\)-subset and zeros everywhere else. This is a quadratic form where \(\mathbf{M}\) is idempotent and the rank of \((\mathbf{M} - \mathbf{MM^*M})\) is \(n-p-k\), so \[ \frac{S_k^2}{\sigma^2} \sim \chi^2 \Big(n-p-k \Big). \]
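As a quick sanity check of the deletion identity for \(k = 1\) (a sketch with arbitrary simulated data, not part of Ellenberg's derivation), an explicit refit without case 1 should give the same residual sum of squares:
# Numerical check of S_1^2 = S^2 - e_1^2 / m_11, with m_11 = 1 - h_11.
set.seed(1)
n <- 25
x <- rnorm(n)
y <- 1 + 0.5 * x + rnorm(n)
fit <- lm(y ~ x)
e <- residuals(fit)
S2 <- sum(e^2)
m11 <- 1 - hatvalues(fit)[1]
S2 - e[1]^2 / m11                    # via the identity
sum(residuals(lm(y[-1] ~ x[-1]))^2)  # via an explicit refit without case 1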
\(\mathbf{e}_k\) is independent of \(S_k^2\), so their joint probability density is \[ f(\mathbf{e}_k, S_k^2) = {\sf MVN} \Big(\mathbf{e}_k; \ \mathbf{0}, \sigma^2 \boldsymbol{\Sigma} \Big) \ \chi^2 \Big(S_k^2/\sigma^2; \ n-p-k \Big) / \sigma^2, \] where \(\boldsymbol{\Sigma} = \mathbf{M}_{k \times k}\). Let's focus on residual \(i\) from now on. Considering \(\widehat{\sigma} = \sqrt{S^2 / (n-p)}\), we apply the change of variables
\[ \begin{aligned} r_i &= \frac{e_i}{\widehat{\sigma} \sqrt{m_{ii}}}, \\ \widehat{\sigma}^2 &= \frac{S_i^2 + \widehat{\sigma}^2 r_i^2}{n-p}, \end{aligned} \] whose inverse is \(e_i = r_i \widehat{\sigma} \sqrt{m_{ii}}\) and \(S_i^2 = \widehat{\sigma}^2 (n - p - r_i^2)\).
Notice that \(r_i\) does not depend on the scale of the errors, so we can set \(\sigma^2 = 1\) without loss of generality. The joint probability density now looks like \[ f(e_i, S_i^2) = \mathcal{N} \Big(r_i \widehat{\sigma} \sqrt{m_{ii}}; \ 0, m_{ii} \Big) \ \chi^2 \left(\widehat{\sigma}^2 (n-p-r_i^2); \ n-p-1 \right). \]
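A minimal sketch of this scale invariance (made-up data, not from the original derivation), using the internally studentized residuals that base R computes:
# Rescaling the errors (and hence y) by a constant leaves r_i unchanged.
set.seed(1)
n <- 12
x <- rnorm(n)
eps <- rnorm(n)
y1 <- 1 + 2 * x + eps       # sigma = 1
y2 <- 1 + 2 * x + 10 * eps  # sigma = 10, same error draw
all.equal(rstandard(lm(y1 ~ x)), rstandard(lm(y2 ~ x)))  # TRUE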
Denoting \(v = n-p\), the Jacobian of the transformation is \[ \begin{vmatrix} \widehat{\sigma} \sqrt{m_{ii}} & r_i \sqrt{m_{ii}} \\ -2 \widehat{\sigma}^2 r_i & 2 \widehat{\sigma} v - 2\widehat{\sigma} r_i^2 \end{vmatrix} = 2\widehat{\sigma}^2 v \sqrt{m_{ii}}. \]
Hence, \[ f(r_i, \widehat{\sigma}) = 2\widehat{\sigma}^2v \sqrt{m_{ii}} \ \mathcal{N} \Big(r_i \widehat{\sigma} \sqrt{m_{ii}}; \ 0, m_{ii} \Big) \ \chi^2 \Big(\widehat{\sigma}^2 (v-r_i^2); \ v-1 \Big). \]
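A quick finite-difference check of the Jacobian determinant (a sketch with arbitrary values, not part of Ellenberg's argument):
# Map (r_i, sigma_hat) -> (e_i, S_i^2) and compare the numerical determinant
# with the closed form 2 * sigma_hat^2 * v * sqrt(m_ii).
v <- 10; m_ii <- 0.8; r_i <- 1.2; sigma_hat <- 2
g <- function(x) c(x[1] * x[2] * sqrt(m_ii),   # e_i   = r_i * sigma_hat * sqrt(m_ii)
                   x[2]^2 * (v - x[1]^2))      # S_i^2 = sigma_hat^2 * (v - r_i^2)
h <- 1e-6
J <- sapply(1:2, function(j) {
  x <- c(r_i, sigma_hat); xh <- x; xh[j] <- xh[j] + h
  (g(xh) - g(x)) / h
})
det(J)                            # numerical determinant
2 * sigma_hat^2 * v * sqrt(m_ii)  # closed form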
Finally, simplifying and integrating over \(\widehat{\sigma}\) yields \[ f(r_i) = \frac{v^{1-v/2} (v-r_i^2)^{(v-3)/2} \Gamma \left(\frac{v}{2} \right)}{\Gamma \left(\frac{v-1}{2} \right) \sqrt{\pi}}, \quad \mid r_i \mid < \sqrt{v}. \]
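As a quick numerical check of this density (a sketch, not part of the post's derivation), it should integrate to 1 and match the Monte Carlo distribution of internally studentized residuals under the model assumptions:
# f(r_i) as derived above, valid for |r| < sqrt(v).
fr <- function(r, v) {
  v^(1 - v/2) * (v - r^2)^((v - 3)/2) * gamma(v/2) / (gamma((v - 1)/2) * sqrt(pi))
}
v <- 7
integrate(fr, -sqrt(v), sqrt(v), v = v)$value   # should be ~1
# Monte Carlo comparison: P(r_1 <= 1) with n = 10, p = 3 (so v = 7).
set.seed(1)
sim <- replicate(5000, {
  x <- rnorm(10)
  y <- rnorm(10)
  rstandard(lm(y ~ x + I(x^2)))[1]
})
mean(sim <= 1)                                  # empirical
integrate(fr, -sqrt(v), 1, v = v)$value         # analytic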
Using \(\tt Mathematica\) we find \[ F(r_i) = \frac{v^{1-\frac{v}{2}} \Gamma \left(\frac{v}{2} \right)}{\Gamma \left(\frac{v-1}{2} \right) \sqrt{\pi}} r_i \left(1 - \frac{r_i^2}{v} \right)^{\frac{3-v}{2}} \Big(v-r_i^2 \Big)^{\frac{v-3}{2}} \\ _2F_1 \left( \frac{1}{2}, \frac{3-v}{2}, \frac{3}{2}, \frac{r_i^2}{v} \right) + C, \] where \(_2F_1\) is the hypergeometric function. Because the density is symmetric about zero and the first term vanishes at \(r_i = 0\), we must have \(F(0) = 1/2\) and therefore \(C = \frac{1}{2}\). The calculation of p-values and confidence intervals is then straightforward: we find the \(1-\alpha/2\) quantile, multiply it by \(\widehat{\sigma}\sqrt{m_{ii}}\), and add and subtract the result from \(e_i\).
The R function resid_ci takes an object of class lm and does all the work.
# Distribution function of r:
pr <- function(r, v) {
  # Outside the support: the CDF is 0 below -sqrt(v) and 1 above sqrt(v).
  if (r <= -sqrt(v)) return(0)
  if (r >= sqrt(v)) return(1)
  v^(1 - v/2) * gamma(v/2) / (gamma((v - 1)/2) * sqrt(pi)) *
    r * (1 - r^2/v)^((3 - v)/2) * (v - r^2)^((v - 3)/2) *
    Re(hypergeo::hypergeo(1/2, (3 - v)/2, 3/2, r^2/v)) + 1/2
}
# Notice you need the 'hypergeo' package.
# Function to get and plot the confidence intervals:
resid_ci <- function(model, confidence, plot = NULL) {
  alpha <- 1 - confidence
  hat_y <- model$fitted.values
  e <- model$residuals
  v <- model$df.residual
  hat_sigma2 <- sum(e^2) / v
  m <- 1 - lm.influence(model)$hat  # m_ii = 1 - h_ii
  se <- sqrt(hat_sigma2 * m)
  r <- e / se                       # internally studentized residuals
  # Bounds for the domain of r:
  u <- sqrt(v) - 0.01
  l <- -u
  critical_value <- uniroot(f = function(r, v, alpha) pr(r, v) - (1 - alpha/2),
                            v = v, alpha = alpha, lower = l, upper = u)$root
  CI <- sapply(c(-1, 1), FUN = function(x) e + x * critical_value * se)
  colnames(CI) <- c("lower limit", "upper limit")
  if (!is.null(plot)) {
    plot(hat_y[plot], e[plot], ylim = c(min(CI[plot, ]), max(CI[plot, ])),
         ylab = "Residuals", xlab = "Fitted values")
    sapply(plot, FUN = function(x) segments(x0 = hat_y[x], x1 = hat_y[x],
                                            y0 = CI[x, 1], y1 = CI[x, 2]))
    segments(x0 = min(hat_y[plot]), x1 = max(hat_y[plot]), y0 = 0, y1 = 0, lty = 2)
  }
  return(CI)
}
The plot argument can be used to plot a subset of residuals by providing their indices.
set.seed(25022020)
# Set-up:
n <- 20
p <- 4
v <- n-p
X <- cbind(1, MASS::mvrnorm(n, mu = rep(0, p-1), Sigma = diag(p-1)))
beta <- c(1, 0.2, 0.5, 0.8)
sigma <- rchisq(1, 5)
# Simulation:
epsilon <- rnorm(n, 0, sigma)
y <- X %*% beta + epsilon
# Model:
modelo <- lm(y ~ 0 + X)
# Plotting the confidence intervals with alpha = .05:
resid_ci(modelo, confidence = .95, plot = 1:n)
## lower limit upper limit
## 1 -19.464750 0.7951401
## 2 -17.696498 1.6501012
## 3 -5.550288 13.2235236
## 4 -15.831542 4.7842290
## 5 -6.212853 15.2809272
## 6 -10.152657 11.7342894
## 7 -14.961725 6.2331182
## 8 -5.813860 14.1868663
## 9 -16.738508 3.2498009
## 10 -13.733734 7.9669237
## 11 -8.529167 13.4177885
## 12 -10.779609 9.1548674
## 13 -12.121683 4.7811506
## 14 -6.199955 14.0267908
## 15 3.315743 23.4428353
## 16 -5.336643 15.7564900
## 17 -12.674718 8.8032118
## 18 -9.720376 9.6704875
## 19 -9.200522 12.9166644
## 20 -7.601659 13.9297983
As we would expect in the long run under the homoscedasticity and normality assumptions, only one of the \(20\) intervals (roughly \(5\%\)) excludes zero, flagging a single residual that deviates from the model.
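To back up this long-run claim, a rough Monte Carlo sketch (reusing the n, X, beta and sigma defined above, and assuming pr and resid_ci are already in the workspace) estimates how often a 95% interval excludes zero:
# Across many simulated datasets, about 5% of the intervals should exclude zero.
set.seed(1)
prop_excluding_zero <- mean(replicate(200, {
  epsilon <- rnorm(n, 0, sigma)
  y <- X %*% beta + epsilon
  CI <- resid_ci(lm(y ~ 0 + X), confidence = .95)
  mean(CI[, 1] > 0 | CI[, 2] < 0)
}))
prop_excluding_zero  # should be close to 0.05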
Reference
Ellenberg, J. H. (1973). The Joint Distribution of the Standardized Least Squares Residuals from a General Linear Regression. Journal of the American Statistical Association, 68(344), 941–943. https://doi.org/10.1080/01621459.1973.10481450