
Latent Class Analysis

Latent Class Analysis assumes that people belong to different groups (i.e., classes) defined by characteristics that cannot be observed directly. In a latent class model, we aim to estimate two kinds of probabilities:

  1. The probability that a person belongs to a particular class.
  2. The conditional probabilities of item responses if a person belongs to a given class.

The likelihood

Suppose that a sample of people respond to $J$ items and $\mathbf{y}$ is a vector that contains the score on each item $j$. Let $K$ denote the number of latent classes and let $x_k$ be class $k$. Then, the likelihood of the response pattern $\mathbf{y}$, if it was observed $n$ times in the sample and every person responds independently of each other, can be written as

$$\ell = P(\mathbf{y})^{n} = \Bigg(\sum_{k=1}^{K} P(x_k)\, P(\mathbf{y}\mid x_k)\Bigg)^{\!n}.$$

Assuming local independence (responses within a person are independent conditional on class), we have

$$P(\mathbf{y}\mid x_k) = \prod_{j=1}^{J} P(y_j \mid x_k),$$

where $y_j$ denotes the score on item $j$. Hence,

$$\ell = \Bigg(\sum_{k=1}^{K} P(x_k) \prod_{j=1}^{J} P(y_j \mid x_k)\Bigg)^{\!n},$$

and the log-likelihood becomes

$$\ell\ell = n \log \Bigg(\sum_{k=1}^{K} P(x_k) \prod_{j=1}^{J} P(y_j \mid x_k)\Bigg).$$

The term inside the parentheses is the probability of a single pattern, $P(\mathbf{y})$. Assuming independence between people with different response patterns, the log-likelihood of the whole sample is the sum of the log-likelihoods of each response pattern.

To simplify the computation of the log-likelihood and its derivatives, let $l_{y_j \mid x_k} = \log P(y_j \mid x_k)$, so that

$$\ell\ell = n \log \Bigg(\sum_{k=1}^{K} P(x_k)\, \exp\bigg(\sum_{j=1}^{J} l_{y_j \mid x_k}\bigg)\Bigg).$$
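To make this computation concrete, here is a minimal R sketch (the function name and all inputs are illustrative, not part of any package API) that evaluates $n \log P(\mathbf{y})$ for one pattern with the usual log-sum-exp stabilization:

```r
# Minimal sketch, not a package API: log-likelihood of one response
# pattern observed n times, computed with the log-sum-exp trick.
# log_class_probs: length-K vector of log P(x_k)
# log_cond:        K x J matrix of l_{y_j | x_k} = log P(y_j | x_k)
pattern_loglik <- function(log_class_probs, log_cond, n = 1) {
  log_joint <- log_class_probs + rowSums(log_cond)  # log P(x_k) + sum_j l
  m <- max(log_joint)
  n * (m + log(sum(exp(log_joint - m))))            # n * log P(y)
}

# Toy example: K = 2 classes, J = 3 items, pattern observed 5 times
p_class <- c(0.6, 0.4)
cond <- rbind(c(0.8, 0.7, 0.9),  # P(y_j | x_1) for the observed scores
              c(0.2, 0.3, 0.4))  # P(y_j | x_2)
pattern_loglik(log(p_class), log(cond), n = 5)
```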

First-order derivatives

For a fixed pattern $\mathbf{y}$, define

$$P(\mathbf{y}) = \sum_{k=1}^{K} P(x_k) \prod_{j=1}^{J} P(y_j \mid x_k) = \sum_{k=1}^{K} P(x_k)\, \exp\bigg(\sum_{j=1}^{J} l_{y_j \mid x_k}\bigg).$$

Then

$$\frac{\partial \ell\ell}{\partial P(x_g)} = n \,\frac{1}{P(\mathbf{y})} \prod_{j=1}^{J} P(y_j \mid x_g).$$

For a specific item $m$ and class $g$,

$$\frac{\partial \ell\ell}{\partial l_{y_m \mid x_g}} = n \,\frac{1}{P(\mathbf{y})}\, P(x_g) \prod_{j=1}^{J} P(y_j \mid x_g).$$

Notice that this last expression is just the posterior, $P(x_g \mid \mathbf{y})$, weighted by $n$.
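This identity is easy to verify numerically. A hedged R sketch (toy values throughout): the finite-difference derivative of $\ell\ell$ with respect to $l_{y_1 \mid x_1}$ should match $n\,P(x_1 \mid \mathbf{y})$.

```r
# Illustrative check: d(ll)/d(l_{y_1 | x_1}) = n * P(x_1 | y)
p_class <- c(0.6, 0.4)
cond <- rbind(c(0.8, 0.7, 0.9), c(0.2, 0.3, 0.4))
n <- 5
ll <- function(lc) n * log(sum(p_class * exp(rowSums(lc))))

joint <- p_class * apply(cond, 1, prod)  # P(x_k) prod_j P(y_j | x_k)
n * joint[1] / sum(joint)                # analytic: n * P(x_1 | y)

lc <- log(cond); eps <- 1e-6
lc_up <- lc; lc_up[1, 1] <- lc_up[1, 1] + eps
(ll(lc_up) - ll(lc)) / eps               # finite difference, same value
```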

Second-order derivatives

For classes $g,h$,

$$\begin{aligned} \frac{\partial^2 \ell\ell}{\partial P(x_g)\,\partial P(x_h)} &= -n \,\frac{1}{P(\mathbf{y})^{2}} \Bigg(\prod_{j=1}^{J} P(y_j \mid x_g)\Bigg) \Bigg(\prod_{j=1}^{J} P(y_j \mid x_h)\Bigg) \\ &= -n\, \frac{P(x_g \mid \mathbf{y})\, P(x_h \mid \mathbf{y})}{P(x_g)\, P(x_h)}. \end{aligned}$$

For items $m,n$ and classes $g,h$,

$$\frac{\partial^2 \ell\ell}{\partial l_{y_m \mid x_g}\,\partial l_{y_n \mid x_h}} = \begin{cases} n\, P(x_g \mid \mathbf{y}) \big(1-P(x_g \mid \mathbf{y})\big), & \text{if } g=h,\\[0.6em] -n\, P(x_g \mid \mathbf{y})\, P(x_h \mid \mathbf{y}), & \text{otherwise.} \end{cases}$$

For the mixed second derivative,

$$\frac{\partial^2 \ell\ell}{\partial P(x_h)\,\partial l_{y_m \mid x_g}} = \begin{cases} \dfrac{1}{P(x_g)}\, n\, P(x_g \mid \mathbf{y}) \big(1-P(x_g \mid \mathbf{y})\big), & \text{if } g=h,\\[0.6em] -\dfrac{1}{P(x_h)}\, n\, P(x_g \mid \mathbf{y})\, P(x_h \mid \mathbf{y}), & \text{otherwise.} \end{cases}$$

Collecting these terms gives the Hessian in block form:

$$\mathrm{Hess}(\ell\ell) = \begin{bmatrix} \dfrac{\partial^2 \ell\ell}{\partial P(x_g)\,\partial P(x_h)} & \dfrac{\partial^2 \ell\ell}{\partial P(x_h)\,\partial l_{y_m \mid x_g}} \\[0.8em] \dfrac{\partial^2 \ell\ell}{\partial l_{y_m \mid x_g}\,\partial P(x_h)} & \dfrac{\partial^2 \ell\ell}{\partial l_{y_m \mid x_g}\,\partial l_{y_n \mid x_h}} \end{bmatrix}.$$
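As a sanity check on these blocks, the sketch below (illustrative R with toy values, not package code) compares the $P(x_g),P(x_h)$ block against central finite differences of the log-likelihood:

```r
p_class <- c(0.6, 0.4); n <- 5
cond <- rbind(c(0.8, 0.7, 0.9), c(0.2, 0.3, 0.4))
Lk <- apply(cond, 1, prod)                # prod_j P(y_j | x_k)
ll <- function(p) n * log(sum(p * Lk))

post <- p_class * Lk / sum(p_class * Lk) # P(x_k | y)
analytic <- -n * tcrossprod(post / p_class)  # -n P(xg|y)P(xh|y)/(P(xg)P(xh))

eps <- 1e-5
H <- matrix(0, 2, 2)
for (g in 1:2) for (h in 1:2) {
  pp <- function(dg, dh) { p <- p_class; p[g] <- p[g] + dg; p[h] <- p[h] + dh; p }
  H[g, h] <- (ll(pp(eps, eps)) - ll(pp(eps, -eps)) -
              ll(pp(-eps, eps)) + ll(pp(-eps, -eps))) / (4 * eps^2)
}
round(H - analytic, 4)                    # should be ~0
```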

Models for the conditional likelihoods

The conditional probabilities need to be parameterized with a likelihood. We consider a multinomial likelihood for categorical items and a Gaussian likelihood for continuous items.

Multinomial

For categorical items, let $\pi_{m_k \mid g}$ be the probability of scoring category $k$ on item $m$ if a subject belongs to class $g$. Then

$$P(y_m \mid x_g) = \pi_{m_k \mid g},$$

where $k$ is such that $y_m = k$. With this parameterization,

$$\frac{\partial l_{y_m \mid x_g}}{\partial \pi_{n_k \mid h}} = \begin{cases} \dfrac{1}{\pi_{n_k \mid h}}, & \text{if } y_m = k,\ m=n,\ g=h,\\[0.6em] 0, & \text{otherwise,} \end{cases}$$

and

$$\frac{\partial^2 l_{y_j \mid x_i}}{\partial \pi_{m_k \mid g}\,\partial \pi_{n_l \mid h}} = \begin{cases} -\dfrac{1}{\pi^2_{m_k \mid g}}, & \text{if } y_j = k = l,\ j=m=n,\ i=g=h,\\[0.6em] 0, & \text{otherwise.} \end{cases}$$

Consequently, the Hessian for each conditional parameter has the following block form:

$$\mathrm{Hess}(l_{y_j \mid x_i}) = \mathbf{e}\; \frac{\partial^2 l_{y_j \mid x_i}}{\partial \pi_{m_k \mid g}\,\partial \pi_{n_l \mid h}}\; \mathbf{e}^\top,$$

where $\mathbf{e}$ is a vector of zeroes with a 1 in the position corresponding to the parameter $\pi_{y_j \mid i}$.

Notice that each conditional parameter $l_{y_m \mid x_g}$ has a Hessian matrix.
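Because $l_{y_m \mid x_g}$ is simply $\log \pi_{m_k \mid g}$ for the observed category, these derivatives can be checked in a couple of lines of R (toy values, illustrative only):

```r
pi_mg <- c(0.5, 0.3, 0.2)  # pi_{m_k | g} for a 3-category item
k <- 2                     # observed category, so l = log(pi_mg[k])
l <- function(p) log(p[k])
eps <- 1e-6
(l(replace(pi_mg, k, pi_mg[k] + eps)) - l(pi_mg)) / eps  # finite difference
1 / pi_mg[k]               # analytic first derivative
-1 / pi_mg[k]^2            # analytic second derivative
```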

Gaussian

For continuous items, let $\varphi$ denote the normal density, and let $\mu_{m\mid g}$ and $\sigma_{m\mid g}$ be the mean and standard deviation for item $m$ in class $g$. Then

$$P(y_m \mid x_g) = \varphi\big(y_m;\, \mu_{m\mid g},\, \sigma_{m\mid g}\big).$$

First-order derivatives:

$$\frac{\partial l_{y_m \mid x_g}}{\partial \mu_{n\mid h}} = \begin{cases} \dfrac{y_m - \mu_{m\mid g}}{\sigma_{m\mid g}^{2}}, & \text{if } m=n,\ g=h,\\[0.6em] 0, & \text{otherwise,} \end{cases}$$

$$\frac{\partial l_{y_m \mid x_g}}{\partial \sigma_{n\mid h}} = \begin{cases} \dfrac{(y_m-\mu_{m\mid g})^{2}-\sigma_{m\mid g}^{2}}{\sigma_{m\mid g}^{3}}, & \text{if } m=n,\ g=h,\\[0.6em] 0, & \text{otherwise.} \end{cases}$$

Second-order derivatives:

$$\frac{\partial^{2} l_{y_m \mid x_g}}{\partial \mu_{m\mid g}\,\partial \mu_{n\mid h}} = \begin{cases} -\dfrac{1}{\sigma_{m\mid g}^{2}}, & \text{if } m=n,\ g=h,\\[0.6em] 0, & \text{otherwise,} \end{cases}$$

$$\frac{\partial^{2} l_{y_m \mid x_g}}{\partial \sigma_{m\mid g}\,\partial \sigma_{n\mid h}} = \begin{cases} \dfrac{1}{\sigma_{m\mid g}^{2}} - \dfrac{3(y_m-\mu_{m\mid g})^{2}}{\sigma_{m\mid g}^{4}}, & \text{if } m=n,\ g=h,\\[0.6em] 0, & \text{otherwise,} \end{cases}$$

$$\frac{\partial^{2} l_{y_m \mid x_g}}{\partial \mu_{m\mid g}\,\partial \sigma_{n\mid h}} = \begin{cases} -\dfrac{2(y_m-\mu_{m\mid g})}{\sigma_{m\mid g}^{3}}, & \text{if } m=n,\ g=h,\\[0.6em] 0, & \text{otherwise.} \end{cases}$$

Consequently, the Hessian for each conditional parameter has the following block form:

$$\mathrm{Hess}(l_{y_m \mid x_g}) = \begin{bmatrix} \dfrac{\partial^{2} l_{y_m \mid x_g}}{\partial \mu_{m \mid g}\,\partial \mu_{n \mid h}} & \dfrac{\partial^{2} l_{y_m \mid x_g}}{\partial \mu_{m \mid g}\,\partial \sigma_{n \mid h}} \\[0.8em] \dfrac{\partial^{2} l_{y_m \mid x_g}}{\partial \sigma_{n \mid h}\,\partial \mu_{m \mid g}} & \dfrac{\partial^{2} l_{y_m \mid x_g}}{\partial \sigma_{m \mid g}\,\partial \sigma_{n \mid h}} \end{bmatrix}.$$

Notice that each conditional parameter $l_{y_m \mid x_g}$ has a Hessian matrix.
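These expressions follow from differentiating the Gaussian log-density; a short R check against finite differences (toy values, illustrative only):

```r
y <- 1.3; mu <- 0.5; sigma <- 0.8
l <- function(mu, sigma) dnorm(y, mu, sigma, log = TRUE)
eps <- 1e-6

# d l / d mu = (y - mu) / sigma^2
(l(mu + eps, sigma) - l(mu, sigma)) / eps
(y - mu) / sigma^2

# d l / d sigma = ((y - mu)^2 - sigma^2) / sigma^3
(l(mu, sigma + eps) - l(mu, sigma)) / eps
((y - mu)^2 - sigma^2) / sigma^3
```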

Model for the latent class probabilities

Probabilities of class membership are parameterized with the softmax transformation:

$$P(x_g) = \frac{\exp(\theta_g)}{\sum_j \exp(\theta_j)},$$

where $\theta_g$ is the log-scale parameter associated with class $g$.

The Jacobian of this transformation is given by

$$J = \mathrm{diag}(P) - P P^\top.$$

Finally, the Hessian for each probability is

$$\mathrm{Hess}\big(P(x_g)\big) = P(x_g)\Big((e_g - P)(e_g - P)^\top - J\Big),$$

where $e_g$ is a vector of zeroes with a 1 in position $g$.
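The sketch below (illustrative R; names and values are assumptions, not package code) implements the softmax, its Jacobian, and the per-probability Hessian, and verifies one Hessian entry numerically:

```r
softmax <- function(theta) { e <- exp(theta - max(theta)); e / sum(e) }

theta <- c(0.2, -0.5, 1.0)
P <- softmax(theta)
J <- diag(P) - tcrossprod(P)             # Jacobian of P wrt theta

g <- 2
e_g <- replace(numeric(3), g, 1)
H_g <- P[g] * (tcrossprod(e_g - P) - J)  # Hessian of P(x_g)

# finite-difference check of entry (1, 1)
eps <- 1e-5
f <- function(th) softmax(th)[g]
d <- c(eps, 0, 0)
(f(theta + d) - 2 * f(theta) + f(theta - d)) / eps^2
H_g[1, 1]
```

The same computation applies, with $\eta_{m \mid g}$ in place of $\theta$, to the conditional response probabilities $\pi_{m_k \mid g}$ of the multinomial model below.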

Model for the conditional probabilities of the multinomial model

Probabilities of conditional responses are parameterized with the softmax transformation:

$$\pi_{m_k \mid g} = \frac{\exp(\eta_{m_k \mid g})}{\sum_j \exp(\eta_{m_j \mid g})},$$

where $\eta_{m_k \mid g}$ is the log-scale parameter associated with response $k$ to item $m$ in class $g$.

The Jacobian of this transformation is given by

$$J_{m \mid g} = \mathrm{diag}(\pi_{m \mid g}) - \pi_{m \mid g}\, (\pi_{m \mid g})^\top.$$

Finally, the Hessian for each probability is

$$\mathrm{Hess}\big(\pi_{m_k \mid g}\big) = \pi_{m_k \mid g}\Big((e_k - \pi_{m \mid g})(e_k - \pi_{m \mid g})^\top - J_{m \mid g}\Big),$$

where $e_k$ is a vector of zeroes with a 1 in position $k$.

Constant priors

For latent class probabilities

For conditional likelihoods

Multinomial

For the conditional probabilities modeled with a multinomial likelihood, we add the following term to the log-likelihood:

$$\lambda_2 = \sum_m \sum_k \hat{\pi}_{m_k} \sum_g \frac{\alpha}{K} \log(\pi_{m_k \mid g}),$$

where $\hat{\pi}_{m_k}$ is the proportion of times category $k$ was selected in item $m$.

The first-order derivatives are

$$\frac{\partial \lambda_2}{\partial \pi_{m_k \mid g}} = \hat{\pi}_{m_k}\, \frac{\alpha}{K}\, \frac{1}{\pi_{m_k \mid g}}.$$

The second-order derivatives are

$$\frac{\partial^2 \lambda_2}{\partial \pi_{m_k \mid g}\,\partial \pi_{m_k \mid g}} = -\hat{\pi}_{m_k}\, \frac{\alpha}{K}\, \frac{1}{\pi^2_{m_k \mid g}}.$$
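A quick numerical check of these expressions for a single parameter, with illustrative values of $\alpha$, $K$, $\hat{\pi}_{m_k}$, and $\pi_{m_k \mid g}$:

```r
alpha <- 1; K <- 2
pi_hat <- 0.45   # proportion of category k on item m
pi_mkg <- 0.30   # current value of pi_{m_k | g}

lambda2_term <- function(p) pi_hat * (alpha / K) * log(p)
eps <- 1e-6
(lambda2_term(pi_mkg + eps) - lambda2_term(pi_mkg)) / eps  # finite difference
pi_hat * (alpha / K) / pi_mkg                              # analytic gradient
```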

Gaussian

For the conditional probabilities modeled with a Gaussian likelihood, we add the following term to the log-likelihood:

$$\begin{aligned} \lambda_3 &= \sum_g^K \left( -0.5\,\frac{\alpha}{K} \log\bigg(\prod_j \sigma^2_{j\mid g}\bigg) - 0.5\,\frac{\alpha}{K} \sum_j \frac{\hat{\sigma}^2_j}{\sigma^2_{j \mid g}} \right) \\ &= \sum_g^K \left( -\frac{\alpha}{K} \sum_j s_{j\mid g} - 0.5\,\frac{\alpha}{K} \sum_j \frac{\hat{\sigma}^2_j}{\sigma^2_{j \mid g}} \right), \end{aligned}$$

where $s_{j \mid g} = \log \sigma_{j \mid g}$ and $\hat{\sigma}^2_j$ is the sample variance of item $j$.

The first-order derivatives are

$$\frac{\partial \lambda_3}{\partial \sigma_{m \mid g}} = -\frac{\alpha}{K\,\sigma_{m \mid g}} + \frac{\alpha}{K\,\sigma^3_{m \mid g}}\, \hat{\sigma}^2_m = \frac{\alpha}{K} \Bigg(\frac{\hat{\sigma}^2_m}{\sigma^3_{m \mid g}} - \frac{1}{\sigma_{m \mid g}}\Bigg).$$

The second-order derivatives are

$$\frac{\partial^2 \lambda_3}{\partial \sigma_{m \mid g}\,\partial \sigma_{m \mid g}} = \frac{\alpha}{K} \Bigg(\frac{1}{\sigma^2_{m \mid g}} - 3\,\frac{\hat{\sigma}^2_m}{\sigma^4_{m \mid g}}\Bigg).$$
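As before, the derivative of the per-item, per-class term can be checked numerically (toy values, illustrative only):

```r
alpha <- 1; K <- 2
s2_hat <- 1.5    # sample variance of item m
sigma <- 0.9     # current value of sigma_{m | g}

lambda3_term <- function(s) -(alpha / K) * log(s) - 0.5 * (alpha / K) * s2_hat / s^2
eps <- 1e-6
(lambda3_term(sigma + eps) - lambda3_term(sigma)) / eps  # finite difference
(alpha / K) * (s2_hat / sigma^3 - 1 / sigma)             # analytic gradient
```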