Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Projection for discrete exponential dispersion (DED) families #361

Open
fweber144 opened this issue Oct 23, 2022 · 5 comments
Open

Projection for discrete exponential dispersion (DED) families #361

fweber144 opened this issue Oct 23, 2022 · 5 comments
Labels
enhancement Enhancements of existing features, but also new feature requests.

Comments

@fweber144
Copy link
Collaborator

fweber144 commented Oct 23, 2022

The families currently supported by projpred (gaussian(), binomial(), brms::bernoulli(), poisson()) belong to the class of "exponential dispersion (ED) families" (Jørgensen, 1987; see also section "Remarks" below) used in GLMs (see McCullagh and Nelder, 1989, chapter 2, equation (2.4) at the beginning of section 2.2.2, p. 28) (and also used in GLMMs, GAMs, and GAMMs). As shown by Piironen et al. (2020, section 3.4), the projection onto an ED-family submodel can be performed quite easily by fitting to the fit of the reference model, at least when regarding only the projection onto the location-specific parameter vector and disregarding the projection onto the dispersion parameter.

Jørgensen (1987) also presents the class of "discrete exponential dispersion families" (hereafter abbreviated by "DED families"). For example, the negative binomial distribution is a DED family. DED families are closely related to ED families (Jørgensen, 1987), but strictly speaking, the class of DED families is not a subset of the class of ED families, as can be seen from comparing equations (1.1) and (2.12) of Jørgensen (1987). However, the projection of a reference model onto the location-specific parameters of a DED-family submodel can be performed just as easily as for an ED-family submodel because for DED families, equation (12) of Piironen et al. (2020) simplifies to

$\theta_{c} = \mathop{\mathrm{argmax}}\limits_{\theta \in \Theta} \sum\limits_{i = 1}^{N} \left((\mu^{*}_{c, i} \, \mu_i(\dot{\theta}) - K(\phi) B(\mu_i(\dot{\theta}))) + \mathbb{E}(H(\tilde{y}_i, \phi)|I_c)\right)$

where I have adapted the notation a bit:

  • $\theta_{c}$ corresponds to $\theta_{\perp}$ from Piironen et al. (2020).
  • I have split up $\theta$ into the location-specific parameter vector $\dot{\theta} \in \dot{\Theta}$ and the dispersion parameter $\phi \in (0, \infty)$, i.e., $\theta = (\dot{\theta}^{\scriptscriptstyle \mathsf{T}}, \phi)^{\scriptscriptstyle \mathsf{T}}$ and accordingly $\Theta = \{\theta = (\dot{\theta}^{\scriptscriptstyle \mathsf{T}}, \phi)^{\scriptscriptstyle \mathsf{T}}: \dot{\theta} \in \dot{\Theta}, \phi \in (0, \infty)\}$.
  • $N$ corresponds to $n$ from Piironen et al. (2020). (I picked $N$ to agree with the notation from projpred's documentation.)
  • $\mu^{*}_{c, i}$ corresponds to $\mu^{*}_i$ from Piironen et al. (2020).
  • $\mu_i(\cdot)$ corresponds to $\eta_i(\cdot)$ from Piironen et al. (2020). (I picked $\mu$ instead of $\eta$ to avoid confusion with the linear or latent predictor.)
  • $\mathbb{E}(\cdot|I_c)$ corresponds to $\mathbb{E}_{\tilde{y}_i | I_c}(\cdot)$ from Piironen et al. (2020). (The GitHub Markdown math mode has some issues with the latter notation when used above.)
  • $K(\phi)$ corresponds to $\lambda$ from Jørgensen (1987). (I chose this functional notation in analogy to $A(\phi)$ from Piironen et al. (2020).)

When calculating the gradient of the right-hand side expression above (which is to be maximized), one can see that the term $\mathbb{E}(H(\tilde{y}_i, \phi)|I_c)$ is unimportant for the maximization with respect to the location-specific parameter vector $\dot{\theta}$ and so the projection solution with respect to $\dot{\theta}$ is obtained by a maximum-likelihood fit to the fit of the reference model, analogously to what Piironen et al. (2020, section 3.4) derived for ED families. For example, in case of a negative binomial submodel without multilevel and without additive terms, MASS::glm.nb() could be used for calculating the projection solution with respect to $\dot{\theta}$.

For the dispersion parameter, things get more complicated: The projection solution for $\phi$ is given by

$\phi_{c} = \mathop{\mathrm{argmax}}\limits_{\phi \in (0, \infty)} \sum\limits_{i = 1}^{N} \left((\mu^{*}_{c, i} \, \mu_i(\dot{\theta}_c) - K(\phi) B(\mu_i(\dot{\theta}_c))) + \mathbb{E}(H(\tilde{y}_i, \phi)|I_c)\right)$

with $\dot{\theta}_c$ denoting the projection solution with respect to $\dot{\theta}$ calculated as described above (fitting to the fit of the reference model). For the negative binomial distribution, I haven't been able so far to simplify this $\phi_c$ expression to a more tractable one because $\mathbb{E}(H(\tilde{y}_i, \phi)|I_c)$ essentially consists of expectations of log-Gamma function values. But perhaps future work for the negative binomial family could continue here, possibly using an approximation for this more complicated $\mathbb{E}(H(\tilde{y}_i, \phi)|I_c)$ part.

Remarks

  1. Jørgensen (1987) himself only uses the term "exponential dispersion models", but the discussion for that article mentions the term "exponential dispersion (ED) families", so I think it should be fine to use the term "ED families" also in the context of projpred.
  2. With regard to the GLM definition from McCullagh and Nelder (1989, chapter 2) that is also cited by Piironen et al. (2020), it is not really correct to say that "GLMs have their observation model in the exponential family" (Piironen et al., 2020, section 3.5) because the class of exponential families is different from the class of ED families: For a known dispersion parameter, ED families are indeed (special, i.e., linear, see Jørgensen, 1987, section 2) exponential families, but in general, this is not the case if the dispersion parameter is unknown (McCullagh and Nelder, 1989, chapter 2). I admit, however, that the wording of McCullagh and Nelder (1989, chapter 2, beginning of section 2.2.2, p. 28) is mistakable. Others seem to have been confused by this as well, see, e.g., here and here.

References

Jørgensen, B. (1987). Exponential Dispersion Models. Journal of the Royal Statistical Society. Series B (Methodological), 49(2), 127–162.

McCullagh, P., & Nelder, J. A. (1989). Generalized Linear Models (D. R. Cox, D. V. Hinkley, N. Reid, D. B. Rubin, & B. W. Silverman, Eds.; 2nd ed., Vol. 37). Chapman & Hall.

Piironen, J., Paasiniemi, M., & Vehtari, A. (2020). Projective inference in high-dimensional problems: Prediction and feature selection. Electronic Journal of Statistics, 14(1), 2155–2197. https://doi.org/10.1214/20-EJS1711

EDIT: In remark 2, I emphasized that I'm referring to the GLM definition from McCullagh and Nelder (1989, chapter 2). In general, GLMs can have different definitions.

@fweber144 fweber144 added enhancement Enhancements of existing features, but also new feature requests. and removed feature labels Oct 26, 2022
@avehtari
Copy link
Collaborator

For the negative binomial distribution, I haven't been able so far to simplify this
expression to a more tractable

numerical solution would be fine, too

  1. It is not really correct to say that "GLMs have their observation model in the exponential family" (Piironen et al., 2020, section 3.5) because the class of exponential families is different from the class of ED families:

I think this depends on the definition of GLM. If I remember correctly, at some point, GLM was defined to mean this.

@fweber144
Copy link
Collaborator Author

numerical solution would be fine, too

Ok, I will have to think about a possible numerical solution when I get time for this. Currently, I would focus on the latent projection and the augmented-data projection first.

I think this depends on the definition of GLM.

Yes, sorry, I forgot to mention that. I was following Piironen et al. (2020) and assumed that GLMs should be defined as in McCullagh and Nelder (1989, chapter 2). I don't know if Google Books lets you inspect page 28 of McCullagh and Nelder (1989) (for me, it does) or if you have access to an e-book or a physical copy of it (I took a physical copy). Then on top of page 28, you'll see what I mean by "mistakable wording":

We assume that [...] has a distribution in the exponential family, taking the form [... (basically equation (14) of Piironen et al., 2020)] for some specific [...]. If $\phi$ is known, this is an exponential-family model [...]. It may or may not be a two-parameter exponential family if $\phi$ is unknown.

@avehtari
Copy link
Collaborator

I have McCullagh and Nelder (1989) and the definition is what I remembered. I'm now confused what is the problem in Piironen et al (2020).

@fweber144
Copy link
Collaborator Author

In general, equation (14) of Piironen et al. (2020) (i.e., equation (2.4) of McCullagh and Nelder, 1989) is not an exponential family. Practically, this is (currently) not relevant because the R families supported by projpred right now (gaussian(), binomial(), poisson()) are all exponential dispersion families and exponential families.

@fweber144
Copy link
Collaborator Author

Background of this: I wanted to formulate the projection mathematically and then was confused by the term "exponential family" used for equation (14) of Piironen et al. (2020).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Enhancements of existing features, but also new feature requests.
Projects
None yet
Development

No branches or pull requests

2 participants