
Inconsistent Pareto k-values for SIS? #227

Open
fweber144 opened this issue Aug 1, 2023 · 10 comments

Comments

@fweber144
Contributor

I noticed that there might be an inconsistency in the Pareto k-values between different approaches for standard importance sampling (SIS):

library(loo)
log_ratios <- -1 * example_loglik_array()
log_ratios <- log_ratios[1:3, , ]
r_eff <- relative_eff(exp(-log_ratios))

# Call psis():
psis_result <- psis(log_ratios, r_eff = r_eff)
# In fact, SIS was used (due to the small number of draws):
lw_sis <- apply(log_ratios, 3, as.vector)
lw_sis <- sweep(lw_sis, 2, apply(lw_sis, 2, matrixStats::logSumExp))
stopifnot(all.equal(weights(psis_result), lw_sis,
                    tolerance = .Machine$double.eps))

# Now request SIS explicitly:
sis_result <- sis(log_ratios, r_eff = r_eff)
# The (log) weights are as expected:
stopifnot(all.equal(weights(sis_result), lw_sis,
                    tolerance = .Machine$double.eps))

# However:
table(pareto_k_values(psis_result))
## Inf
##  32
table(pareto_k_values(sis_result))
##  0
## 32

The point is that calling psis() with a small number of draws causes the Pareto smoothing to be skipped; plain SIS is used instead, as demonstrated above. In that case, the Pareto k-values are Inf, whereas calling sis() explicitly gives Pareto k-values of 0.

Background: In projpred, it is possible (although not encouraged and in particular not the default behavior) to use PSIS-LOO CV with the search being excluded from the CV (validate_search = FALSE) and a small number of thinned draws. In principle, projpred could use sis() explicitly in such a case (and then either continue with the Pareto k-values, which are all 0, or even skip the Pareto k checks), but that requires catching the "small S" case manually (which is not a problem in itself, but if loo ever changes its "small S" decision rule, projpred's decision rule would have to be adapted analogously). Using psis() would be more straightforward, but then the Pareto k-values are Inf, which would trigger warnings in the Pareto k checks.
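For illustration, a minimal sketch of what catching the "small S" case manually could look like. CAUTION: the tail-length rule below (tail length ceiling(min(0.2 * S, 3 * sqrt(S / r_eff))), smoothing skipped when the tail has fewer than 5 draws) is an assumption about loo's current internal behavior and could change in a future loo release, which is exactly the maintenance concern described above:

```r
# Hedged sketch: guess whether psis() will skip Pareto smoothing
# ("small S" case) by replicating the assumed internal tail-length rule.
small_S_case <- function(S, r_eff) {
  # Assumed rule: tail length as in loo's PSIS implementation, and
  # smoothing is skipped when fewer than 5 tail draws are available.
  tail_len <- ceiling(min(0.2 * S, 3 * sqrt(S / r_eff)))
  tail_len < 5
}

# With 3 iterations x 2 chains = 6 draws per observation (as in the
# example above), smoothing would be skipped:
small_S_case(S = 6, r_eff = 1)
#> [1] TRUE
# With a typical number of posterior draws, it would not:
small_S_case(S = 1000, r_eff = 1)
#> [1] FALSE
```

If this guess and psis()'s actual behavior ever diverge, the check silently becomes wrong, which is why relying on it is fragile.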

@jgabry
Member

jgabry commented Aug 1, 2023

Hmm yeah that inconsistency is unfortunate. I’m not sure what to do about it. Any ideas?

One thought: in hindsight it would actually probably make more sense for the Pareto k values to be NA and not 0 or Inf in all cases when Pareto smoothing isn’t done. But doing that (or changing between 0 and Inf) at this point wouldn’t be backwards compatible for users who were extracting and using the stored k values in the psis object (that’s certainly a minority of users though).

@fweber144
Contributor Author

fweber144 commented Aug 2, 2023

Yeah, the backwards compatibility is definitely an issue. Perhaps the new behavior could be implemented conditional on a global option which needs to be set to TRUE explicitly. When doing this, the current behavior could also be deprecated, so that in some later release, only the new behavior is available. For implementing the new behavior, #137 might be helpful.

In any case, since I wouldn't want projpred to require the most recent loo version where that new behavior is optionally available, I will try to find some other solution for projpred (either catching the "small S" case manually or perhaps using capture.output() to get the warnings from psis() and then checking for the "small S" case there). So this issue here isn't too urgent from my side.

@fweber144
Contributor Author

For projpred, I guess checking whether all Pareto k-values are Inf (and regarding them as something like NA) is unsafe as well, right? (I don't know if Pareto smoothing could result in all-Inf Pareto k-values as well.)

@jgabry
Member

jgabry commented Aug 2, 2023

> For projpred, I guess checking whether all Pareto k-values are Inf (and regarding them as something like NA) is unsafe as well, right? (I don't know if Pareto smoothing could result in all-Inf Pareto k-values as well.)

I think that would almost always be ok, but there could be annoying edge cases where that would be unsafe.

@fweber144
Contributor Author

Yes, I see. Thank you!

@jgabry
Member

jgabry commented Aug 2, 2023

(btw yesterday I added you to the r-packages team in the stan-dev GitHub organization. I should have done that a while ago!)

@jgabry
Member

jgabry commented Aug 2, 2023

> Perhaps the new behavior could be implemented conditional on a global option which needs to be set to TRUE explicitly. When doing this, the current behavior could also be deprecated, so that in some later release, only the new behavior is available. For implementing the new behavior, #137 might be helpful.

Yeah that’s definitely a possibility.

Another possibility is to just make the change (breaking backwards compatibility) and call it loo v3.0. I’d be more ok with that if we do a major version number increase. But usually a major version release has major new functionality or other big changes that we don’t really have at the moment.

@fweber144
Contributor Author

> (btw yesterday I added you to the r-packages team in the stan-dev GitHub organization. I should have done that a while ago!)

Yes, I saw that. Thank you very much!

> Another possibility is to just make the change (breaking backwards compatibility) and call it loo v3.0. I’d be more ok with that if we do a major version number increase. But usually a major version release has major new functionality or other big changes that we don’t really have at the moment.

Yes, I agree that this would not be worth a new major version (3.x.x).

@fweber144
Contributor Author

Btw, I just realized that the capture.output() solution is probably not a good one: testthat::test_that(), for example, redirects warning messages (presumably so that it can rethrow them itself), so they are not captured correctly by capture.output(). And if testthat::test_that() can do that, I guess other functions/packages can, too.
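A more robust alternative than scraping console output would be a calling handler, which receives each warning condition directly and is therefore unaffected by how an outer framework redirects warning messages. A minimal sketch (the warning text below is just an illustrative stand-in, and matching on message text would itself be fragile if loo rewords its warnings):

```r
# Hedged sketch: collect warning messages from an expression via
# withCallingHandlers() instead of capture.output().
warnings_from <- function(expr) {
  warns <- character(0)
  withCallingHandlers(
    expr,
    warning = function(w) {
      # Record the message and prevent the warning from propagating.
      warns <<- c(warns, conditionMessage(w))
      invokeRestart("muffleWarning")
    }
  )
  warns
}

res <- warnings_from({
  warning("too few tail samples")  # illustrative stand-in warning
  1 + 1
})
res
#> [1] "too few tail samples"
```

The recorded messages could then be inspected for a "small S"-related warning, though, as noted, this still couples projpred to loo's exact warning wording.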

@avehtari
Collaborator

> The point is that calling psis() with a small number of draws causes the Pareto smoothing to be skipped; plain SIS is used instead, as demonstrated above. In that case, the Pareto k-values are Inf, whereas calling sis() explicitly gives Pareto k-values of 0.

I don't understand how they can be different: the diagnostic is computed before smoothing, so whether or not smoothing is performed should not affect the diagnostic values.
