
Inconsistent Pareto k-values for SIS? #227

Open
fweber144 opened this issue Aug 1, 2023 · 10 comments

Comments

@fweber144
Contributor

I noticed that there might be an inconsistency in the Pareto k-values between different approaches for standard importance sampling (SIS):

library(loo)
log_ratios <- -1 * example_loglik_array()
log_ratios <- log_ratios[1:3, , ]
r_eff <- relative_eff(exp(-log_ratios))

# Call psis():
psis_result <- psis(log_ratios, r_eff = r_eff)
# In fact, SIS was used (due to the small number of draws):
lw_sis <- apply(log_ratios, 3, as.vector)
lw_sis <- sweep(lw_sis, 2, apply(lw_sis, 2, matrixStats::logSumExp))
stopifnot(all.equal(weights(psis_result), lw_sis,
                    tolerance = .Machine$double.eps))

# Now request SIS explicitly:
sis_result <- sis(log_ratios, r_eff = r_eff)
# The (log) weights are as expected:
stopifnot(all.equal(weights(sis_result), lw_sis,
                    tolerance = .Machine$double.eps))

# However:
table(pareto_k_values(psis_result))
## Inf
##  32
table(pareto_k_values(sis_result))
##  0
## 32

The point is that calling psis() with a small number of draws causes the Pareto smoothing to be skipped; plain SIS is used instead, as demonstrated above. In that case, the Pareto k-values are Inf, whereas calling sis() explicitly gives Pareto k-values of 0.

Background: In projpred, it is possible (although not encouraged and in particular not the default behavior) to use PSIS-LOO CV with the search being excluded from the CV (validate_search = FALSE) and a small number of thinned draws. In principle, projpred could use sis() explicitly in such a case (and then either continue with the Pareto k-values, which are all 0, or even skip the Pareto k checks), but that requires catching the "small S" case manually (which is not a problem in itself, but if loo ever changes its "small S" decision rule, projpred's decision rule would have to be adapted analogously). Using psis() would be more straightforward, but then the Pareto k-values are Inf, which would trigger warnings in the Pareto k checks.
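For illustration, a minimal sketch of what catching the "small S" case manually could look like. CAUTION: the tail-length rule below (tail length ceiling(min(0.2 * S, 3 * sqrt(S / r_eff))), smoothing skipped when the tail has fewer than 5 draws) is an assumption about loo's current internal behavior and could change in a future loo release, which is exactly the maintenance concern described above:

```r
# Hedged sketch: guess whether psis() will skip Pareto smoothing
# ("small S" case) by replicating the assumed internal tail-length rule.
small_S_case <- function(S, r_eff) {
  # Assumed rule: tail length as in loo's PSIS implementation, and
  # smoothing is skipped when fewer than 5 tail draws are available.
  tail_len <- ceiling(min(0.2 * S, 3 * sqrt(S / r_eff)))
  tail_len < 5
}

# With 3 iterations x 2 chains = 6 draws per observation (as in the
# example above), smoothing would be skipped:
small_S_case(S = 6, r_eff = 1)
#> [1] TRUE
# With a typical number of posterior draws, it would not:
small_S_case(S = 1000, r_eff = 1)
#> [1] FALSE
```

If this guess and psis()'s actual behavior ever diverge, the check silently becomes wrong, which is why relying on it is fragile.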

@jgabry
Member

jgabry commented Aug 1, 2023

Hmm yeah that inconsistency is unfortunate. I’m not sure what to do about it. Any ideas?

One thought: in hindsight it would actually probably make more sense for the Pareto k values to be NA and not 0 or Inf in all cases when Pareto smoothing isn’t done. But doing that (or changing between 0 and Inf) at this point wouldn’t be backwards compatible for users who were extracting and using the stored k values in the psis object (that’s certainly a minority of users though).

@fweber144
Contributor Author

fweber144 commented Aug 2, 2023

Yeah, the backwards compatibility is definitely an issue. Perhaps the new behavior could be implemented conditional on a global option which needs to be set to TRUE explicitly. When doing this, the current behavior could also be deprecated, so that in some later release, only the new behavior is available. For implementing the new behavior, #137 might be helpful.

In any case, since I wouldn't want projpred to require the most recent loo version where that new behavior is optionally available, I will try to find some other solution for projpred (either catching the "small S" case manually or perhaps using capture.output() to get the warnings from psis() and then checking for the "small S" case there). So this issue here isn't too urgent from my side.

@fweber144
Contributor Author

For projpred, I guess checking whether all Pareto k-values are Inf (and regarding them as something like NA) is unsafe as well, right? (I don't know if Pareto smoothing could result in all-Inf Pareto k-values as well.)

@jgabry
Member

jgabry commented Aug 2, 2023

> For projpred, I guess checking whether all Pareto k-values are Inf (and regarding them as something like NA) is unsafe as well, right? (I don't know if Pareto smoothing could result in all-Inf Pareto k-values as well.)

I think that would almost always be ok, but there could be annoying edge cases where that would be unsafe.

@fweber144
Contributor Author

Yes, I see. Thank you!

@jgabry
Member

jgabry commented Aug 2, 2023

(btw yesterday I added you to the r-packages team in the stan-dev GitHub organization. I should have done that a while ago!)

@jgabry
Member

jgabry commented Aug 2, 2023

> Perhaps the new behavior could be implemented conditional on a global option which needs to be set to TRUE explicitly. When doing this, the current behavior could also be deprecated, so that in some later release, only the new behavior is available. For implementing the new behavior, #137 might be helpful.

Yeah that’s definitely a possibility.

Another possibility is to just make the change (breaking backwards compatibility) and call it loo v3.0. I’d be more ok with that if we do a major version number increase. But usually a major version release has major new functionality or other big changes that we don’t really have at the moment.

@fweber144
Contributor Author

> (btw yesterday I added you to the r-packages team in the stan-dev GitHub organization. I should have done that a while ago!)

Yes, I saw that. Thank you very much!

> Another possibility is to just make the change (breaking backwards compatibility) and call it loo v3.0. I’d be more ok with that if we do a major version number increase. But usually a major version release has major new functionality or other big changes that we don’t really have at the moment.

Yes, I agree that this would not be worth a new major version (3.x.x).

@fweber144
Contributor Author

Btw, I just realized that the capture.output() solution is probably not a good one: testthat::test_that(), for example, redirects warning messages (presumably so that it can rethrow them itself), so they are not captured correctly by capture.output(). And if testthat::test_that() can do that, I guess other functions/packages can, too.
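A more robust alternative than scraping console output would be a calling handler, which receives each warning condition directly and is therefore unaffected by how an outer framework redirects warning messages. A minimal sketch (the warning text below is just an illustrative stand-in, and matching on message text would itself be fragile if loo rewords its warnings):

```r
# Hedged sketch: collect warning messages from an expression via
# withCallingHandlers() instead of capture.output().
warnings_from <- function(expr) {
  warns <- character(0)
  withCallingHandlers(
    expr,
    warning = function(w) {
      # Record the message and prevent the warning from propagating.
      warns <<- c(warns, conditionMessage(w))
      invokeRestart("muffleWarning")
    }
  )
  warns
}

res <- warnings_from({
  warning("too few tail samples")  # illustrative stand-in warning
  1 + 1
})
res
#> [1] "too few tail samples"
```

The recorded messages could then be inspected for a "small S"-related warning, though, as noted, this still couples projpred to loo's exact warning wording.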

@avehtari
Collaborator

> The point is that calling psis() with a small number of draws causes the Pareto smoothing to be skipped; plain SIS is used instead, as demonstrated above. In that case, the Pareto k-values are Inf, whereas calling sis() explicitly gives Pareto k-values of 0.

I don't understand how they can be different: the diagnostic is computed before smoothing, so whether or not smoothing is performed should not affect the diagnostic values.
