
Determining quality #205

Open
devinrsmith opened this issue Dec 14, 2022 · 3 comments

@devinrsmith

After updating from 3.2 to 3.3, some of our tests with hardcoded "quality" checks started failing. I captured my notes and speculated a bit in deephaven/deephaven-core#3204. I'm sure there is a lot of nuance to determining "accuracy and quality", but I'm wondering if there is any general guidance on how one should think about and measure quality with respect to t-digests.

Maybe related to this note in the README:

> describe accuracy using the quality suite

Potentially related to #85.

It may be that our test is not indicative of "real life" usage.

I'm sure it's a balancing act; optimizing t-digest in general may make some specific datasets perform worse.

@tdunning (Owner)

tdunning commented Dec 15, 2022 via email

@devinrsmith (Author)

The test that passes in 3.2 and fails in 3.3 is essentially a (seeded) uniform random distribution of 10000 doubles in the range [-10000.0, 10000.0]. We check abs((p_x - t_x) / p_x) < 0.005 for x in [75, 99, 99.9], where p_x is the exact percentile and t_x is the t-digest estimate at compression 100. (The test also performs the same checks on 4 non-overlapping subsets whose union equals the original distribution.)
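
For concreteness, here's a minimal sketch of that style of check (not our actual test code; the class name, the seed, and the linear-interpolation percentile convention are all made up for illustration):

    import com.tdunning.math.stats.TDigest;

    import java.util.Arrays;
    import java.util.Random;

    public class QualitySketch {
        public static void main(String[] args) {
            Random rng = new Random(42); // fixed seed for reproducibility
            double[] data = new double[10000];
            TDigest digest = TDigest.createDigest(100);
            for (int i = 0; i < data.length; i++) {
                data[i] = -10000.0 + 20000.0 * rng.nextDouble(); // uniform in [-10000, 10000)
                digest.add(data[i]);
            }
            Arrays.sort(data);
            for (double x : new double[] {75, 99, 99.9}) {
                double q = x / 100.0;
                double p = exactQuantile(data, q); // p_x: exact percentile
                double t = digest.quantile(q);     // t_x: t-digest estimate
                double relErr = Math.abs((p - t) / p);
                System.out.printf("p%s: exact=%.3f digest=%.3f relErr=%.5f pass=%b%n",
                        x, p, t, relErr, relErr < 0.005);
            }
        }

        // Exact percentile via linear interpolation on sorted data
        // (one common convention; the real test may use another).
        static double exactQuantile(double[] sorted, double q) {
            double pos = q * (sorted.length - 1);
            int lo = (int) Math.floor(pos);
            int hi = (int) Math.ceil(pos);
            double frac = pos - lo;
            return sorted[lo] * (1 - frac) + sorted[hi] * frac;
        }
    }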

Upon seeing these fail in 3.3, I added a single root-mean-square error calculation that accumulates the errors across this distribution, and re-ran the test with 1000 distinct seeds. It's a bit hand-wavy, but in the end the calculation produces:

3.2 RMSE = 0.00090
3.3 RMSE = 0.00127
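
Roughly, the accumulation looks like this (again a sketch, reusing the hypothetical exactQuantile helper from the snippet above; the precise error definition is my approximation, not the exact test code):

    // RMSE of the relative quantile error across many re-seeded runs.
    static double rmseAcrossSeeds(int numSeeds, double compression) {
        double sumSq = 0;
        long n = 0;
        for (int seed = 0; seed < numSeeds; seed++) {
            Random rng = new Random(seed);
            double[] data = new double[10000];
            TDigest digest = TDigest.createDigest(compression);
            for (int i = 0; i < data.length; i++) {
                data[i] = -10000.0 + 20000.0 * rng.nextDouble();
                digest.add(data[i]);
            }
            Arrays.sort(data);
            for (double x : new double[] {75, 99, 99.9}) {
                double q = x / 100.0;
                double p = exactQuantile(data, q);
                double err = (p - digest.quantile(q)) / p;
                sumSq += err * err;
                n++;
            }
        }
        return Math.sqrt(sumSq / n);
    }

Called as rmseAcrossSeeds(1000, 100), that's the shape of the calculation behind the numbers above.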

I suspect I might find that

> upping the default value to 200 had the desired effect (same memory use, improved accuracy)

so I will look into that.

I think another factor is that I don't have a good sense of what "compression" means, or of how developers should expect to tweak compression from release to release. The javadocs say "100 is a common value for normal uses".

I would have expected a given compression X to be "equivalent" from release to release in one of these two dimensions:

  1. compression X achieves the same memory usage (and hopefully improved "quality")
  2. compression X achieves the same "quality" (and hopefully improved memory usage)

But it seems like compression is a more nuanced value? (I.e., based on your statements and my findings, X=100 uses less memory but also gives less "quality" on 3.3, so it doesn't follow either of the two dimensions above.)

I wonder if it would be useful to have user-level abstractions that could be exposed as the two dimensions above?

I.e., something like:

    public static double qualityToCompression(double quality) {
        return quality * QUALITY_TO_COMPRESS_FACTOR; // maybe it's not a constant factor, but a more complex function
    }

    public static double memoryToCompression(double memory) {
        return memory * MEMORY_TO_COMPRESS_FACTOR; // maybe it's not a constant factor, but a more complex function
    }

Then user code would be able to lock in on the mode it is trying to optimize for:

    TDigest tDigestForTesting = createDigest(qualityToCompression(100.0));
    ...
    TDigest tDigestForProduction = createDigest(memoryToCompression(...));
    ...

This is a bit long-winded... but I very much appreciate your response, and I will get back after I analyze the memory usage a bit more.

@tdunning (Owner)

tdunning commented Dec 23, 2022 via email
