Add optional caching of CI test values in constraint-based causal discovery #122

Open
adam2392 opened this issue Mar 16, 2023 · 0 comments
Labels
good first issue Good for newcomers

Comments

@adam2392
Collaborator

adam2392 commented Mar 16, 2023

Similar to causal-learn, and in the quest to achieve feature parity so that we converge on a best-of-both implementation, we want to add caching of CI test values as a base for all skeleton-learning algorithms. Starting an issue to track this...

We want to cache the explicit p-values so that users can re-run the entire algorithm with a different set of alpha values. Moreover, if they want to re-run the algorithm for any other reason, it would be trivial to do so.

Implementation Thoughts

Caching can be implemented using joblib. We want the cache to be a function of the dataset, so we would first compute a hash of the dataset and use it as the cache folder location: location of cache = '.dodiscover/<dataset_hash>'. The cache would then live in a private folder (similar to what many packages do), and we can easily clear it through the joblib API.

Then, as a function of x_var, y_var, conditioning_set, and conditioning_test, we would let joblib.Memory cache the p-values for us. However, another problem we need to figure out is how best to parallelize the existing CI tests using joblib.Parallel. This is actually not trivial, from what I have looked at.
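A minimal sketch of the caching half, assuming a stand-in CI test (the real one would compute a p-value from the dataset; the function name, cache location, and call counter here are purely illustrative):

```python
from joblib import Memory

# Hypothetical cache location; in practice this would be the
# '.dodiscover/<dataset_hash>' folder described above.
memory = Memory(location=".dodiscover/demo_cache", verbose=0)
memory.clear(warn=False)  # start fresh for the demo

n_calls = {"count": 0}


@memory.cache
def ci_test_pvalue(x_var, y_var, conditioning_set):
    """Stand-in CI test: a real one would compute a p-value from the data."""
    n_calls["count"] += 1  # only incremented on a cache miss
    return 0.5  # dummy p-value


p1 = ci_test_pvalue("x", "y", ("z1", "z2"))  # computed and written to disk
p2 = ci_test_pvalue("x", "y", ("z1", "z2"))  # served from the cache
```

The second call returns the memoized p-value without re-executing the test body, which is exactly what makes re-running the skeleton phase with a different alpha cheap.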

Assuming we can implement the parallelization, the joblib caching would almost come for free, and the two would work well together without us having to write any of the "file saving and file opening" code ourselves. That is all abstracted away.

xref: https://joblib.readthedocs.io/en/latest/auto_examples/nested_parallel_memory.html#sphx-glr-auto-examples-nested-parallel-memory-py
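Following the pattern in the linked joblib example, combining the two could look roughly like this (again with a dummy CI test and hypothetical names; a process-based backend such as loky would work the same way):

```python
from joblib import Memory, Parallel, delayed

memory = Memory(location=".dodiscover/parallel_demo", verbose=0)
memory.clear(warn=False)


@memory.cache
def ci_test_pvalue(x_var, y_var, conditioning_set):
    # Dummy p-value; a real CI test would use the dataset here.
    return 0.5


# Each worker calls the cached function; results are memoized on disk,
# so a re-run with a different alpha skips the expensive tests entirely.
pairs = [("a", "b"), ("a", "c"), ("b", "c")]
pvalues = Parallel(n_jobs=2, backend="threading")(
    delayed(ci_test_pvalue)(x, y, ()) for x, y in pairs
)
```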

cc: @jaron-lee

@adam2392 adam2392 added the good first issue Good for newcomers label Apr 21, 2023