Add low memory implementation of core computation #74
base: master
Conversation
- use cython to do matrix multiplication and column sum (cython dependency)
- avoid storing intermediate matrix
- use scipy blas functions (add scipy dependency)
- expose cython version as `low_memory` option
- change memory options in `random_forest_error` interface
- add tests for cython implementation
- first cython test failing but appears correct (from pytest logs)
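For orientation, here is a sketch of how the new option would look from the user side, assuming `low_memory` ends up as a keyword argument of `random_forest_error` as the list above describes (the exact signature in the PR may differ; the data and forest below are made up):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
import forestci as fci

rng = np.random.default_rng(0)
X_train = rng.random((1000, 5))
y_train = rng.random(1000)
X_test = rng.random((200, 5))

forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_train, y_train)

# Hypothetical keyword: `low_memory=True` would route the core computation
# through the Cython column-sum kernel instead of the full matrix product.
ci = fci.random_forest_error(forest, X_train, X_test, low_memory=True)
```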
8-core machine benchmark:
It seems that the cython implementation is perhaps not good enough (at least, performance seems quite variable depending on problem size, memory, numpy acceleration, etc.). I would be interested to hear your thoughts. Can we speed up cython further? Can we avoid the large memory problem without chunking?
This looks great to me, and I think that your benchmarks make a good argument for including it. I believe that the tests are currently failing because of precision issues. Could you please change that comparison to `assert_almost_equal`?
forestci/tests/test_forestci.py (outdated)
```python
b = np.arange(1, 13, dtype=np.float64).reshape(4, 3)
c = fci._cycore_computation(a, b)
actual = fci._core_computation(np.zeros((2, 10)), np.zeros((4, 10)), a, b, 3)
npt.assert_equal(actual, c)
```
I think that you can use `assert_almost_equal`.
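A small, self-contained illustration of why this matters (not the PR's test): the Cython path and the NumPy path reduce the same quantity in different floating-point orders, so a bit-exact comparison can fail even when the results agree to many decimals. The array sizes and tolerance below are made up for the example.

```python
import numpy as np
import numpy.testing as npt

rng = np.random.default_rng(0)
inbag = rng.standard_normal((50, 20))          # (n_train, n_trees), illustrative sizes
pred_centered = rng.standard_normal((30, 20))  # (n_test, n_trees)

# The same quantity computed two ways: full product then column sum, versus
# per-column dot products (mimicking a low-memory kernel).
full = inbag.dot(pred_centered.T).sum(axis=0)
by_column = np.array([inbag.dot(pred_centered[j]).sum()
                      for j in range(pred_centered.shape[0])])

# Bit-exact equality may fail because of summation order:
#   npt.assert_equal(full, by_column)
# An almost-equal comparison is robust to that:
npt.assert_almost_equal(full, by_column, decimal=10)
```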
@owlas with your table and the benchmarks - can you set the current option
@arokem Are you happy with the two (quite heavy) extra dependencies, cython and scipy?
Co-Authored-By: Ariel Rokem <[email protected]>
Thanks for updating @owlas! The scipy dependency is not an issue. But could we make the cython dependency optional? That is, if the user has cython and can build the extension, they get the benefit of the speedup; otherwise, it falls back to the non-cythonized version. For an example from another project of how to set that up, see here: https://github.com/nipy/nitime/blob/master/setup.py#L51-L62 and here: https://github.com/nipy/nitime/blob/master/nitime/utils.py#L883-L934. We can also retire our Python 2.7 bot, which is failing because it's time to stop using Python 2. I can do that.
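For concreteness, here is a minimal sketch of the run-time half of that optional-Cython pattern, in the spirit of the nitime example linked above. The module and function names are illustrative, not necessarily what this PR uses, and the numpy branch is only a placeholder for the existing implementation.

```python
import numpy as np

try:
    # Compiled extension, built only if Cython and a C compiler were available
    # at install time (see the nitime setup.py linked above for the build side).
    from forestci._cyforestci import _cycore_computation  # hypothetical module name
    HAS_CYTHON_EXT = True
except ImportError:
    HAS_CYTHON_EXT = False

def core_computation(inbag, pred_centered, low_memory=False):
    """Dispatch to the compiled low-memory kernel when requested and available."""
    if low_memory and HAS_CYTHON_EXT:
        return _cycore_computation(inbag, pred_centered)
    # Fallback: pure-numpy path (stand-in for the existing implementation,
    # which materializes the full product before reducing it).
    return np.dot(inbag, pred_centered.T).sum(axis=0)
```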
Could you also please rebase on master? The failing CI run should go away when you do that.
@owlas or anyone: interested in revamping this PR with the current environment (6 years later!)? Thanks!
Computing confidence intervals on large datasets is extremely memory intensive because of the following matrix multiplication: `inbag` (n_train_samples, n_trees) × `pred_centered.T` (n_trees, n_test_samples) = `result` (n_train_samples, n_test_samples).
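As a rough sense of scale (using the benchmark sizes quoted below): with 7,000,000 training samples and 10,000 test samples, that result matrix holds 7 × 10^10 float64 entries, i.e. about 7 × 10^10 × 8 B ≈ 560 GB, far more than fits in RAM on any ordinary machine.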
I've added the `low_memory` option, which avoids storing this result matrix by performing the column sum for each dot product in the matrix multiplication. This adds dependencies on scipy and cython, however.
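A minimal NumPy-only sketch of that idea follows. It is an illustration, not the PR's Cython code, which compiles an equivalent loop (optionally via scipy's BLAS wrappers) and may apply extra scaling before the reduction.

```python
import numpy as np

def column_sums_low_memory(inbag, pred_centered):
    """Column sums of inbag @ pred_centered.T without materializing the
    (n_train_samples, n_test_samples) result matrix."""
    n_test = pred_centered.shape[0]
    out = np.empty(n_test)
    for j in range(n_test):
        # One column of the product at a time: an (n_train_samples,) vector
        # that is reduced immediately, so peak memory stays O(n_train_samples).
        out[j] = inbag.dot(pred_centered[j]).sum()
    return out

# Equivalent to, but far smaller in memory than:
#   inbag.dot(pred_centered.T).sum(axis=0)
```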
The `low_memory` option is exposed through the `random_forest_error` interface.

Performance:

The cython implementation (`low_memory=True`) is always slower than the numpy implementation when `inbag.dot(pred_centered.T)` fits in memory (numpy is amazing). As the problem size increases, the numpy advantage increases further.

The cython implementation becomes useful when the matrix product is larger than memory. In that case, the performance hit (at least in the following example) is less than the time taken to allocate and reallocate chunked memory.
For 7,000,000 training samples, 10,000 test samples, and a forest of 100 estimators (on a 32-core machine):