Add low memory implementation of core computation #74
base: master
Conversation
- use cython to do matrix multiplication and column sum (cython dependency)
- avoid storing intermediate matrix
- use scipy blas functions (add scipy dependency)
- expose cython version as `low_memory` option
- change memory options in `random_forest_error` interface
- add tests for cython implementation
- first cython test failing but appears correct (from pytest logs)
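For orientation, here is a sketch of how the new option would look from the user side, assuming `low_memory` ends up as a keyword argument of `random_forest_error` as the list above describes (the exact signature in the PR may differ; the data and forest below are made up):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
import forestci as fci

rng = np.random.default_rng(0)
X_train = rng.random((1000, 5))
y_train = rng.random(1000)
X_test = rng.random((200, 5))

forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_train, y_train)

# Hypothetical keyword: `low_memory=True` would route the core computation
# through the Cython column-sum kernel instead of the full matrix product.
ci = fci.random_forest_error(forest, X_train, X_test, low_memory=True)
```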
8-core machine benchmark:
It seems that the cython implementation is perhaps not good enough (at least, performance seems quite variable depending on problem size, memory, numpy acceleration, etc.). I would be interested to hear your thoughts. Can we speed up cython further? Can we avoid the large memory problem without chunking?
This looks great to me, and I think that your benchmarks make a good argument for including it. I believe that the tests are currently failing because of precision issues. Could you please change that comparison to `assert_almost_equal`?
forestci/tests/test_forestci.py (outdated)
```python
b = np.arange(1, 13, dtype=np.float64).reshape(4, 3)
c = fci._cycore_computation(a, b)
actual = fci._core_computation(np.zeros((2, 10)), np.zeros((4, 10)), a, b, 3)
npt.assert_equal(actual, c)
```
I think that you can use `assert_almost_equal`.
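A small, self-contained illustration of why this matters (not the PR's test): the Cython path and the NumPy path reduce the same quantity in different floating-point orders, so a bit-exact comparison can fail even when the results agree to many decimals. The array sizes and tolerance below are made up for the example.

```python
import numpy as np
import numpy.testing as npt

rng = np.random.default_rng(0)
inbag = rng.standard_normal((50, 20))          # (n_train, n_trees), illustrative sizes
pred_centered = rng.standard_normal((30, 20))  # (n_test, n_trees)

# The same quantity computed two ways: full product then column sum, versus
# per-column dot products (mimicking a low-memory kernel).
full = inbag.dot(pred_centered.T).sum(axis=0)
by_column = np.array([inbag.dot(pred_centered[j]).sum()
                      for j in range(pred_centered.shape[0])])

# Bit-exact equality may fail because of summation order:
#   npt.assert_equal(full, by_column)
# An almost-equal comparison is robust to that:
npt.assert_almost_equal(full, by_column, decimal=10)
```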
@owlas with your table and the benchmarks - can you set the current option
@arokem Are you happy with the two (quite heavy) extra dependencies, cython and scipy?
Co-Authored-By: Ariel Rokem <[email protected]>
Thanks for updating @owlas! The scipy dependency is not an issue. But could we make the cython dependency optional? That is, if the user has cython and can build the extension, they get the benefit of the speedup; otherwise, it falls back to the non-cythonized version. For an example from another project of how to set that up, see here: https://github.com/nipy/nitime/blob/master/setup.py#L51-L62 and here: https://github.com/nipy/nitime/blob/master/nitime/utils.py#L883-L934. We can also retire our Python 2.7 bot, which is failing because it's time to stop using Python 2. I can do that.
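For concreteness, here is a minimal sketch of the run-time half of that optional-Cython pattern, in the spirit of the nitime example linked above. The module and function names are illustrative, not necessarily what this PR uses, and the numpy branch is only a placeholder for the existing implementation.

```python
import numpy as np

try:
    # Compiled extension, built only if Cython and a C compiler were available
    # at install time (see the nitime setup.py linked above for the build side).
    from forestci._cyforestci import _cycore_computation  # hypothetical module name
    HAS_CYTHON_EXT = True
except ImportError:
    HAS_CYTHON_EXT = False

def core_computation(inbag, pred_centered, low_memory=False):
    """Dispatch to the compiled low-memory kernel when requested and available."""
    if low_memory and HAS_CYTHON_EXT:
        return _cycore_computation(inbag, pred_centered)
    # Fallback: pure-numpy path (stand-in for the existing implementation,
    # which materializes the full product before reducing it).
    return np.dot(inbag, pred_centered.T).sum(axis=0)
```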
Could you also please rebase on master? The failing CI run should go away when you do that.
@owlas or anyone: interested in revamping this PR with the current environment (6 years later!)? Thanks!
Computing confidence intervals on large datasets is extremely memory intensive because of the following matrix multiplication: `inbag` (n_train_samples, n_trees) × `pred_centered.T` (n_trees, n_test_samples) = `result` (n_train_samples, n_test_samples).
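As a rough sense of scale (using the benchmark sizes quoted below): with 7,000,000 training samples and 10,000 test samples, that result matrix holds 7 × 10^10 float64 entries, i.e. about 7 × 10^10 × 8 B ≈ 560 GB, far more than fits in RAM on any ordinary machine.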
I've added the `low_memory` option, which avoids storing this result matrix by performing the column sum for each dot product in the matrix multiplication. This adds dependencies on scipy and cython, however.
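A minimal NumPy-only sketch of that idea follows. It is an illustration, not the PR's Cython code, which compiles an equivalent loop (optionally via scipy's BLAS wrappers) and may apply extra scaling before the reduction.

```python
import numpy as np

def column_sums_low_memory(inbag, pred_centered):
    """Column sums of inbag @ pred_centered.T without materializing the
    (n_train_samples, n_test_samples) result matrix."""
    n_test = pred_centered.shape[0]
    out = np.empty(n_test)
    for j in range(n_test):
        # One column of the product at a time: an (n_train_samples,) vector
        # that is reduced immediately, so peak memory stays O(n_train_samples).
        out[j] = inbag.dot(pred_centered[j]).sum()
    return out

# Equivalent to, but far smaller in memory than:
#   inbag.dot(pred_centered.T).sum(axis=0)
```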
The `low_memory` option is exposed through the `random_forest_error` interface.

Performance:

The cython implementation (`low_memory=True`) is always slower than the numpy implementation when `inbag.dot(pred_centered.T)` fits in memory (numpy is amazing). As the problem size increases, the numpy advantage increases further.

The cython implementation becomes useful when the matrix product is larger than memory. In that case, the performance hit (at least in the following example) is less than the time taken to allocate and reallocate chunked memory.
For 7,000,000 training samples, 10,000 test samples, and a forest of 100 estimators (on a 32-core machine):