This issue aims to discuss the scaling of PyKrige to large datasets (which could impact, for instance, the optimization approaches in issue #35).
Here are approximate (and possibly inaccurate) time-complexity estimates for the different processing steps of the kriging process in 2D, according to these benchmarks (adapted from PR #36), applied to a 5k-10k point dataset which only has 2 measurement points for each parameter:
- Calculation (training) of the kriging model: ~O(N_train²)
- Prediction from a trained model (no moving window):
  - all backends: ~O(N_test*N_train^1.5)
- Prediction from a trained model (with moving window):
  - loop and C backends: ~O(N_test*N_nn^(1~2))
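To show where the N_train² and N_train³-ish terms come from, here is a minimal NumPy sketch of ordinary kriging in its dual form (prediction mean only, no kriging variance). This is illustrative and not PyKrige's actual implementation; the function names and the linear variogram are assumptions for the example.

```python
import numpy as np

def pairwise_dist(a, b):
    # Euclidean distance matrix: O(len(a) * len(b)) time and memory.
    return np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)

def train(coords, values, variogram):
    n = len(coords)
    # Building the ordinary-kriging matrix (variogram of all pairwise
    # distances, bordered by the Lagrange-multiplier row/column) is O(N_train^2).
    a = np.ones((n + 1, n + 1))
    a[:n, :n] = variogram(pairwise_dist(coords, coords))
    a[n, n] = 0.0
    # Factorizing/solving the dense system is O(N_train^3).
    return np.linalg.solve(a, np.append(values, 0.0))  # dual weights

def predict(coords, test_coords, dual_weights, variogram):
    # Cross-distance matrix is O(N_test * N_train) time and memory.
    k = variogram(pairwise_dist(test_coords, coords))
    k = np.hstack([k, np.ones((len(test_coords), 1))])
    return k @ dual_weights

coords = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
values = np.array([1.0, 2.0, 3.0, 4.0])
w = train(coords, values, lambda d: d)  # linear variogram, for illustration
# Symmetric centre point -> the average of the corner values.
print(predict(coords, np.array([[0.5, 0.5]]), w, lambda d: d))
```

The dense solve in `train` is what dominates for large N_train, which matches the quadratic-or-worse training behaviour observed in the benchmarks.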
For information, the approximate time complexities of the linear algebra operations that may limit performance are:
- O(N^3) for linear system inversions
- O(N^3) for matrix multiplication
- O(N^3) for matrix inversions
(though the constant terms would be quite different).
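To make the cubic term concrete, here is a rough timing sketch of a dense solve like the one in kriging training. Absolute numbers are machine- and BLAS-dependent; only the growth rate is the point.

```python
import time
import numpy as np

def timed_solve(n, rng):
    # Dense, well-conditioned system; np.linalg.solve calls LAPACK, O(n^3).
    a = rng.random((n, n)) + n * np.eye(n)
    b = rng.random(n)
    t0 = time.perf_counter()
    x = np.linalg.solve(a, b)
    return time.perf_counter() - t0, a, b, x

rng = np.random.default_rng(0)
for n in (500, 1000, 2000):
    t, *_ = timed_solve(n, rng)
    # Each doubling of n should cost roughly 8x (modulo BLAS threading).
    print(f"n={n:5d}  {t:.3f} s")
```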
This may be of interest to @kvanlombeek and @basaks, as discussed in issue #29. The training part indeed doesn't scale so well with the dataset size, and also affects prediction time. The total run time for the attached benchmarks is 48 min wall time and 187 min CPU time (on a 4-core CPU), so most of the critical operations do take advantage of a multi-threaded BLAS for the linear algebra.
Any suggestions on how we could improve scaling (or general performance) are very welcome.
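As one concrete direction, the moving-window figures above already point at local kriging: solve a small N_nn × N_nn system per test point instead of one global N_train × N_train system. A minimal sketch (illustrative only, not PyKrige's implementation; `local_krige` and its parameters are made up for this example, and SciPy is assumed for the neighbour search):

```python
import numpy as np
from scipy.spatial import cKDTree

def local_krige(train_xy, train_z, test_xy, variogram, nn=16):
    tree = cKDTree(train_xy)            # O(N_train log N_train) build
    _, idx = tree.query(test_xy, k=nn)  # nearest-neighbour lookup per test point
    out = np.empty(len(test_xy))
    for j, (pt, nbrs) in enumerate(zip(test_xy, idx)):
        xy, z = train_xy[nbrs], train_z[nbrs]
        d = np.linalg.norm(xy[:, None] - xy[None, :], axis=-1)
        # Local ordinary-kriging system on the nn neighbours only.
        a = np.ones((nn + 1, nn + 1))
        a[:nn, :nn] = variogram(d)
        a[nn, nn] = 0.0
        b = np.append(variogram(np.linalg.norm(xy - pt, axis=1)), 1.0)
        lam = np.linalg.solve(a, b)     # O(nn^3) per test point
        out[j] = lam[:nn] @ z
    return out
```

With nn fixed, each prediction costs the neighbour lookup plus an O(nn^3) solve, so the total grows roughly linearly in N_test, consistent with the ~O(N_test*N_nn^(1~2)) figure above.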
Hi @MuellerSeb, I know this was a while ago, and I'm curious whether these estimates are still roughly correct, or whether there are any recent changes that help scale to large datasets?