MemoryError: Unable to allocate array with shape (114671, 114671) and data type float64 #166
I have also faced this same issue in the past. Is this a limitation of OK/PyKrige?
I guess your input arrays are simply too big. What could help in your case is to sample from this big amount of data and reduce it to about 10,000–20,000 data points:

```python
import numpy as np

sample_size = 10000
# Draw a random subset of the data indices (replace=False avoids duplicate points).
choice = np.random.choice(np.arange(x.size), sample_size, replace=False)
x_smpl = x[choice]
y_smpl = y[choice]
z_smpl = z[choice]
```

Now you can use these subsampled arrays for the kriging, for example as sketched below.
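A hedged sketch of what that could look like with PyKrige's `OrdinaryKriging` (the variogram model and the target grid here are placeholders, not values from this thread):

```python
import numpy as np
from pykrige.ok import OrdinaryKriging

# Fit the kriging model on the subsample only.
ok = OrdinaryKriging(x_smpl, y_smpl, z_smpl, variogram_model="spherical")

# Placeholder target grid; use your own grid_x / grid_y instead.
grid_x = np.linspace(x_smpl.min(), x_smpl.max(), 200)
grid_y = np.linspace(y_smpl.min(), y_smpl.max(), 200)

# Interpolated values and kriging variance on the grid.
z_interp, ss = ok.execute("grid", grid_x, grid_y)
```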
Thanks @MuellerSeb for getting back.
At least in my case it's not a RAM limitation. The system has about 396 GB of RAM. Moreover, I can read the entire file into RAM, and the file isn't big either. numpy can also load the data into memory, and I am able to run almost all other methods (e.g. KNN, RF, SVC) without problems. The issue only arose when running Ordinary Kriging.
When using float64 with an array of size 114671 × 114671 (the cdist matrix), you end up with at least about 100 GB of data, which is on the same order of magnitude as your available RAM. With numpy's overhead and a few additional arrays of this kind, RAM can become a problem.
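For reference, a quick back-of-the-envelope check of that figure:

```python
# Memory footprint of one dense float64 distance matrix over 114671 points.
n = 114671
bytes_needed = n * n * 8            # 8 bytes per float64 entry
print(bytes_needed / 1024**3)       # ~98 GiB for a single such matrix
```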
The cause of the memory error is, more specifically, this line: Line 75 in 48bc433

It fails while executing that operation. I am not aware of the specifics of this operation, or of why we calculate the difference between those arrays.
To chime in, having written that part of the code: it is a fairly simple implementation of the third equation in the section "Computational formulas" of the Wikipedia article on the great-circle distance. It was written as a straightforward vectorized version of the equation, which creates a number of temporary arrays corresponding to the terms of that rather large equation. If you are working this tightly at your RAM limit, those additional temporaries could be what pushes you over the edge. Apart from random subsampling, you could try to work in Euclidean space (see #149) if you don't explicitly need the great-circle distance at large distances. Specifically, this would mean computing Euclidean coordinates from your geographic coordinates and passing those to the kriging instead. Hope that helps!
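A minimal sketch of that suggestion, assuming the inputs are longitude/latitude arrays in degrees; the equirectangular projection helper, the placeholder data, and the variogram model are illustrative choices on my part:

```python
import numpy as np
from pykrige.ok import OrdinaryKriging

def lonlat_to_local_xy(lon, lat):
    """Rough equirectangular projection to local Euclidean metres.

    Good enough over a limited region; not the projection anyone in this
    thread necessarily used.
    """
    R = 6371000.0                     # mean Earth radius in metres
    lat0 = np.deg2rad(lat).mean()     # reference latitude of the data
    x = R * np.deg2rad(lon) * np.cos(lat0)
    y = R * np.deg2rad(lat)
    return x, y

# Placeholder data; substitute your own lon, lat, z arrays.
rng = np.random.default_rng(0)
lon = rng.uniform(77.0, 77.5, 1000)
lat = rng.uniform(12.9, 13.3, 1000)
z = rng.normal(size=1000)

x_eucl, y_eucl = lonlat_to_local_xy(lon, lat)

# coordinates_type='euclidean' (PyKrige's default) keeps distances Euclidean
# and avoids the great-circle code path discussed above.
ok = OrdinaryKriging(x_eucl, y_eucl, z, variogram_model="spherical",
                     coordinates_type="euclidean")
```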
@mjziebarth Thanks for getting back.
I have about 396 GB of RAM, much more than typical limits. Hence, it was a bit surprising to hit a memory limit here, which we rarely do even with much larger datasets.
TBH, I don't see anything wrong with your code; it's just a bit surprising that it hits the memory limits. Since we are trying to benchmark, I was hesitant to change the parameters, but worst case we can switch to Euclidean space for all the benchmarking tests. We are anyway using Google's pixel coordinate system, so using Euclidean distances might make more sense.
Hi, using Euclidean coordinates seemed to work for us. Thank you all for super quick responses!
That is really interesting! Thanks for sharing.
I get the following error: MemoryError: Unable to allocate array with shape (114671, 114671) and data type float64
Defining Ordinary Kriging roughly as sketched below, where:

```python
min_x = 8084396
min_y = 12073405
max_x = 8084864
max_y = 12073894
```
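The following is only an illustrative reconstruction of that setup; the grid spacing, the variogram model, and the `coordinates_type` argument are assumptions on my part:

```python
import numpy as np
from pykrige.ok import OrdinaryKriging

# Target grid built from the extents above; a spacing of 1 is assumed.
grid_x = np.arange(min_x, max_x + 1, 1.0)
grid_y = np.arange(min_y, max_y + 1, 1.0)

# x, y, z hold the ~114671 observations; fitting the variogram needs pairwise
# distances between all of them, which is where the (114671, 114671) float64
# array comes from. coordinates_type='geographic' routes the distance
# calculation through the great-circle code discussed above.
ok = OrdinaryKriging(x, y, z, variogram_model="spherical",
                     coordinates_type="geographic")

z_interp, ss = ok.execute("grid", grid_x, grid_y)
```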
I understand that the grid_x and grid_y arrays are too big. What can I do in this case to make this work?