Allocate managed memory if device memory runs out #709

Merged (3 commits) on Aug 16, 2024

Contributor

@ngc92 ngc92 commented Jul 24, 2024

Use cudaMallocManaged to allocate the optimizer state if we run out of device memory, so we can still train (slowly) even when the optimizer state does not fit on the device.
This is based on #694, which should be merged first.
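
The fallback described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from this PR: `malloc_with_managed_fallback` and `check_cuda` are hypothetical names, and the sketch relies only on the standard CUDA runtime calls `cudaMalloc`, `cudaMallocManaged`, and `cudaGetLastError`.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper (not from the PR): abort with a message on any CUDA error.
static void check_cuda(cudaError_t err, const char* what) {
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error in %s: %s\n", what, cudaGetErrorString(err));
        exit(EXIT_FAILURE);
    }
}

// Hypothetical helper (not from the PR): try a regular device allocation first,
// and fall back to managed (unified) memory only if the device is out of memory.
// Returns 1 if the managed fallback was used, 0 if the plain device allocation succeeded.
static int malloc_with_managed_fallback(void** out, size_t bytes) {
    cudaError_t err = cudaMalloc(out, bytes);
    if (err == cudaErrorMemoryAllocation) {
        // Clear the recorded out-of-memory error so later checks do not trip on it.
        cudaGetLastError();
        // Managed memory can be backed by host RAM and migrated on demand,
        // which is slower but lets the allocation succeed.
        check_cuda(cudaMallocManaged(out, bytes, cudaMemAttachGlobal), "cudaMallocManaged");
        return 1;
    }
    // Any other failure is treated as fatal.
    check_cuda(err, "cudaMalloc");
    return 0;
}
```

The return value lets the caller report when the slow path was taken. Memory from either path is released with `cudaFree`.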

@@ -393,13 +393,13 @@ void gpt2_allocate_state(GPT2 *model, int B, int T) {
     printf0("allocating %zu MiB for AdamW optimizer state v\n", (shard_num_parameters * sizeof(float)) >> 20);
     assert(model->m_memory == nullptr);
     assert(model->v_memory == nullptr);
-    cudaCheck(cudaMalloc((void**)&model->m_memory, shard_num_parameters * sizeof(float)));
-    cudaCheck(cudaMalloc((void**)&model->v_memory, shard_num_parameters * sizeof(float)));
+    cudaMallocConditionallyManaged((void**)&model->m_memory, shard_num_parameters * sizeof(float));
Owner

imo we should try to update the import statements to show which file each function (e.g. the new managed malloc) comes from (here cuda_utils)

@karpathy karpathy merged commit f72c1f2 into karpathy:master Aug 16, 2024
13 checks passed