(No) safe way to wrap taichi kernels in pytorch #8339
Comments
A utility like this one seems like the right way to restore any manipulation of the .grad attributes.
However, this still doesn't match the torch autograd system, because taichi kernels on ndarrays seem to actually allocate the .grad?! (See #8340.) Just setting all the
Torch gives a lot of warnings about use of the .grad attribute.
I encountered the same problem. An additional side effect is that the custom
I am encountering a similar issue and was wondering if there is any current guidance or recommendation for how to wrap a Taichi kernel that uses Taichi autodiff in a torch autograd.Function. If neither of the solutions proposed in #8101 integrates properly with the torch autodiff system, is the current best practice to copy data from tensors into fields before running the kernel and its grad?
That is safe - but also undesirable and non-performant for many reasons.
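For illustration, a minimal sketch of that "copy into fields" approach: it never touches torch .grad attributes, but pays for the extra copies. The toy kernel, field shapes, and names (square, FieldCopySquare, N) are placeholders, not code from this thread.

```python
import torch
import taichi as ti

ti.init(arch=ti.cpu)

N = 8
x_f = ti.field(ti.f32, shape=N, needs_grad=True)
y_f = ti.field(ti.f32, shape=N, needs_grad=True)

@ti.kernel
def square():
    for i in x_f:
        y_f[i] = x_f[i] * x_f[i]  # toy computation: y = x^2

class FieldCopySquare(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        x_f.from_torch(x.detach().contiguous())  # copy torch -> taichi field
        square()
        return y_f.to_torch()                    # copy taichi field -> torch

    @staticmethod
    def backward(ctx, dy):
        (x,) = ctx.saved_tensors
        x_f.from_torch(x.detach().contiguous())  # restore forward inputs for the adjoint
        x_f.grad.fill(0)                         # clear the accumulator
        y_f.grad.from_torch(dy.contiguous())     # seed the upstream gradient
        square.grad()                            # adjoint kernel accumulates into x_f.grad
        return x_f.grad.to_torch()
```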
Currently, one thing we use in our production applications is to pass torch tensors directly to the autograd kernels: we first back up the original .grad attributes, create new zero tensors and assign them to .grad, and after running the backward kernel the original .grad is restored and the newly created grad tensors are returned. We are working on a way to do this without ugly merry-go-rounds like that (the ability to pass a tuple of tensors, for example...).
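A minimal sketch of that backup/zero/restore dance around .grad, with a toy kernel and placeholder names (ti_forward, TaichiSquare) that are not the actual production code; the exact behavior may differ between Taichi versions.

```python
import torch
import taichi as ti

ti.init(arch=ti.cpu)

@ti.kernel
def ti_forward(x: ti.types.ndarray(), y: ti.types.ndarray()):
    for i in range(x.shape[0]):
        y[i] = x[i] * x[i]  # toy computation: y = x^2

class TaichiSquare(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        x = x.contiguous()
        y = torch.zeros_like(x)
        ti_forward(x, y)
        ctx.save_for_backward(x)
        return y

    @staticmethod
    def backward(ctx, dy):
        (x,) = ctx.saved_tensors
        saved_grad = x.grad              # 1. back up whatever .grad torch holds (usually None)
        x.grad = torch.zeros_like(x)     # 2. fresh zero grad for taichi to accumulate into
        y = torch.zeros_like(x)
        y.grad = dy.contiguous()         #    seed the upstream gradient
        ti_forward.grad(x, y)            # 3. run the taichi adjoint kernel
        dx, x.grad = x.grad, saved_grad  # 4. hand the result to torch, restore the original .grad
        return dx
```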
Thanks - this seems like the best solution for now.
Oh I see - to make it work it needs to actually copy the tensor, since the kernel grad function will mutate it. This version works with gradcheck.
An improvement would be to avoid touching the .grad attribute in the forward pass (currently it creates a grad attribute filled with torch.zeros). That would be more efficient: most of the time .grad is None, but since taichi has initialized it with zeros we end up creating and restoring a zero tensor unnecessarily. The relevant code is in kernel_impl.py around line 762 - the FIXME comment there seems quite relevant.
Commenting out those lines in kernel_impl and changing restore_grad to the code below also seems to work nicely - that way it doesn't create the zero tensors until they're actually needed. It would probably be better to do this in Taichi code, though, so it doesn't crash otherwise...
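The snippet originally posted here was not preserved in this extract; the following is a rough sketch of what such a restore_grad helper could look like, assuming it only needs to back up and later put back the .grad attributes, leaving any zero allocation to the caller that actually needs it.

```python
import contextlib
import torch

@contextlib.contextmanager
def restore_grad(*tensors):
    # Back up the current .grad of every tensor (usually None), let the caller and
    # the taichi grad kernel write into .grad, then collect those gradients and
    # restore the originals so torch's own bookkeeping is left untouched.
    saved = [t.grad for t in tensors]
    produced = []
    try:
        yield produced
    finally:
        for t, old in zip(tensors, saved):
            produced.append(t.grad)  # gradient written by the taichi kernel, or None
            t.grad = old
```

Inside backward() one would then do something like `with restore_grad(x, y) as grads:` followed by seeding y.grad and calling the kernel's adjoint, and read `grads[0]` after the block.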
Here is a small Torch module wrapper for Taichi kernels, for anyone interested - see the sketch below. Pros:
Cons:
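The wrapper code itself did not survive in this extract; as a stand-in, a minimal sketch of the idea, reusing the illustrative TaichiSquare Function from the earlier sketch.

```python
import torch

class TaichiModule(torch.nn.Module):
    """Exposes a wrapped Taichi autograd.Function as an ordinary torch module."""

    def __init__(self, autograd_fn):
        super().__init__()
        self.autograd_fn = autograd_fn  # e.g. TaichiSquare from the sketch above

    def forward(self, x):
        return self.autograd_fn.apply(x)

# usage:
# layer = TaichiModule(TaichiSquare)
# y = layer(torch.rand(8, requires_grad=True))
```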
See context in #8101. Essentially, I don't think there's a completely safe way to wrap a taichi kernel in pytorch at the moment - below I implement the recommended solution from #8101, but the problem is that the gradient does not propagate.
Using the other method mentioned in #8101, the gradient does propagate, but there are some slightly strange differences compared to "normal" functions using the taichi autograd, depending on whether retain_grad is enabled.
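The implementations referenced here were not preserved in this extract. For completeness, one way to check whether gradients propagate through a wrapped kernel is torch.autograd.gradcheck, as mentioned in one of the comments above; TaichiSquare is the illustrative Function sketched earlier, and double-precision inputs are needed for the finite-difference comparison (assuming the taichi build supports f64 on the chosen arch).

```python
import torch

x = torch.rand(8, dtype=torch.float64, requires_grad=True)
print(torch.autograd.gradcheck(TaichiSquare.apply, (x,), eps=1e-6, atol=1e-4))
```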