Hi Prof Karpathy,
I wanted to open a discussion to ask this question, but Discussions are not enabled on this repo. I was watching https://youtu.be/VMj-3S1tku0 and got an idea.
Context
This is in reference to the step of clearing accumulated gradients at micrograd/demo.ipynb, line 265 in c911406.
Problem
People tend to forget to clear the accumulated gradients before calling backward on the loss.
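To make the failure mode concrete, here is a minimal sketch using micrograd's Value from engine.py, showing that gradients accumulate across backward calls when they are not cleared:

from micrograd.engine import Value

x = Value(2.0)
y = x * x        # dy/dx = 2x = 4 at x = 2
y.backward()
print(x.grad)    # 4.0
y.backward()     # backward again, without zeroing the grads first
print(x.grad)    # 8.0 -- the stale gradient was accumulated into, not replaced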
Idea
Create a way to bind the loss function to the network once, and then automatically clear the accumulated gradients when performing the backward pass.
Advantage
We can perform the backward pass whenever, wherever, and as many times as we want, without worrying about accumulated gradients.
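For contrast, the training loop in demo.ipynb currently looks roughly like this (paraphrased; exact names and the learning-rate schedule may differ slightly), where the manual model.zero_grad() is exactly the step that is easy to forget:

for k in range(100):
    total_loss, acc = loss()   # forward pass
    model.zero_grad()          # manual step: easy to forget!
    total_loss.backward()
    learning_rate = 1.0 - 0.9 * k / 100
    for p in model.parameters():
        p.data -= learning_rate * p.grad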
Pseudocode
class Loss(Value):
    def __init__(self, bound_network):
        self.bound_network = bound_network

    def __call__(self, batch_size=None):
        # loss function definition
        self.data = data_loss + reg_loss

    def backward(self):
        # clear gradients of the bound network before backpropagating
        self.bound_network.zero_grad()
        super().backward()
total_loss = Loss(bound_network=model)
for k in range(100):
    # ...
    # model.zero_grad()  # not needed anymore: since total_loss is bound to the
    # network, it automatically performs model.zero_grad() before the backward pass
    total_loss.backward()
    # ...
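One wrinkle with subclassing Value is that micrograd rebuilds the loss from a fresh computation graph every iteration, so the bound object cannot be created once and backpropagated forever. Here is a hedged alternative sketch that keeps the "bind once, never forget zero_grad" property while working with micrograd's actual API; BoundLoss and loss_fn are hypothetical names I am introducing for illustration, not part of micrograd:

class BoundLoss:
    """Wraps a loss-producing callable and a network; zeroes grads on backward."""
    def __init__(self, network, loss_fn):
        self.network = network    # any micrograd nn.Module (e.g. an MLP)
        self.loss_fn = loss_fn    # callable returning a micrograd Value
        self.value = None

    def __call__(self, *args, **kwargs):
        # run the forward pass and remember the resulting loss Value
        self.value = self.loss_fn(*args, **kwargs)
        return self.value

    def backward(self):
        # clear stale grads on the bound network, then backpropagate
        self.network.zero_grad()
        self.value.backward()

total_loss = BoundLoss(network=model, loss_fn=loss)
for k in range(100):
    total_loss()            # forward pass, rebuilds the graph
    total_loss.backward()   # zero_grad happens automatically
    # ... parameter update ...

The design choice here is composition over inheritance: the wrapper owns a reference to the network and the freshly computed loss Value, rather than trying to be a Value itself.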
Questions
Is my understanding of the problem correct?
Would this change add value?
Is the above pseudocode logically correct?
If the answer to all the above are yes, I could work on a PR with your guidance.