It's nice to see your repository. I see that you trained a new tokenizer on your new corpus, but I wonder: does that change the token IDs of the original base model? If it does, it could invalidate the embedding weights of the original pre-trained model, since you continue pre-training from the pre-trained checkpoint, right?
Hi @dinhngoc267, very sorry for the late reply. I might have missed the notification for some reason.
To clarify, we did not continue training from an existing pre-trained model. IndoT5 was trained entirely from scratch, with a new vocabulary/tokenizer. We only started from our own pre-trained checkpoint when fine-tuning on downstream tasks such as QA and summarization, and we kept the same vocabulary for that step.
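To make the concern in the question concrete, here is a minimal, self-contained sketch (toy word-level vocabularies, not the real SentencePiece tokenizer used by T5 models) showing why a tokenizer retrained on a different corpus assigns different IDs to the same token, so rows of an old embedding matrix would no longer line up:

```python
# Toy illustration: two vocabularies built from different corpora
# assign different ids to the same token, so embedding rows from the
# old model would no longer match the new ids.

def build_vocab(corpus):
    """Assign ids to tokens in first-seen order, after special tokens."""
    vocab = {"<pad>": 0, "</s>": 1, "<unk>": 2}
    for sentence in corpus:
        for token in sentence.split():
            if token not in vocab:
                vocab[token] = len(vocab)
    return vocab

# Hypothetical corpora: the original one and a new (e.g. Indonesian) one.
old_vocab = build_vocab(["the cat sat", "the dog ran"])
new_vocab = build_vocab(["kucing itu duduk", "the dog ran"])

# "the" exists in both vocabularies but with different ids:
print(old_vocab["the"], new_vocab["the"])  # -> 3 6

# Looking up row new_vocab["the"] in the old embedding matrix would
# fetch the wrong vector. Training from scratch with the new vocabulary,
# as described above, sidesteps this mismatch entirely.
```

This is also why, when people do continue pre-training with a changed vocabulary, they typically re-initialize or remap the embedding matrix rather than reuse it as-is.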