System.AccessViolationException Error when using TorchSharp #1292

Open
rapid-18 opened this issue Apr 25, 2024 · 7 comments

@rapid-18

A System.AccessViolationException occurs when executing this simple line of code:
model = jit.load(modelPath, DeviceType.CUDA);
How can I fix that?
The TorchSharp version is TorchSharp-cuda-windows 0.100.7.
The error details:
System.AccessViolationException
HResult=0x80004003
Message=Attempted to read or write protected memory. This is often an indication that other memory is corrupt.
Source=
StackTrace:

@yueyinqiu
Contributor

yueyinqiu commented Apr 25, 2024

It works for me (the exception shown is not thrown by jit.load):

[image: screenshot]

Could you please give us more details, such as the specific model file?

By the way, could the problem be solved by updating the package? I'm using 0.102.4.

@rapid-18
Author

rapid-18 commented Apr 25, 2024

I can hardly provide any more details because it's just a simple PyTorch-trained model, and the problem also occurs in many other calls related to loading a model, such as:
nn.Module Model = torchvision.models.resnet50();
So I don't think the problem comes from the model.
But thank you anyway, and I will try a newer version.

@NiklasGustafsson
Contributor

Yes, please start by upgrading to the most recent version of TorchSharp -- it's very hard (let's call it impossible), given limited resources, for us to troubleshoot a version as old as 0.100.7, which is based on an earlier version of libtorch.

@rapid-18
Author

The problem still occurs when loading models, even after changing to version 0.102.4.

@NiklasGustafsson
Contributor

Okay, that's good to know, that makes it easier to troubleshoot.

Can you please show the Python code used to generate the "exported.method.dat" file?

@yueyinqiu
Contributor

yueyinqiu commented Apr 27, 2024

@NiklasGustafsson The exported.method.dat file was created by me, not by @rapid-18.

And it works for me. The exception in my screenshot occurred only because I didn't pass the required parameter. It's not the AccessViolationException we are talking about. Sorry for the misleading screenshot.

@travisjj

This type of exception appears to be caused by the user's system running out of memory.

Because the code being executed uses pointers, an out-of-memory condition can surface only vaguely, as a memory access violation. I have found that expanding the virtual memory can alleviate this (for users with SSDs). However, if your model is excessively large, consider trying a smaller version first to verify that it runs, and then porting it to a cloud GPU.

The memory issue can be verified by gradually increasing the available system memory and observing that the AccessViolationException occurs at different places in the code.

The underlying problem of using too much memory can have a wide variety of causes.
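
One rough way to test the memory hypothesis is to log process memory counters around the step that crashes. The following is a minimal sketch, not part of TorchSharp: `MemoryProbe`, `Snapshot`, `Take`, and `Measure` are hypothetical names introduced here, and only standard .NET APIs (`System.Diagnostics.Process`, `GC.GetTotalMemory`) are used. It assumes that native allocations show up in the process's private bytes; GPU memory is not covered by these counters.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper (not a TorchSharp API): logs process memory around a step.
public static class MemoryProbe
{
    private const long MiB = 1024 * 1024;

    // Process-level memory counters, in bytes.
    public struct Snapshot
    {
        public long WorkingSet;    // physical memory currently used by the process
        public long PrivateBytes;  // memory private to the process, native allocations included
        public long ManagedHeap;   // memory the .NET GC currently considers allocated
    }

    public static Snapshot Take()
    {
        using (Process process = Process.GetCurrentProcess())
        {
            return new Snapshot
            {
                WorkingSet = process.WorkingSet64,
                PrivateBytes = process.PrivateMemorySize64,
                ManagedHeap = GC.GetTotalMemory(false),
            };
        }
    }

    // Runs `step` and returns the change in private bytes across it.
    public static long Measure(string label, Action step)
    {
        Snapshot before = Take();
        // Log before running the step, so the line survives if the step crashes the process.
        Console.WriteLine($"{label} (before): private {before.PrivateBytes / MiB} MiB, " +
                          $"working set {before.WorkingSet / MiB} MiB");

        step();

        Snapshot after = Take();
        Console.WriteLine($"{label} (after): private {after.PrivateBytes / MiB} MiB, " +
                          $"working set {after.WorkingSet / MiB} MiB, " +
                          $"managed heap {after.ManagedHeap / MiB} MiB");
        return after.PrivateBytes - before.PrivateBytes;
    }
}
```

The loading call from the original report could be wrapped as `MemoryProbe.Measure("jit.load", () => model = jit.load(modelPath, DeviceType.CUDA));`. If the "before" figures are already close to the machine's physical memory plus page file, that would be consistent with the explanation above. If they are far below it, memory exhaustion is a less likely cause.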
