Memory corruption when interrupting a computation with BigInt #56545
Comments
Can you please try on nightly? With julia v1.11 I sometimes get julia crashing with
but with nightly I never have this issue. Your example is hard enough to reproduce, though, that I'm not sure whether I'm just lucky. |
I still have the issue with nightly with my big program; the message is
I indeed cannot get it anymore with the smaller |
I don't think this is especially surprising or that there's anything we can do about it. In general, interrupting a program arbitrarily can corrupt memory. |
Being able to interrupt computations that run too long is part of the usability of the REPL. I only ever got this problem with |
The issue here is related to #49541. As @oscardssmith points out, it is currently not safe to interrupt basically any kind of code, because of how aggressive and non-cooperative our interrupt mechanism is. It's pretty good at stopping all kinds of code from continuing to run and subsequently returning to the REPL (to an extent), but because of how good it is at this, no guarantees can be made about the consistency of the current process's data structures (which may be left in an inconsistent state if they were interrupted in the middle of some operation). In short, you can interrupt just once, but you're left with a potentially garbage Julia session.

The PR I linked would resolve this, not by magically making the current mechanism better, but by providing an alternative cooperative mechanism that allows running code to gracefully stop itself and clean up safely when an interrupt has been triggered. Yet even with that PR, much more work needs to be done: more graceful interrupt logic will need to be added to Julia/Base, maybe at allocation points, the top of loops, function entry, etc. There are many possibilities, with a variety of performance and interrupt latency trade-offs, and it's not at all clear that this will come anytime soon, even if that PR is merged.

In summary: it's a hard problem, with some proposed solutions, but it will always be a battle to make graceful interrupts of arbitrary code reliable. |
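To make the "cooperative" idea in the comment above concrete, here is a minimal sketch of a loop that polls a stop flag at the top of each iteration. It is not the mechanism proposed in #49541 and not anything that exists in Base: the names `STOP_REQUESTED`, `request_stop!`, `reset_stop!`, and `cooperative_sum` are hypothetical, and the flag is assumed to be set by some other task or thread rather than by ^C itself.

```julia
# Hypothetical cooperative cancellation sketch; none of these names exist in Base.
# Assumption: some other task or thread calls request_stop!() when the user
# asks for the computation to end.
const STOP_REQUESTED = Threads.Atomic{Bool}(false)

request_stop!() = (STOP_REQUESTED[] = true; nothing)
reset_stop!() = (STOP_REQUESTED[] = false; nothing)

# Sum k^2 for k = 1..n in BigInt arithmetic. The flag is polled only at the
# top of the loop, so each BigInt operation runs to completion before the
# function returns. Returns (partial_or_full_sum, finished::Bool).
function cooperative_sum(n::Integer)
    acc = big(0)
    for k in 1:n
        if STOP_REQUESTED[]
            return (acc, false)   # stopped early, acc is still a valid BigInt
        end
        acc += big(k)^2
    end
    return (acc, true)
end
```

With this shape, a stop request waits for at most one loop iteration, and the cost is one atomic load per iteration. That is the latency versus performance trade-off mentioned above, and code that never reaches such a check is not covered.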
Arguably, then, it should drop out of the REPL too (or print a warning). I often DO press CTRL-C when precompiling, and most often it works... or seems to. If it is actually safe, then please do not kill the REPL. I think in some cases it doesn't work, or could be handled better. Could it be caught there and handled gracefully? |
The post you quote from discusses exactly this. |
Up to now, I thought that CTRL-C was safe for Pkg, which is why I do it, and that if Pkg/precompile can't handle it, that's a bug there (which is what I was referring to with that sentence), just as a power failure shouldn't damage disk structures (of Julia). I take it you confirm that disk structures aren't safe from aborting Pkg, even if I do not use Pkg further in the same session. Back to the issue here, I confirm a bug with yet another error:
most probably [Then the REPL froze, maybe since I pressed CTRL-C again.] https://forums.raspberrypi.com/viewtopic.php?t=367805
I'm guessing there's really no way to recover from this after the fact, for this and all such C code. This would also fail in C (and Python and any language wrapping this?). For Julia code, i.e. code relying on the GC, it likely shouldn't be an issue, but all mutable structures that have been half processed are still in danger. Wouldn't all immutable memory structures (including those for the GC) survive this? |
If you do a big computation with BigInts like
Then test_hm(BigInt,100) takes about 2s, so you can interrupt it with ^C. If you then restart the same command, you often get a message like
You may need to do the restart/break a few times to get the error. I have a bigger computation with BigInt where the error is systematic after a Ctrl-C. This is on