-
I have an application that was working on previous library versions. I reduced the application considerably to try to find the cause and ended up with a simple scenario that causes the crash. The 2nd core is practically doing nothing except a single println!. So in short: initializing the Spi on the first core, which takes place after the println!, crashes the first core. This is the exception I get:
This is how I activate the 2nd core. As said, if I remove the println! and replace it with some other code (that I know isn't optimized out), the app works fine.
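(The original snippet isn't shown here, so for context, a minimal sketch of what second-core activation typically looks like with esp-hal's cpu_control module around 0.15/0.16. This is an assumption standing in for the poster's code, not the actual code; constructor arguments and signatures differ between hal versions.)

```rust
// Hedged sketch, NOT the original code from this post: starting the app
// core on an ESP32-S3 with esp-hal's cpu_control module. The constructor
// argument varies by version (e.g. `system.cpu_control` in some releases,
// `peripherals.CPU_CTRL` in others).
use esp_hal::cpu_control::{CpuControl, Stack};

static mut APP_CORE_STACK: Stack<8192> = Stack::new();

// Inside main(), after peripheral/clock init:
let mut cpu_control = CpuControl::new(peripherals.CPU_CTRL);
let _core1_guard = cpu_control
    .start_app_core(unsafe { &mut APP_CORE_STACK }, || {
        // The 2nd core does practically nothing except a single println!;
        // replacing the println! with other non-optimized-out code
        // reportedly makes the app work fine.
        println!("core 1 alive");
        loop {}
    })
    .unwrap();
```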
Any ideas what to look for?
-
Oof. Would you mind posting the complete code to reproduce this? I know there were some (other) problems regarding multicore in 0.15.0, but 0.16.0 should be fine.
-
I actually ran into a similar issue over the weekend: https://github.com/MabezDev/mkey/pull/3/files#diff-35da999cf9ecf65198ff75ecff86ae8b14c804394e94d87628491978d8ab13bbR187-R192. My "workaround" is to call a function inside the closure. My initial thought was that maybe set_stack_pointer wasn't being inlined, meaning it would corrupt the stack - but I didn't get a chance to test that out. It should be as simple as replacing set_stack_pointer with the asm at https://github.com/esp-rs/xtensa-lx/blob/d6b8224d8a3e426be3564481130d994fe73e2bf6/xtensa-lx/src/lib.rs#L49-L54.
If that works, then my worst fears have been realized (see the FIXME in xtensa-lx), and we should probably remove the function and do this strictly in asm.
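For reference, a hedged sketch of the idea (variable names are made up; the real code lives at the xtensa-lx link above): emit the stack-pointer write as inline asm directly at the point of use, so there is no call/return sequence that could execute on a half-switched stack if the compiler ever fails to inline the helper.

```rust
// Sketch only - `new_stack_top` is a hypothetical pointer to the top of the
// freshly allocated core-1 stack. Writing sp inline means no function call
// happens around the stack switch, which is the soundness hazard the FIXME
// in xtensa-lx warns about.
unsafe {
    core::arch::asm!(
        "mov sp, {0}",        // a1 (the Xtensa stack pointer) <- new top
        in(reg) new_stack_top,
        options(nostack)      // this asm itself pushes nothing onto the old stack
    );
}
```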
-
I'm still interested in a minimal repro - I tried a few things but can't get it to crash.
-
Ah, sorry - I somehow understood you already had a simple repro. That panic in the timer driver looks sus. At that line it divides by the result of
-
I later tried commenting out some code on the 2nd core thread, and I was able to get the 1st-core code to work. It seems as if when I go beyond some code size I get those crashes - could that be?
-
I wonder if using
-
Thanks, I used it. I have two app versions, with hal-0.14.0 and hal-0.15.0, which are exactly the same code except for cosmetic tweaks to get them to compile and dependency version changes (including the required esp-wifi change). I disabled all code that interacts with SPI, so the crash is only due to the wifi (running on the 2nd core). I chose 0.15.0 for my experiments to be as close as possible to 0.14.0, with the crash looking the same on both versions.
As for data, my app (at least my code) only reserves space for the 2nd core stack; all heap is on PSRAM, plus maybe a few statics. So with 0.15.0 I looked at the segments, both when crashing and when working with the minimal code change. It's a single, simple line being commented out - a line that never gets to run but does get compiled and is not optimized out. This is the line I comment/uncomment, which adds/removes the code. What I see that seems strange to me:
I have to say that I have a suspicion, but I don't know how to progress from here. Is there a way to place the 2nd core stack in PSRAM, with enough buffer around it so it doesn't conflict with DRAM?
-
@yanshay 0.14 seems pretty rock-solid for me too. I, however, am not using esp-wifi, so I think that rules it out. Could you bisect esp-hal from 0.14, where it works, to where it fails? I will do the same, of course, but more data points will be helpful.
-
So I think I'm close to something, but I'm going to have to leave this for now - hopefully my findings are useful. I can consistently get it to work correctly when I subtract some arbitrary value from the top of the stack:
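In code, the workaround looks roughly like this (a sketch with illustrative names, not the actual esp-hal patch):

```rust
// Hedged sketch of the workaround: instead of pointing the new stack
// pointer exactly at the end of the stack buffer, leave an arbitrary
// safety margin below it. Real code would also guarantee 16-byte
// alignment of the resulting pointer.
const STACK_SIZE: usize = 8192;
static mut APP_CORE_STACK: [u8; STACK_SIZE] = [0; STACK_SIZE];

unsafe fn core1_stack_top() -> *mut u8 {
    // Stacks grow downward on Xtensa, so "top" is the end of the buffer...
    let top = (core::ptr::addr_of_mut!(APP_CORE_STACK) as *mut u8).add(STACK_SIZE);
    // ...minus ~128 bytes, which empirically stops the task closure from
    // being clobbered. The required margin seems to scale with closure size.
    top.sub(128)
}
```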
Where 128 is somewhat variable, depending on how big the closure is (it might also be the stack usage of the closure, I can't tell yet). I don't think reverting #1081 is right, because it was obviously wrong, and @bugadani also agreed that it wasn't correct before. So we need to figure out the relationship between reducing the top of the stack and the task closure.
-
Fixed via #1286