Exception after upgrading libraries - Multicore + Spi/println! combination #1262

yanshay · 2024-03-09T14:46:45Z

yanshay
Mar 9, 2024

I have an application that was working on previous libraries with key ones being - esp-hal(0.14.0)/embassy (0.4.0)/embedded_sdmmc(0.6.0).
I upgraded to the latest libraries on all fronts and after all required changes the application panics mid way when running.
(On a side note, the crash is both on esp-hal 0.1.5.0 and esp-hal 0.16.0).
The board is bases on esp32s3.
The application is multicore.

I reduced the application considerably to try and find the cause and ended up with a simple scenario that cause the crash.

The 2nd core practically is doing nothing, except a single println! (which succeed).
The 1st core crash when reaching initialization of Spi, but it crashes ONLY if that println! on the 2nd core ran. earlier. If I remove the println! on the 2nd core (and instead put some other code but not println!) the application works fine and the Spi initialization works fine and functions.

So in short, initializing Spi on first core that takes place after println! on the first core crash.

This is the exception I get:

Exception occured 'LoadProhibited'
Context
PC=0x4202ea5f       PS=0x00060930
0x4202ea5f - _ZN7esp_hal3spi6master8Instance5setup17h5a71b6aa4bfa768eE
    at ??:??
0x00060930 - PS_WOE
    at ??:??
A0=0x8202ec5f       A1=0x3fcdc320       A2=0x00000001       A3=0x001e8480       A4=0x00000000
0x8202ec5f - _rtc_fast_bss_start
    at ??:??
0x3fcdc320 - __stack_chk_guard
    at ??:??
0x00000001 - XT_STK_PC
    at ??:??
0x001e8480 - PS_WOE
    at ??:??
0x00000000 - XT_STK_PC
    at ??:??
A5=0x3c0b86a4       A6=0x3c0b968c       A7=0x00060920       A8=0x00060920       A9=0x00000100
0x3c0b86a4 - _rodata_end
    at ??:??
0x3c0b968c - _rodata_end
    at ??:??
0x00060920 - PS_WOE
    at ??:??
0x00060920 - PS_WOE
    at ??:??
0x00000100 - XT_STK_FRMSZ
    at ??:??
A10=0x3fc89ea0      A11=0x10000000      A12=0x600c0020      A13=0x10000000      A14=0x0000001c
0x3fc89ea0 - _ZN7esp_hal21critical_section_impl9multicore14MULTICORE_LOCK17h9269ab54aaba30e5E
    at ??:??
0x10000000 - PS_WOE
    at ??:??
0x600c0020 - _rtc_slow_data_start
    at ??:??
0x10000000 - PS_WOE
    at ??:??
0x0000001c - XT_STK_A5
    at ??:??
A15=0x3c091602
0x3c091602 - anon.86cb23a75cb2e6aa2d8a0cfbcc71fb2e.62.llvm.2970327317977471169
    at ??:??
SAR=00000018
EXCCAUSE=0x0000001c EXCVADDR=0x00000004
0x0000001c - XT_STK_A5
    at ??:??
0x00000004 - XT_STK_PS
    at ??:??
LBEG=0x403cea32     LEND=0x403cea36     LCOUNT=0x00000000
0x403cea32 - __default_double_exception
    at ??:??
0x403cea36 - __default_double_exception
    at ??:??
0x00000000 - XT_STK_PC
    at ??:??
THREADPTR=0x00000000
0x00000000 - XT_STK_PC
    at ??:??
SCOMPARE1=0x00000100
0x00000100 - XT_STK_FRMSZ
    at ??:??
BR=0x00000001
0x00000001 - XT_STK_PC
    at ??:??
ACCLO=0x00000000    ACCHI=0x00000000
0x00000000 - XT_STK_PC
    at ??:??
0x00000000 - XT_STK_PC
    at ??:??
M0=0x00000000       M1=0x00000000       M2=0x00000000       M3=0x00000000
0x00000000 - XT_STK_PC
    at ??:??
0x00000000 - XT_STK_PC
    at ??:??
0x00000000 - XT_STK_PC
    at ??:??
0x00000000 - XT_STK_PC
    at ??:??
F64R_LO=0x3fcdc310  F64R_HI=0x3fcdc38c  F64S=0x4206cb08
0x3fcdc310 - __stack_chk_guard
    at ??:??
0x3fcdc38c - __stack_chk_guard
    at ??:??
0x4206cb08 - _ZN4core3fmt8builders11DebugStruct5field17h32a3cc26a9a33f62E
    at ??:??
FCR=0x00000000      FSR=0x00000080
0x00000000 - XT_STK_PC
    at ??:??
0x00000080 - XT_STK_M3
    at ??:??
F0=0x00000000       F1=0x3f000000       F2=0x46fffe00       F3=0xc7000000       F4=0x47000000
0x00000000 - XT_STK_PC
    at ??:??
0x3f000000 - _rodata_end
    at ??:??
0x46fffe00 - _ZN17compiler_builtins3mem6memcmp17h793c032c681f05f2E
    at ??:??
0xc7000000 - _rtc_fast_bss_start
    at ??:??
0x47000000 - _ZN17compiler_builtins3mem6memcmp17h793c032c681f05f2E
    at ??:??
F5=0xc7000100       F6=0x42e80000       F7=0x00000000       F8=0x40800000       F9=0x40a00000
0xc7000100 - _rtc_fast_bss_start
    at ??:??
0x42e80000 - _ZN17compiler_builtins3mem6memcmp17h793c032c681f05f2E
    at ??:??
0x00000000 - XT_STK_PC
    at ??:??
0x40800000 - __default_double_exception
    at ??:??
0x40a00000 - __default_double_exception
    at ??:??
F10=0x40a00000      F11=0x43868000      F12=0x3f800000      F13=0x3f800000      F14=0x429c0000
0x40a00000 - __default_double_exception
    at ??:??
0x43868000 - _ZN17compiler_builtins3mem6memcmp17h793c032c681f05f2E
    at ??:??
0x3f800000 - _rodata_end
    at ??:??
0x3f800000 - _rodata_end
    at ??:??
0x429c0000 - _ZN17compiler_builtins3mem6memcmp17h793c032c681f05f2E
    at ??:??
F15=0x40400000
0x40400000 - __default_double_exception
    at ??:??

0x42026974
0x42026974 - _ZN14smarthomepanel10view_model19ViewModel$LT$SD$GT$5setup17hb24dfcff8df5be16E
    at ??:??
0x4203a206
0x4203a206 - _ZN16embassy_executor3raw20TaskStorage$LT$F$GT$4poll17hf375c35bad23112cE.llvm.2118310621361504250
    at ??:??
0x4206a5db
0x4206a5db - _ZN16embassy_executor3raw8Executor4poll17h45e3bd425bcd9eafE
    at ??:??
0x42032b2c
0x42032b2c - _ZN7esp_hal7embassy8executor6thread8Executor3run17h2e170634429e0c03E
    at ??:??
0x4203210a
0x4203210a - main
    at ??:??
0x4206aa8b
0x4206aa8b - Reset
    at ??:??
0x4037949f
0x4037949f - ESP32Reset
    at ??:??
0x40000000
0x40000000 - ets_rom_layout_p
    at ??:??
0x403cdd7c
0x403cdd7c - __default_double_exception
    at ??:??
0x403c9974
0x403c9974 - __default_double_exception
    at ??:??

This is how I activate the 2nd core. As said, if I remove the println! and replace it with some other code (that I know isn't optimized out) the app functions fine.

    let cpu1_fnctn = || {
        println!("Creating secondary thread executor");
    };

    println!("Activating secondary core");
    let _guard = cpu_control.start_app_core(unsafe { &mut SECONDARY_STACK }, cpu1_fnctn).unwrap();

Any ideas what to look for?

Answered by MabezDev

Mar 14, 2024

Fixed via #1286

View full answer

bjoernQ · 2024-03-11T08:24:23Z

bjoernQ
Mar 11, 2024
Maintainer

Oof. Would you mind posting the complete code to reproduce this? I know there were some (other) problems regarding multicore in 0.15.0 but 0.16.0 should be fine.

esp-println might be a red herring here (i.e. it's triggering some problem but it's not the root cause)

0 replies

MabezDev · 2024-03-11T11:53:40Z

MabezDev
Mar 11, 2024
Maintainer

I actually ran into a similar issue over the weekend: https://github.com/MabezDev/mkey/pull/3/files#diff-35da999cf9ecf65198ff75ecff86ae8b14c804394e94d87628491978d8ab13bbR187-R192. My "workaround" is to call a function inside the closure. My initial thought was maybe

esp-hal/esp-hal/src/soc/esp32s3/cpu_control.rs

Line 212 in 85ee380

set_stack_pointer(unsafe { unwrap!(APP_CORE_STACK_TOP) });

wasn't being inlined, meaning it would corrupt the stack - but I didn't get a chance to test that out. Should be a simple as replacing set_stack_pointer with the asm to https://github.com/esp-rs/xtensa-lx/blob/d6b8224d8a3e426be3564481130d994fe73e2bf6/xtensa-lx/src/lib.rs#L49-L54.

If that works then my worst fears have been realized (see the FIXME in xtensa-lx) and we should probably remove the function and do this strictly in asm.

0 replies

bjoernQ · 2024-03-11T13:07:36Z

bjoernQ
Mar 11, 2024
Maintainer

I'm still interested in a minimal repro - I tried a few things but can't get it to crash

1 reply

yanshay Mar 11, 2024
Author

I'm still interested in a minimal repro - I tried a few things but can't get it to crash

It's a pretty convoluted application and board + sd-card data specific and can't be shareed it as is.
I tried building similar scenario from scratch but it didn't get reproduced the case.
I'll try next to strip down my app as much as possible to get this reproduced in a way that would be useful.

My "workaround" is to call a function inside the closure. My initial thought was maybe

I tried this and indeed it eliminated the immediate problem and the application got to progress further. It was a very stripped down version of my full application, so I started uncommenting lines that were earlier removed and at a certain point it crashed again, but differently this time.

The 2nd core had a panic (I'm doing wifi on the 2nd core)
1st core continued to run some
1st core crashed in the same spot as above on Spi initialization

I didn't see how that panic was related to the lines I uncommented.

So I tried to pinpoint the exact line that caused the panic which was inside a certain function. Consistently if that line was added it panicked and if it was commented the app worked fine (just w/o some missing functionality that was commented). These commented lines were assigned to the 1st core.

However, the strange thing is that when the 2nd core crashed it was before that function was even invoked. It's as if that extra line of code caused panic elsewhere much earlier in the application execution and seems like on the 2nd core.
And when the crash on the 1st core arrived it was at the beginning of that function, never reaching the line that supposedly was the trigger (the one that uncommenting/commenting caused/eliminated the crashes).

So just something about the code "signature" caused the application to panic 2nd core and then crash 1st core.
I don't know if all this information helps in any way to understand what's going on.

Below are the two exceptions I got (single execution).

Then I tried replacing the set_stack_pointer with the asm! code as suggested above, it didn't have any effect on the new issue. Maybe it would have fixed the original issue (before calling a function from the closure), I didn't check that.

!! A panic occured in '/Users/user/.cargo/registry/src/index.crates.io-6f17d22bba15001f/esp-hal-0.16.0/src/timer.rs', at line 649, column 23:
attempt to divide by zero

Backtrace:

0x420be48f
0x420be48f - _ZN4core9panicking5panic17h211c1bf92844f8f9E
    at ??:??
0x42065a86
0x42065a86 - _ZN81_$LT$esp_hal..timer..Timer$LT$T$GT$$u20$as$u20$embedded_hal..timer..CountDown$GT$5start17hffc4ff7e9f13cce4E
    at ??:??
0x42066615
0x42066615 - _ZN8esp_wifi5timer13arch_specific11setup_timer17hce6de7cf0a554558E
    at ??:??
0x4206698c
0x4206698c - _ZN8esp_wifi10initialize17h62dad74c9043f2b5E
    at ??:??
0x4201865d
0x4201865d - _ZN16embassy_executor3raw20TaskStorage$LT$F$GT$4poll17hf785bf91328b793fE.llvm.2601768499350893140
    at ??:??
0x420b9638
0x420b9638 - _ZN16embassy_executor3raw8Executor4poll17h45e3bd425bcd9eafE
    at ??:??
0x4205a106
0x4205a106 - _ZN7esp_hal7embassy8executor6thread8Executor3run17h29ab44410aab6f83E
    at ??:??
0x4204b723
0x4204b723 - _ZN7esp_hal3soc14implementation11cpu_control10CpuControl16start_core1_init17h6e0641f11249e3d6E.llvm.8363092977244829058
    at ??:??
0x3ffffffd
0x3ffffffd - ets_rom_layout_p
    at ??:??
0x40034c45
0x40034c45 - rom_rx_gain_force
    at ??:??



///
/// Application output from 1st core removed
///

--- About to initialize spi



Exception occured 'LoadProhibited'
Context
PC=0x42035ae5       PS=0x00060930
0x42035ae5 - _ZN7esp_hal3spi6master43Spi$LT$T$C$esp_hal..spi..FullDuplexMode$GT$12new_internal17hbbbbe9d07dfff20dE
    at ??:??
0x00060930 - PS_WOE
    at ??:??
A0=0x8202d8d7       A1=0x3fcdc0e0       A2=0x001e8480       A3=0x00000000       A4=0x00000001
0x8202d8d7 - _rtc_fast_data_end
    at ??:??
0x3fcdc0e0 - __stack_chk_guard
    at ??:??
0x001e8480 - PS_WOE
    at ??:??
0x00000000 - XT_STK_PC
    at ??:??
0x00000001 - XT_STK_PC
    at ??:??
A5=0x3c1486a4       A6=0x3c14968c       A7=0x00000001       A8=0x82035ae5       A9=0x3fcdc0c0
0x3c1486a4 - _sidata
    at ??:??
0x3c14968c - _sidata
    at ??:??
0x00000001 - XT_STK_PC
    at ??:??
0x82035ae5 - _rtc_fast_data_end
    at ??:??
0x3fcdc0c0 - __stack_chk_guard
    at ??:??
A10=0x00000001      A11=0x3c11097e      A12=0x00000000      A13=0x40000648      A14=0x0000001c
0x00000001 - XT_STK_PC
    at ??:??
0x3c11097e - anon.12b89a1e3791cb39fa962846f69b7103.79.llvm.17974336764396607129
    at ??:??
0x00000000 - XT_STK_PC
    at ??:??
0x40000648 - uart_tx_one_char
    at ??:??
0x0000001c - XT_STK_A5
    at ??:??
A15=0x3c11097e
0x3c11097e - anon.12b89a1e3791cb39fa962846f69b7103.79.llvm.17974336764396607129
    at ??:??
SAR=00000018
EXCCAUSE=0x0000001c EXCVADDR=0x00000005
0x0000001c - XT_STK_A5
    at ??:??
0x00000005 - XT_STK_PS
    at ??:??
LBEG=0x403cea32     LEND=0x403cea36     LCOUNT=0x00000000
0x403cea32 - hal_mac_txq_disable
    at ??:??
0x403cea36 - hal_mac_txq_disable
    at ??:??
0x00000000 - XT_STK_PC
    at ??:??
THREADPTR=0x00000000
0x00000000 - XT_STK_PC
    at ??:??
SCOMPARE1=0x00000100
0x00000100 - XT_STK_FRMSZ
    at ??:??
BR=0x00000001
0x00000001 - XT_STK_PC
    at ??:??
ACCLO=0x00000000    ACCHI=0x00000000
0x00000000 - XT_STK_PC
    at ??:??
0x00000000 - XT_STK_PC
    at ??:??
M0=0x00000000       M1=0x00000000       M2=0x00000000       M3=0x00000000
0x00000000 - XT_STK_PC
    at ??:??
0x00000000 - XT_STK_PC
    at ??:??
0x00000000 - XT_STK_PC
    at ??:??
0x00000000 - XT_STK_PC
    at ??:??
F64R_LO=0x0000001c  F64R_HI=0x3c149394  F64S=0x3c111990
0x0000001c - XT_STK_A5
    at ??:??
0x3c149394 - _sidata
    at ??:??
0x3c111990 - anon.0dfc9eef4990df036bb171cc2192f717.21.llvm.5101450665113350858
    at ??:??
FCR=0x00000000      FSR=0x00000080
0x00000000 - XT_STK_PC
    at ??:??
0x00000080 - XT_STK_M3
    at ??:??
F0=0x00000000       F1=0x3f000000       F2=0x46fffe00       F3=0xc7000000       F4=0x47000000
0x00000000 - XT_STK_PC
    at ??:??
0x3f000000 - _sidata
    at ??:??
0x46fffe00 - pmksa_cache_clear_current
    at ??:??
0xc7000000 - _rtc_fast_data_end
    at ??:??
0x47000000 - pmksa_cache_clear_current
    at ??:??
F5=0xc7000100       F6=0x42e80000       F7=0x00000000       F8=0x40800000       F9=0x40a00000
0xc7000100 - _rtc_fast_data_end
    at ??:??
0x42e80000 - pmksa_cache_clear_current
    at ??:??
0x00000000 - XT_STK_PC
    at ??:??
0x40800000 - hal_mac_txq_disable
    at ??:??
0x40a00000 - hal_mac_txq_disable
    at ??:??
F10=0x40a00000      F11=0x43868000      F12=0x3f800000      F13=0x3f800000      F14=0x429c0000
0x40a00000 - hal_mac_txq_disable
    at ??:??
0x43868000 - pmksa_cache_clear_current
    at ??:??
0x3f800000 - _sidata
    at ??:??
0x3f800000 - _sidata
    at ??:??
0x429c0000 - pmksa_cache_clear_current
    at ??:??
F15=0x40400000
0x40400000 - hal_mac_txq_disable
    at ??:??

0x4203b738
0x4203b738 - _ZN14smarthomepanel10view_model19ViewModel$LT$SD$GT$5setup17hb24dfcff8df5be16E
    at ??:??
0x4201850d
0x4201850d - _ZN16embassy_executor3raw20TaskStorage$LT$F$GT$4poll17hf375c35bad23112cE.llvm.2601768499350893140
    at ??:??
0x420b963b
0x420b963b - _ZN16embassy_executor3raw8Executor4poll17h45e3bd425bcd9eafE
    at ??:??
0x4205a15c
0x4205a15c - _ZN7esp_hal7embassy8executor6thread8Executor3run17h2e170634429e0c03E
    at ??:??
0x4204c9a2
0x4204c9a2 - main
    at ??:??
0x420b9aeb
0x420b9aeb - Reset
    at ??:??
0x403795ab
0x403795ab - ESP32Reset
    at ??:??
0x40000000
0x40000000 - ets_rom_layout_p
    at ??:??
0x403cdd7c
0x403cdd7c - hal_mac_txq_disable
    at ??:??
0x403c9974
0x403c9974 - hal_mac_txq_disable
    at ??:??

bjoernQ · 2024-03-11T16:00:26Z

bjoernQ
Mar 11, 2024
Maintainer

It's a pretty convoluted application and board + sd-card data specific and can't be shareed it as is.
I tried building similar scenario from scratch but it didn't get reproduced the case.
I'll try next to strip down my app as much as possible to get this reproduced in a way that would be useful.

Ah sorry - I somehow understood you already had a simple repro.

That panic in the timer-driver looks sus. At that line it divides by the result of divider() which should never be 0. (0 => 65536,)
It feels like like a memory corruption (stack, .bss, .rwdata, ...) but it's really hard to guess here

0 replies

yanshay · 2024-03-11T16:02:24Z

yanshay
Mar 11, 2024
Author

I later tried to comment some code on the 2nd core thread, and I was able to get the 1st core code to work.
And again, I found a line of code, that when added cause the exact same panic and then crash way earlier than it get to be executed. I even wrapped it with some 'if' that is never true just to be certain that it doesn't run and maybe logs just don't show, and panic/crash still occured.

It seems as if I when I go beyond some size of code I get those crashes, could that be?

0 replies

bjoernQ · 2024-03-11T16:10:27Z

bjoernQ
Mar 11, 2024
Maintainer

I wonder if using "-Z", "stack-protector=all", like described here #1135 will trigger for you. Maybe not - just curious

4 replies

yanshay Mar 11, 2024
Author

I don't see any difference with that flag except the size of the binary as shows when flashing is larger.

The current situation is as follows:
I have two lines of code at different places in my code. Both are behind if statement that evaluates to false (but not optimized out) so they surely don't run.
If both lines are not commented - the application crash before either of them is even close to being executed. Same exact crash scenario.
If either one of the lines is commented out the application is running fine.

How can I even start to debug something like that? Feels like a dead end :(

What's even more strange is that the binary size (as shown when flashing) is larger when one of those lines is commented. So removing code increase binary size?

bjoernQ Mar 11, 2024
Maintainer

Thanks for checking. A slight in increase in code size when using stack-protector is expected since every function gets a few bytes of instrumentation code added.

I'd really love to be more helpful in tracking this down.

Just out of desperation you could experiment with optimization levels and LTO but I won't expect too much from that

yanshay Mar 11, 2024
Author

Thanks, I'll try that next.

As for binary size, in case it could be the issue, this is what I see when flashing.
I tried reducing it with profile.release settings, but it had no effect. Even when I used strip="symbols" it eliminate the symbols (so crash had only numbers). Symbols did disappear but binary size actually increased a bit.
I also removed 16KB of include_bytes! that I used for debugging this and binary size shrinked by less than 200 bytes. Something is very strange there. Maybe I misunderstand something basic there?

Is this considered a large size? Maybe there are some memory maps that get filled with data?

App/part. size:    1,240,272/16,384,000 bytes, 7.57%
[00:00:00] [========================================]      14/14      0x0
[00:00:00] [========================================]       1/1       0x8000
[00:00:11] [========================================]     720/720     0x10000

bjoernQ Mar 12, 2024
Maintainer

Debug information will only be in the elf file - not in the finally flashed binary so it's better to include debug = true in [profile.release]

1.2MB is not very small but it really depends on what it includes.

There is a gap of max 64k between code and data for technical reasons. Maybe by removing the include_bytes! it decreased the size of data but since the gap still needs to fill the space on flash to get to the next 64k block it almost changed nothing for you

Just looking at the size of the finally flashed image might be misleading sometimes - better to look at the individual segments. You can use readelf or similar utilities for that. You can also try https://github.com/bjoernQ/espsegs to get more insights from the elf file

yanshay · 2024-03-12T14:50:12Z

yanshay
Mar 12, 2024
Author

Thanks,

I used espsegs, some things I don't know to explain but don't have a clear direction.

I have two app versions with hal-0.14.0 and hal-0.15.0 which are exactly the same code except cosmetic tweaks to get it compiled and dependency versions changes (including required esp-wifi change). I disabled all code that interacts with SPI so the crash is only due to the wifi (running on 2nd core). I chose 0.15.0 for my experiments to be as close as possible to 0.14.0 with the crash looking the same on both versions.

0.14.0 never crashes and completely stable (and always was for months I'm working on this app)
0.15.0 crashes (just like 0.16.0) unless I comment code out (code that doesn't get to run, and I have several options in different areas of the app).

As for data, my app (at least my code) only reserves space for 2nd core stack, all heap is on PSRAM, maybe a few statics (using make_static! but nothing large that I know of outside the stack.

So with 15.0 I looked at segments both when crashing and working with the minimalist code change. It's a single, simple line commenting, a line that never gets to run but does get compiled and not optimized out.

This is the line I comment/uncomment which adds/removes the code for
cmdbuffer_h[0..2].copy_from_slice(&(range.start as u16).to_ne_bytes());
Basically the code for copy_from_slice and to_ne_bytes is added/removed from the binary (only place they are used).

What I see that seems strange to me:

The size of the code and static data is larger when commenting out code. How can that be?
I also printed the 2nd stack address and it looks at a a correct segment but it moves quite a lot with this minor code change.

I have to say that I suspect esp-wifi, because it's what's crashing and I've seen in the past (with esp32) that it references areas where it shouldn't and saw others report that as well (esp-rs/esp-wifi-sys#412 - issue is still open).

But I don't know how to progress from here.

Is there a way to place the 2nd core stack in PSRAM and have there enough buffers around so not to conflict with DRAM?
I couldn't find a way to do it in a safe way, not sure it can work.

-- Crashing ---------------------------------------------------------------------------------------------------------------
2nd Stack Address: 0x3fc90bb0

.rodata_dummy 3c000020  983040 DROM     [██                                                                               ]
.rodata       3c0f0020  198544 DROM     [  ▏                                                                              ]
.rodata.wifi  3c1207b0   21204 DROM     [  ▏                                                                              ]

.rwdata_dummy 3fc88000   25296 DRAM     [████                                                                             ]
.data         3fc8e2d0    4256 DRAM     [    ▏                                                                            ]
.bss          3fc8f370  164104 DRAM     [     ███████████████████████████████                                             ]
.data.wifi    3fcb7478     360 DRAM     [                                    ▏                                            ]

.vectors      40378000    1024 IRAM     [▏                                                                                ]
.rwtext       40378400    6372 IRAM     [█                                                                                ]
.rwtext.wifi  40379ce4   17900 IRAM     [ ███                                                                             ]

.text         42000020  928960 IROM     [██                                                                               ]



-- Working (code commented out) ----------------------------------------------------------------------------------------------------------------
2nd Stack Address: 0x3fc92940

.rodata_dummy 3c000020  983040 DROM     [██                                                                               ]
.rodata       3c0f0020  198560 DROM     [  ▏. <--- larger                                                                              ]   
.rodata.wifi  3c1207c0   21204 DROM     [  ▏                                                                              ]

.rwdata_dummy 3fc88000   25296 DRAM     [████                                                                             ]
.data         3fc8e2d0    4256 DRAM     [    ▏                                                                            ]
.bss          3fc8f370  164096 DRAM     [     ███████████████████████████████  <-- Smaller                                            ]
.data.wifi    3fcb7470     360 DRAM     [                                    ▏                                            ]

.vectors      40378000    1024 IRAM     [▏                                                                                ]
.rwtext       40378400    6372 IRAM     [█                                                                                ]
.rwtext.wifi  40379ce4   17900 IRAM     [ ███                                                                             ]

.text         42000020  929108 IROM     [██   <--- larger                                                                            ]

0 replies

MabezDev · 2024-03-12T23:26:26Z

MabezDev
Mar 12, 2024
Maintainer

@yanshay It seems 0.14 seems pretty rock-solid for me too. I, however, am not using esp-wifi so I think that rules it out. Could you bisect esp-hal from 0.14 when 0.14 is working, to when it fails? I will do the same of course, but more data points will be helpful.

15 replies

yanshay Mar 13, 2024
Author

In my program, the top of the stack was at 0x3fc8b1a0, START_CORE1_FUNCTION was at 0x3fc8d1a4 only one word apart...

0x3fc8  b 1a0
0x3fc8  d 1a4

Isn't it one word + 0x2000 ?

MabezDev Mar 13, 2024
Maintainer

Heh yes, I just came back to fix my mistake :D. This is confusing me, I'm struggling to see how this is failing. It seems setting the stack top to something either bigger or smaller than the actual stack memory available makes this work 🤔.

yanshay Mar 13, 2024
Author

I noticed that START_CORE1_FUNCTION and APP_CORE_STACK_TOP was being overwritten to zero sometimes (it's hard to know when APP_CORE_STACK_TOP is corrupted because all hell breaks loose).

If you know what address is being overwritten and with what value maybe it's possible to set a Watchpoint? I read it's possible with Esp32 even though I don't know exactly how.

bjoernQ Mar 13, 2024
Maintainer

another random thought: in esp-wifi we do a lot more to initialize the task stacks: https://github.com/esp-rs/esp-wifi/blob/9224845a67de0722846c384406dbe1cf80b41541/esp-wifi/src/preempt/preempt_xtensa.rs#L69-L94

another thing that just came to my mind: what if there is something writing to the stack before we set SP? what is SP when starting the second core?

MabezDev Mar 13, 2024
Maintainer

I moved those functions to another place via linker script and the issue was still present.

MabezDev · 2024-03-13T14:49:09Z

MabezDev
Mar 13, 2024
Maintainer

So I think I'm close to something, but I'm going to have to leave this for now - hopefully, my findings are useful.

I can consistently get it to work correctly when I subtract some arbitrary value from the top of the stack:

unsafe { self.bottom().add((SIZE / 4) - 128) }

Where 128 is somewhat variable depending on how big the closure is (It might also be stack usage of the closure too, I can't tell yet).

I don't think reverting #1081 is right, because it was obviously wrong and @bugadani also agreed that this wasn't correct before. So we need to figure the relationship between reducing the top of the stack and the task closure.

1 reply

MabezDev Mar 13, 2024
Maintainer

I guess its the stack usage of start_core_init?

Using -Zemit-stack-sizes
0x42000830 272 esp_hal::soc::implementation::cpu_control::CpuControl::start_core1_init::hcb8159948ddb7199

If I subtract the closest aligned 16 byte value, which is 272

unsafe { self.bottom().add((SIZE / 4) - ((17 * 16) / 4)) }

It works perfectly. But isn't this the whole point of the stack space to begin with...

MabezDev · 2024-03-14T23:47:49Z

MabezDev
Mar 14, 2024
Maintainer

Fixed via #1286

0 replies

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Exception after upgrading libraries - Multicore + Spi/println! combination #1262

{{title}}

{{editor}}'s edit

{{editor}}'s edit

Replies: 10 comments 21 replies

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{title}}

{{title}}

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

Select a reply

Exception after upgrading libraries - Multicore + Spi/println! combination #1262

yanshay Mar 9, 2024

Replies: 10 comments · 21 replies

bjoernQ Mar 11, 2024 Maintainer

MabezDev Mar 11, 2024 Maintainer

bjoernQ Mar 11, 2024 Maintainer

yanshay Mar 11, 2024 Author

bjoernQ Mar 11, 2024 Maintainer

yanshay Mar 11, 2024 Author

bjoernQ Mar 11, 2024 Maintainer

yanshay Mar 11, 2024 Author

bjoernQ Mar 11, 2024 Maintainer

yanshay Mar 11, 2024 Author

bjoernQ Mar 12, 2024 Maintainer

yanshay Mar 12, 2024 Author

MabezDev Mar 12, 2024 Maintainer

yanshay Mar 13, 2024 Author

MabezDev Mar 13, 2024 Maintainer

yanshay Mar 13, 2024 Author

bjoernQ Mar 13, 2024 Maintainer

MabezDev Mar 13, 2024 Maintainer

MabezDev Mar 13, 2024 Maintainer

MabezDev Mar 13, 2024 Maintainer

MabezDev Mar 14, 2024 Maintainer

yanshay
Mar 9, 2024

Replies: 10 comments 21 replies

bjoernQ
Mar 11, 2024
Maintainer

MabezDev
Mar 11, 2024
Maintainer

bjoernQ
Mar 11, 2024
Maintainer

yanshay Mar 11, 2024
Author

bjoernQ
Mar 11, 2024
Maintainer

yanshay
Mar 11, 2024
Author

bjoernQ
Mar 11, 2024
Maintainer

yanshay Mar 11, 2024
Author

bjoernQ Mar 11, 2024
Maintainer

yanshay Mar 11, 2024
Author

bjoernQ Mar 12, 2024
Maintainer

yanshay
Mar 12, 2024
Author

MabezDev
Mar 12, 2024
Maintainer

yanshay Mar 13, 2024
Author

MabezDev Mar 13, 2024
Maintainer

yanshay Mar 13, 2024
Author

bjoernQ Mar 13, 2024
Maintainer

MabezDev Mar 13, 2024
Maintainer

MabezDev
Mar 13, 2024
Maintainer

MabezDev Mar 13, 2024
Maintainer

MabezDev
Mar 14, 2024
Maintainer