You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Under certain conditions (two examples detailed below), the python loader fails to resolve the autoTLSKey, which will cause python frames to be missing from the flame graph. Both examples are on amd64 and it's unclear whether this also potentially affects arm64 (TBD). Update: Only amd64 is impacted.
On amd64 default builds of python (without --enable-optimizations, --with-lto, which are not enabled by default https://docs.python.org/3/using/configure.html#performance-options), the calls to PyThread_tss_is_created and PyThread_tss_get are not inlined, so the value of autoTLSKey is stored in a register (%rbp) before being passed to both function calls:
(gdb) disassemble PyGILState_GetThisThreadState
Dump of assembler code for function PyGILState_GetThisThreadState:
0x000000000029c5d0 <+0>: mov 0x260381(%rip),%rax # 0x4fc9580x000000000029c5d7 <+7>: push %rbp
0x000000000029c5d8 <+8>: lea 0x608(%rax),%rbp
0x000000000029c5df <+15>: mov %rbp,%rdi
0x000000000029c5e2 <+18>: call 0x10afb0 <PyThread_tss_is_created@plt>
0x000000000029c5e7 <+23>: test %eax,%eax
0x000000000029c5e9 <+25>: je 0x29c5f8 <PyGILState_GetThisThreadState+40>
0x000000000029c5eb <+27>: mov %rbp,%rdi
0x000000000029c5ee <+30>: pop %rbp
0x000000000029c5ef <+31>: jmp 0x10f080 <PyThread_tss_get@plt>
0x000000000029c5f4 <+36>: nopl 0x0(%rax)
0x000000000029c5f8 <+40>: xor %eax,%eax
0x000000000029c5fa <+42>: pop %rbp
0x000000000029c5fb <+43>: ret
End of assembler dump.
This is reproducible by downloading and building python from source, and running the profiler next to a python process.
2. Python bundled with google-cloud-sdk
With Google Cloud SDK 502.0.0, on amd64, which bundles python 3.11, PyGILState_GetThisThreadState disassembles as follows (gdb -q /lib/google-cloud-sdk/platform/bundledpythonunix/lib/libpython3.11.so):
(gdb) disassemble PyGILState_GetThisThreadState
Dump of assembler code for function PyGILState_GetThisThreadState:
0x0000000000377d20 <+0>: mov 0x1129d61(%rip),%rax # 0x14a1a880x0000000000377d27 <+7>: cmpq $0x0,0x248(%rax)
0x0000000000377d2f <+15>: je 0x377d42 <PyGILState_GetThisThreadState+34>
0x0000000000377d31 <+17>: mov $0x250,%edi
0x0000000000377d36 <+22>: add 0x1129d4b(%rip),%rdi # 0x14a1a880x0000000000377d3d <+29>: jmp 0x20eba0 <PyThread_tss_get@plt>
0x0000000000377d42 <+34>: xor %eax,%eax
0x0000000000377d44 <+36>: ret
End of assembler dump.
The decode disassemble will find the mov $0x250,%edi instruction and assume that 0x250 is the address at which autoTSSKey will be found. However, this is modified later on by an ulterior add on %rdi.
Eventually, 0x250 will fail the distance check in:
// Check that the found value is within reasonable distance from the given symbol.
ifvalue>addrBase&&value<addrBase+4096 {
returnvalue
}
Since it 0x250 is an offset to PyRuntime rather than the absolute value of autoTSSKey.
Potential fixes
A couple potential fixes we though about:
a. Disassemble a different function: It's possible to fix (1.) (but not 2.) by alternatively disassembling PyGILState_Release, either as a fallback (see patch in DataDog@630fd97) or directly. Since python 3.0, and still as of python 3.13, this method directly calls PyThread_tss_get.
b. Add more logic in disassembly code: Add logic in decode_amd64.c to take into account variables in temporary registers, and the case where %rdi is populated via mov then add. We could potentially fix both issues at the cost of additionally complexity.
c. Use constant header values: Hardcode the offset of autoTSSKey from PyRuntime, with a different value depending on the python version.
Let me know whether there's any preference / different ideas as to how we should fix this 🙇 !
The text was updated successfully, but these errors were encountered:
What happened?
Under certain conditions (two examples detailed below), the python loader fails to resolve the autoTLSKey, which will cause python frames to be missing from the flame graph. Both examples are on amd64
and it's unclear whether this also potentially affects arm64 (TBD). Update: Only amd64 is impacted.The two examples are detailed below:
1. Python 3.12+ compiled from source without LTO
Starting with Python 3.12 (commit),
PyGILState_GetThisThreadState
callsPyThread_tss_is_created
first before callingPyThread_tss_get
, see: https://github.com/python/cpython/blob/3.12/Python/pystate.c#L2171-L2174On amd64 default builds of python (without
--enable-optimizations
,--with-lto
, which are not enabled by default https://docs.python.org/3/using/configure.html#performance-options), the calls toPyThread_tss_is_created
andPyThread_tss_get
are not inlined, so the value of autoTLSKey is stored in a register (%rbp
) before being passed to both function calls:This causes the decode disassembler to not find the value in the call instruction since this is not supported in https://github.com/open-telemetry/opentelemetry-ebpf-profiler/blob/bd37e9c/interpreter/python/decode_amd64.c#L11-L20
This is reproducible by downloading and building python from source, and running the profiler next to a python process.
2. Python bundled with
google-cloud-sdk
With
Google Cloud SDK 502.0.0
, onamd64
, which bundles python 3.11,PyGILState_GetThisThreadState
disassembles as follows (gdb -q /lib/google-cloud-sdk/platform/bundledpythonunix/lib/libpython3.11.so
):The decode disassemble will find the
mov $0x250,%edi
instruction and assume that0x250
is the address at whichautoTSSKey
will be found. However, this is modified later on by an ulterioradd
on%rdi
.Eventually,
0x250
will fail the distance check in:opentelemetry-ebpf-profiler/interpreter/python/python.go
Lines 701 to 704 in bd37e9c
Since it
0x250
is an offset toPyRuntime
rather than the absolute value ofautoTSSKey
.Potential fixes
A couple potential fixes we though about:
a. Disassemble a different function: It's possible to fix (1.) (but not 2.) by alternatively disassembling
PyGILState_Release
, either as a fallback (see patch in DataDog@630fd97) or directly. Since python 3.0, and still as of python 3.13, this method directly callsPyThread_tss_get
.b. Add more logic in disassembly code: Add logic in
decode_amd64.c
to take into account variables in temporary registers, and the case where%rdi
is populated viamov
thenadd
. We could potentially fix both issues at the cost of additionally complexity.c. Use constant header values: Hardcode the offset of
autoTSSKey
fromPyRuntime
, with a different value depending on the python version.Let me know whether there's any preference / different ideas as to how we should fix this 🙇 !
The text was updated successfully, but these errors were encountered: