only kernel stack frames reported inside orbstack docker container on macOS #170

Open
tmm1 opened this issue Sep 29, 2024 · 24 comments
@tmm1

tmm1 commented Sep 29, 2024

i'm using orbstack on an arm64 mac to profile a linux app.

perf works as expected and shows me native symbols from user-land activity

but in devfiler i'm not seeing much come through. it seems connected, there are no errors, but there is very little data. i see some kernel symbols.

how can i debug this further?

@rockdaboot
Contributor

but there is very little data

Is the samples timeline and/or the flamegraph empty or are you missing symbols?

If you see frames from your app without further information (symbols missing):

  • make sure your app is built with debug symbols
  • drag & drop your app into the devfiler window to extract symbols

If that doesn't help, you can enable "dev" mode by double-clicking the icon to the left of the "devfiler" menu entry. You then see some more menu items. Check whether you get gRPC messages and whether the DB stats show entries for TraceEvents, Stacktraces etc., and let us know what you see.

@tmm1
Author

tmm1 commented Sep 30, 2024

the flamegraph is empty:

[screenshot: empty flamegraph]

i'm not seeing much data in the dev menus:

[screenshots: dev menu stats]

@tmm1
Author

tmm1 commented Sep 30, 2024

tried with a UTM.app VM and same thing. then i tried the old devfiler 0.6.0 and it started showing a lot more data! it still has those build-id errors, so i will downgrade to 7d2285e for now

EDIT: the old builds work in the VM but still not inside orbstack. will try inside docker in the VM next

@tmm1
Author

tmm1 commented Sep 30, 2024

in UTM it works on the host:

Linux ubuntu 6.2.0-39-generic #40-Ubuntu SMP PREEMPT_DYNAMIC Tue Nov 14 23:07:44 UTC 2023 aarch64 aarch64 aarch64 GNU/Linux

but inside docker i get:

ERRO[0000] Failed to load eBPF tracer: failed to load eBPF code: your kernel version 6.2.0 is affected by a Linux kernel bug that can lead to system freezes, terminating host agent now to avoid triggering this bug

inside orbstack the kernel is newer:

Linux orbstack 6.10.11-orbstack-00280-g1304bd068592 #21 SMP Sat Sep 21 10:45:28 UTC 2024 aarch64 GNU/Linux

@tmm1
Author

tmm1 commented Sep 30, 2024

The issue seems specific to running within a container. Only kernel activity is shown. Reproduced via docker exec on Ubuntu VM.

@tmm1 changed the title from "not getting user-level data from macOS orbstack container" to "not getting user-level data when running inside docker containers" on Sep 30, 2024
@rockdaboot
Contributor

The issue seems specific to running within a container. Only kernel activity is shown. Reproduced via docker exec on Ubuntu VM.

Just to clarify, the profiler is a system-wide profiler, so it requires root privileges. Can you confirm that you run the docker container with --privileged?

@rockdaboot
Contributor

Ideally, try running the docker container with something like

docker run --privileged --pid=host -v /etc/machine-id:/etc/machine-id:ro \
-v /var/run/docker.sock:/var/run/docker.sock -v /sys/kernel/debug:/sys/kernel/debug:ro ...
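
For reference, a complete invocation might look like the sketch below; the image name, the devfiler endpoint, and the -collection-agent/-disable-tls flags are assumptions to verify against your build.

# Sketch: hypothetical image name and devfiler endpoint; verify flags against your build.
docker run --rm --privileged --pid=host --net=host \
  -v /etc/machine-id:/etc/machine-id:ro \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v /sys/kernel/debug:/sys/kernel/debug:ro \
  my-ebpf-profiler-image \
  /ebpf-profiler -collection-agent=127.0.0.1:11000 -disable-tls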

@tmm1
Author

tmm1 commented Oct 1, 2024

Yes I was using --net=host --privileged=true but I will try those other options

@tmm1
Author

tmm1 commented Oct 1, 2024

Thanks, with those options it works in docker on Ubuntu. I see all the information from the host too which isn't ideal, but better than nothing.

In orbstack there's still only kernel-data, but I assume that's something specific to that environment.

@rockdaboot
Contributor

rockdaboot commented Oct 1, 2024

Thanks, with those options it works in docker on Ubuntu.

Thanks for testing.

I see all the information from the host too which isn't ideal, but better than nothing.

That's exactly what the profiler has been designed for: getting all information from the host while doing continuous profiling. Filtering is assumed to be done on the backend or by the user interface.

But if you think that limiting the view/collection of the profiler is a realistic use case, please open a separate issue with your ideas for discussion.

In orbstack there's still only kernel-data, but I assume that's something specific to that environment.

Maybe someone working on macOS can chime in here.
Could you please reword the GH issue title and possibly provide profiler logs when starting with -v and/or -bpf-log-level 2?
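
For example, something along these lines should capture the requested output (a sketch; the binary path and the tee redirection are assumptions):

# Run with verbose logging and eBPF log level 2, saving the output to a file to attach here.
sudo ./ebpf-profiler -v -bpf-log-level 2 2>&1 | tee ebpf-profiler.log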

@tmm1 changed the title from "not getting user-level data when running inside docker containers" to "only kernel stack frames reported inside orbstack docker container on macOS" on Oct 1, 2024
@leonard520

I'm hitting a similar issue with a container running on K8s. There are only kernel stack frames. The base image of the container is Ubuntu. The agent works in a VM environment; in the VM I am able to see other frames, like Java or Python.

I have granted privileged access to the container:

securityContext:
  allowPrivilegeEscalation: true
  capabilities:
    add:
    - CAP_SYS_ADMIN
  privileged: true

One thing to note is that when the container is up and ebpf-profiler is run for the first time, it fails with the error below:

ERRO[0000] Failed to probe tracepoint: failed to get id for tracepoint: failed to read tracepoint ID for sys_enter_mmap: open /sys/kernel/debug/tracing/events/syscalls/sys_enter_mmap/id: no such file or directory

I fixed it by mounting debugfs and tracefs:

sudo mount -t debugfs none /sys/kernel/debug
sudo mount -t tracefs none /sys/kernel/debug/tracing
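
A quick way to confirm the mounts took effect before re-running the profiler (a sketch; the tracepoint path is the one from the error above):

# Should print a numeric tracepoint ID once tracefs is mounted.
cat /sys/kernel/debug/tracing/events/syscalls/sys_enter_mmap/id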

I've attached the log, captured with the -v option:
ebpf-profiler.log

@rockdaboot
Contributor

@leonard520 Regarding the "Failed to probe tracepoint": Can you please update to the latest ebpf-profiler? The tracepoint check has since been dropped.

@leonard520

@tmm1 I just tried the latest main branch. It still has the error "Failed to probe tracepoint". Does it come from here

@rockdaboot
Contributor

@tmm1 I just tried the latest main branch. It still has the error "Failed to probe tracepoint". Does it come from here

Sorry, my fault. The code change I was referring to is still in progress: #175

@leonard520

leonard520 commented Oct 29, 2024

@rockdaboot Do you have any clue why there are only kernel stack frames in my container environment? Feel free to let me know if you want me to try something.

@rockdaboot
Contributor

@leonard520 I assume that for some reason the unwinder runs into an error. Ideally, we could reproduce this somehow on amd64 (can you?). @fabled, maybe you can have a look at the above ebpf-profiler.log - I don't find anything in there that helps.

@rockdaboot
Contributor

@leonard520 I assume that you use devfiler for visualization. Can you run the profiler with -send-error-frames? You should then see the error frames in red directly under the root frame in the flamegraph. They tell you why the unwinding failed; hopefully that is a hint.

@leonard520

@rockdaboot Today I reproduced the issue again over a longer period. I noticed a lot of log lines like the one below:
DEBU[0006] Failed to get a cgroupv2 ID as container ID for PID 1006390: open /proc/1006390/cgroup: no such file or directory

I ran ps both in the container and on the host worker node. The PID appears to belong to the host worker node rather than the container itself. As a result, the path /proc/1006390/cgroup only exists on the node. I am wondering if this is a problem.
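
One way to check this suspicion is to compare PID namespaces; if the two inode numbers below differ, the container has its own PID namespace and its PIDs will not match the host's (a sketch, not from the original report):

# Inside the profiler container:
readlink /proc/self/ns/pid
# On the host worker node:
readlink /proc/self/ns/pid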

@rockdaboot
Contributor

I am wondering if this is a problem.

No, this is not a problem. I made a test where I changed the code to look into cgroupxxx here, to trigger exactly this error every time. I still see all kinds of frames.

@leonard520

@rockdaboot Thanks for the verification. I did another test in the container and recorded the findings below. I am wondering how the profiler lists all the PIDs to trace.

  1. Check the PID of the Java process in the container:
ps aux | grep java
root         **289**  9.5  0.1 2598068 101724 pts/2  Sl+  03:13   0:03 java -Dserver.port=8888 -jar demo.jar
root         313  0.0  0.0   6508  2244 pts/3    S+   03:13   0:00 grep --color=auto java
  2. Check the PID of the Java process on the node:
root     **1888084**  8.9  0.2 3596600 134912 ?      Sl+  03:13   0:05 java -Dserver.port=8888 -jar demo.jar
  3. Check the PID information in the log. I cannot find PID 289, only 1888084; for 1888084 I found the messages below. It looks to me like the PID isn't being handled.
DEBU[0056] => PID: 1888084
DEBU[0056] = PID: 1888084
DEBU[0056] - PID: 1888084
DEBU[0056] Skip process exit handling for unknown PID 1888084
DEBU[0057] => PID: 1888084
DEBU[0057] = PID: 1888084
DEBU[0057] - PID: 1888084
DEBU[0057] Skip process exit handling for unknown PID 1888084
DEBU[0058] Failed to get a cgroupv2 ID as container ID for PID 1888084: open /proc/1888084/cgroup: no such file or directory
DEBU[0058] => PID: 1888084
DEBU[0058] = PID: 1888084
DEBU[0058] - PID: 1888084
DEBU[0058] Skip process exit handling for unknown PID 1888084

Attaching the full log for reference:
ebpf-bad.log

@Gandem
Contributor

Gandem commented Oct 31, 2024

@leonard520 If I understand correctly you're running the profiler in Kubernetes, right?

In that case, you will need to ensure the following is set:

      hostPID: true # Setting hostPID to true on the Pod so that the PID namespace is that of the host
      containers:
        ...
        securityContext:
          runAsUser: 0
          privileged: true # Running in privileged mode
          procMount: Unmasked # Setting procMount to Unmasked
          capabilities:
            add:
            - SYS_ADMIN # Adding SYS_ADMIN capability

Specifically, setting hostPID: true and procMount: Unmasked should ensure that the PIDs align between the container and the host.
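
A quick sanity check after applying the spec (a sketch; the pod name is hypothetical and assumes cat is available in the image): with hostPID: true, PID 1 inside the container should be the host's init process (typically systemd), not the container entrypoint.

kubectl exec ebpf-profiler-pod -- cat /proc/1/comm
# Expected output on a typical node: systemd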

@leonard520

@Gandem I think your answer clarified my confusion. Thank you very much. After trying to add your spec to my pod, I encountered this error.

INFO[0070] eBPF tracer loaded
ERRO[0080] Failed to handle mapping for PID 21720, file /pause: failed to extract interval data: failed to extract stack deltas from /pause: failure to parse golang stack deltas: failed to load .gopclntab section: EOF
INFO[0090] Attached tracer program
INFO[0090] Attached sched monitor
ERRO[0092] Request failed: rpc error: code = InvalidArgument desc = mapping is missing attributes
ERRO[0096] Request failed: rpc error: code = InvalidArgument desc = mapping is missing attributes

I am wondering if it is related to the pause container being written in Go; I will take a further look.

On the other hand, I’m also considering whether this approach has security risks. Sharing the same PID namespace with the host reduces isolation and increases the potential for container escape.

@felixge
Member

felixge commented Nov 14, 2024

In orbstack there's still only kernel-data, but I assume that's something specific to that environment.

I hit the same issue with orbstack. I suspect the issue is caused by the fact that:

OrbStack runs full-blown Linux machines that work almost exactly like traditional virtual machines

The word "almost" seems to refer to the fact that the VMs are actually containers of their own. At least, that's what the following check from inside the Orb VM suggests:

$ sudo cat /proc/1/environ
container=lxc

In practice this means that the PIDs seen by the VM are not the same PIDs as seen by the kernel, which breaks the eBPF profiler. I tried working around this by running the profiler via docker run --pid=host ..., but I wasn't able to make it work. I suspect the usage of host PIDs is currently not supported by OrbStack.

Anyway, I ended up firing up a real Linux VM in the cloud. If somebody still figures out a way to make OrbStack work, that'd be great, but for now it should probably be considered an unsupported environment.

@christos68k
Member

christos68k commented Nov 15, 2024

@Gandem I think your answer clarified my confusion. Thank you very much. After trying to add your spec to my pod, I encountered this error.
ERRO[0092] Request failed: rpc error: code = InvalidArgument desc = mapping is missing attributes
ERRO[0096] Request failed: rpc error: code = InvalidArgument desc = mapping is missing attributes


I am wondering if it is related to the `pause` container being written in Go; I will take a further look.

This seems like OTLP profiling signal breakage (we're making lots of breaking changes). If you use the latest devfiler and compile an agent using 47e8410, it should work.
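
A rough sketch of building the agent at that commit, assuming the repository is open-telemetry/opentelemetry-ebpf-profiler and that its default make target builds the profiler binary:

git clone https://github.com/open-telemetry/opentelemetry-ebpf-profiler.git
cd opentelemetry-ebpf-profiler
git checkout 47e8410
make    # assumed default target; adjust if the build instructions differ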
