Replies: 5 comments
-
Hey @mwflaher, based on my own experience troubleshooting self-hosted runners, as well as the official documentation, job logs - the logs you see streamed in the web UI - can be found inside the running runner pod / container, under the runner's `_diag` directory. The logs are not a single file but are split into multiple files (by step, I believe - I can't look it up right now). They are also not updated live, but written in chunks. To ingest them into any kind of log aggregation system, you would probably need to build a custom solution; I haven't yet seen anything useful for this.
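If all you need is for those files to show up on the pod's stdout so an existing node-level log collector can pick them up, one possible starting point for such a custom solution (my sketch, not something from this thread) is a small sidecar added to the runner pod template. The volume name `runner` and the mount path are assumptions about your runner spec and will differ per setup:

```yaml
# Hypothetical extra container for the runner pod template: shares the
# runner's working volume read-only and streams the _diag page logs to
# its own stdout, where a normal log agent can collect them.
# Volume name ("runner") and paths are assumptions - adjust to your spec.
- name: diag-log-tailer
  image: debian:bookworm-slim
  command:
    - sh
    - -c
    - |
      # Wait until the runner has written its first page log, then follow
      # all page logs. GNU tail prints "==> file <==" headers when it
      # switches files, which identifies the source log.
      # Caveat: the glob is expanded once, so page files created later are
      # missed; a real implementation would rescan the directory.
      while ! ls /runner/_diag/pages/*.log >/dev/null 2>&1; do sleep 5; done
      exec tail -n +1 -F /runner/_diag/pages/*.log
  volumeMounts:
    - name: runner
      mountPath: /runner
      readOnly: true
```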
-
We are also having trouble finding the STDOUT of runner-handled processes in our kubernetes logs and OpenTelemetry, using the recommended ARC setup for ephemeral runners. Our scale set has multiple containers alongside the runner, and all container logs arrive at our downstream.
We created a test java application that only writes a line to System.out and run it with the gradle GitHub Action. The runner logs themselves arrive just fine, and the GitHub UI shows the lines written to STDOUT, but anything handled by the runner's `ProcessInvoker` never reaches the container's STDOUT; instead we see an IOException in the logs.
That exception comes from https://github.com/actions/runner/blob/main/src/Runner.Sdk/ProcessInvoker.cs#L872. We don't allow privileged containers in our kubernetes setup, but for testing we added the privileged capability and permission to elevate anyway. The IOException still shows up in the logs, even though we confirmed we are able to elevate the runner user's privileges in the container and write to the OOM file. Trying to run the container as root threw the error "Must not run interactively with sudo".
I understand the performance concerns expressed at https://github.com/actions/runner/blob/main/src/Runner.Sdk/ProcessInvoker.cs#L20, but this still feels like a broken implementation to me. Kubernetes recommends writing all container output to STDOUT for consolidation (https://kubernetes.io/docs/concepts/cluster-administration/logging/), so we should be able to see our workload logs as handled by the runner transparently. Is there any way the ProcessInvoker can be set up to avoid OOM score adjustments in a kubernetes ARC setup with scale sets? That would let us avoid running privileged workloads, which keeps us security compliant. Otherwise, is there a privileged setting the runner can leverage to properly write to STDOUT?
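One thing that might be worth trying before going fully privileged (my suggestion, not something from this thread, and it assumes the IOException really is the `oom_score_adj` write being denied) is granting the runner container only `CAP_SYS_RESOURCE` - the capability Linux checks when a process lowers an OOM score adjustment - instead of `privileged: true`. A minimal sketch, assuming the usual `gha-runner-scale-set` values layout:

```yaml
# Hypothetical runner pod template override in the scale set values file.
# Adds only CAP_SYS_RESOURCE (needed to lower /proc/<pid>/oom_score_adj)
# rather than running the whole container privileged.
template:
  spec:
    containers:
      - name: runner
        image: ghcr.io/actions/actions-runner:latest
        command: ["/home/runner/run.sh"]
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            add:
              - SYS_RESOURCE
```

Whether the write then succeeds still depends on the value kubelet already assigned to the container's `oom_score_adj`, so treat this as an experiment rather than a fix.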
-
I recently solved this with promtail. Our team runs github actions runners on our own karpenter-controlled nodegroups. Our standard promtail config uses the kubernetes service discovery functionality to read logs from the pods themselves, but github actions runners don't report anything particularly useful to stdout. Folks in this thread already know that the behind-the-scenes coordination of workers happens in log files under the runner's `_diag` directory. So we need to get access to the filesystem of the runner pods from within another pod on that host. In our particular setup, the runner pods keep that directory on an `emptyDir` volume named `runner`, and I've chosen to map the host's kubelet pods directory into the promtail pod at `/var/log/kubelet-pods` (a sketch of that mount is just below).
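For reference, that host mount could look something like the following in the promtail DaemonSet. This is an illustrative sketch rather than the poster's actual manifest, and it assumes the default kubelet data directory:

```yaml
# Hypothetical extract of a promtail DaemonSet: exposes every pod's volumes
# on the node (including the runner pods' "runner" emptyDir) to promtail,
# read-only, under the /var/log/kubelet-pods path the scrape config expects.
# Assumes the kubelet default data dir /var/lib/kubelet on the host.
spec:
  containers:
    - name: promtail
      image: grafana/promtail:3.0.0
      volumeMounts:
        - name: kubelet-pods
          mountPath: /var/log/kubelet-pods
          readOnly: true
  volumes:
    - name: kubelet-pods
      hostPath:
        path: /var/lib/kubelet/pods
        type: Directory
```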
So, now, with the actions runner pods' filesystems mounted into our promtail pod, we can configure promtail like this:

```yaml
- job_name: gha-jobs
  kubernetes_sd_configs:
    - role: pod
  relabel_configs:
    - source_labels:
        - __meta_kubernetes_pod_annotationpresent_actions_runner_id
      action: keep
      regex: 'true'
    - source_labels:
        - __meta_kubernetes_pod_uid
      action: replace
      replacement: >-
        /var/log/kubelet-pods/$1/volumes/kubernetes.io~empty-dir/runner/_diag/pages/*.log
      target_label: __path__
  pipeline_stages:
    - regex:
        expression: ^(?P<time>\S+T\S+Z?) (?P<message>.*)$
    - timestamp:
        source: time
        format: RFC3339Nano
    - output:
        source: message
    - regex:
        expression: >-
          runner/_diag/pages/(?P<timeline_id>[^_]+?)_(?P<job_id>[^_]+?)_.*.log$
        source: filename
    - labels:
        job_id: ''
        timeline_id: ''
- job_name: gha-runners
  kubernetes_sd_configs:
    - role: pod
  relabel_configs:
    - source_labels:
        - __meta_kubernetes_pod_annotationpresent_actions_runner_id
      action: keep
      regex: 'true'
    - source_labels:
        - __meta_kubernetes_pod_uid
      action: replace
      replacement: >-
        /var/log/kubelet-pods/$1/volumes/kubernetes.io~empty-dir/runner/_diag/{Worker,Runner}_*.log
      target_label: __path__
  pipeline_stages:
    - multiline:
        firstline: ^\[\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}Z \w+\s+\w+\]
        max_wait_time: 3s
    - regex:
        expression: >-
          ^\[(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})Z
          (?P<level>\w+)\s+(?P<subsystem>\w+)\] (?P<message>(?s:.*))$
    - timestamp:
        source: time
        format: '2006-01-02 15:04:05'
    - template:
        source: message
        template: >-
          {{if .level}}[{{ .level }} {{ .subsystem }}] {{ .message
          }}{{else}}{{ .Entry }}{{end}}
    - output:
        source: message
```

I'm going to go over each segment in order:

```yaml
- job_name: gha-jobs
  kubernetes_sd_configs:
    - role: pod
  relabel_configs:
    - source_labels:
        - __meta_kubernetes_pod_annotationpresent_actions_runner_id
      action: keep
      regex: 'true'
```

Both jobs are configured to only run against github actions runner pods. Anything without that annotation gets skipped.

```yaml
    - source_labels:
        - __meta_kubernetes_pod_uid
      action: replace
      replacement: >-
        /var/log/kubelet-pods/$1/volumes/kubernetes.io~empty-dir/runner/_diag/pages/*.log
      target_label: __path__
```

By using the pod uid in our replacement template here, we're being explicit about only reading log files from github actions runner pods. And we're going to be able to associate this data with other information we have.

```yaml
  pipeline_stages:
    - regex:
        expression: ^(?P<time>\S+T\S+Z?) (?P<message>.*)$
    - timestamp:
        source: time
        format: RFC3339Nano
    - output:
        source: message
```

This extracts the timestamp from the text of the message and then retains only the message itself for the log entry. That makes logs easier to read on the other end, since it allows us to render the time in a browser-selected format when viewing.

```yaml
    - regex:
        expression: >-
          runner/_diag/pages/(?P<timeline_id>[^_]+?)_(?P<job_id>[^_]+?)_.*.log$
        source: filename
    - labels:
        job_id: ''
        timeline_id: ''
```

This pulls additional metadata from the filename itself; https://github.com/actions/runner/blob/6d7446a45ebc638a842895d5742d6cf9afa3b66d/src/Runner.Common/Logging.cs#L127 is where the runner constructs those filenames.

The first few bits of the next job are the same as the first, so I'm not going to repeat them. Here's what's next:

```yaml
  pipeline_stages:
    - multiline:
        firstline: ^\[\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}Z \w+\s+\w+\]
        max_wait_time: 3s
    - regex:
        expression: >-
          ^\[(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})Z
          (?P<level>\w+)\s+(?P<subsystem>\w+)\] (?P<message>(?s:.*))$
```

The worker logs have a different format, and more importantly, they often contain multiline stack traces. This pair of stages extracts the timestamp and other info from each log message, and also correctly groups a stack trace into a single message.

```yaml
    - template:
        source: message
        template: >-
          {{if .level}}[{{ .level }} {{ .subsystem }}] {{ .message
          }}{{else}}{{ .Entry }}{{end}}
    - output:
        source: message
```

This reformats the log messages and drops the timestamp, for the same reasons as above. While working on this, I found some messages that did not match the regex I specified, so the template includes a failsafe that emits the original log entry if the regex did not match for some reason.

We still have a few more tweaks to apply. I'm going to work with one of my teammates to try to find some correlating labels that would let us follow a workflow or job by id more completely.

Anyway, I hope this helps someone. I realize it's fairly implementation-specific, but there are details in here that might get you over a hump.
-
@thekuffs thank you for the info - how did you manage to match logs to a particular run? I need to assign proper labels to the logs, like repository, workflow name, run, etc.
-
For the time being, I don't. My internal requirement was to get things logged and archived. I would have liked to tie that other correlating info together, but I didn't see a trivial solution. There's some information encoded in the filename, as I mentioned (https://github.com/actions/runner/blob/6d7446a45ebc638a842895d5742d6cf9afa3b66d/src/Runner.Common/Logging.cs#L127), but I wasn't able to correlate those ids with anything the user sees. I can't remember which log has it exactly, but one of them dumps a big JSON blob at the beginning of the run. It contains the uuids involved, the repository, I think the user, and a bunch of other information. The problem is that there's no way for me to configure promtail to capture that block as a sort of context for the rest of the file. I think I'd have to start exploring a custom log parser to really make it work, and I just don't have that kind of time for such a solution.
-
We're flowing logs (and metrics) from our installation into OpenTelemetry and would like to see them downstream, but we're struggling to figure out how to get the stdout/stderr that we see in the GitHub Actions UI. Is this possible, or is something redirecting logs in a way that we can't hook into? Thanks!
I did a search first but didn't find an existing discussion. This discussion confirms that the logs are not there: #2054 - but perhaps the reasoning is this redaction concern?