-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Launch cockpit-session via socket activation on /run/cockpit/session #16808
base: main
Are you sure you want to change the base?
Launch cockpit-session via socket activation on /run/cockpit/session #16808
Conversation
This comment was marked as resolved.
This comment was marked as resolved.
aaefe97
to
7ebf9e6
Compare
This comment was marked as outdated.
This comment was marked as outdated.
7ebf9e6
to
1330a6a
Compare
1330a6a
to
ca43353
Compare
This comment was marked as resolved.
This comment was marked as resolved.
ca43353
to
9ed95a5
Compare
9ed95a5
to
6778062
Compare
6778062
to
f1e5a6a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lots of optional minor/style issues here and a couple of fairly serious problems. I trust you to know the difference by now :)
unsigned char *buffer = malloc (size); | ||
/* We now have size equal to the number of bytes we need to return. Add an extra byte for a NUL terminator, | ||
* to be friendly to callers for text frames. */ | ||
unsigned char *buffer = calloc (1, size + 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
grumble grumble about the unnecessary zero-initialization. These authorize messages might be large! This is performance critical!!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, as large as a kilobyte for an authorize message, it'll take dozens of nanoseconds!! 😁
(more seriously: I pondered adding a \0
but the heck with it. All memory ought to be zeroed initially, especially in security critical stuff)
src/session/session-utils.c
Outdated
char *key_match; | ||
int key_match_len = asprintf (&key_match, "\"%s\":\"", key); | ||
if (key_match_len < 0) | ||
errx (EX, "out of memory"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can asprintf()
actually fail? Learn something new every day :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh. This isn't alloca()
backed, but malloc()
backed. I think you're missing a free()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed. Also used our shiny asprintfx.
src/session/session-utils.c
Outdated
errx (EX, "out of memory"); | ||
|
||
const char *match = strstr (json, key_match); | ||
if (match) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Geschmacksache: I'd consider if match == NULL
as the error case here, leaving the main part of the body as the non-error case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, agreed. Updated.
src/session/session-utils.c
Outdated
size_t value_len = strcspn (match, "\""); | ||
if (match[value_len] != '"') | ||
errx (EX, "syntax error, expected closing quote"); | ||
char *res = strndup (match, value_len); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason for the extra variable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Leftover from when I had an extra debug() in there 🙈 fixed.
|
||
char *get_authorize_key (const char *json, const char *key, bool required) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might make sense to change bool required
for const char *default
and hard-fail in the case that default = NULL
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, that works poorly with the "negotiate" case.
/* start time is the token at index 19 after the '(process name)' entry - since only this | ||
* field can contain the ')' character, search backwards for this to avoid malicious | ||
* processes trying to fool us; See proc_pid_stat(5) */ | ||
char *p = strrchr (buffer, ')'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You didn't nul-terminate buffer
after the read. I'm a bit surprised that this didn't crash.
...well, I guess it's likely to hit a nul
at some point. The stack grows down, after all....
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whoops! Fixed.
unsigned long long start_time = strtoull (p, &endptr, 10); | ||
if (*endptr != ' ') | ||
errx (EX, "Failed to parse start time in /proc/pid/stat from %s", p); | ||
return start_time; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess this is the kernel so we don't have to worry about bounds-checking the returned value...
I think we can't have other error cases here. It's worth noting that strtoull
eats leading whitespace, so if the field were "empty" we'd potentially end up reading whatever the next field is...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It actually doesn't -- this was failing initially, before I added the ++p; /* skip over the space */
above. I.e. it refused to parse " 1234".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we can impose a meaningful bound here. https://man7.org/linux/man-pages/man5/proc_pid_stat.5.html makes no restriction other than monotonicity.
@@ -328,8 +377,37 @@ cockpit_session_client_certificate_map_user (const char *client_certificate_file | |||
return NULL; | |||
} | |||
|
|||
/* check if stdin is a Unix socket; then we are being called via [email protected] and need to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we use a pipe for the direct-spawn case? I forget and/or think we changed this at some point?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't passing our activating socket on stdin considered by the systemd folks to be a bit gauche? I recall reading somewhere that they'd prefer we use fd 3 and an environment variable, or something...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we use a pipe for the direct-spawn case? I
Yes, g_spawn_async_with_pipes()
in session_start_process()
(unless it's an unix socket path).
src/session/client-certificate.c
Outdated
if (getsockopt (STDIN_FILENO, SOL_SOCKET, SO_PEERCRED, &ucred, &(socklen_t){sizeof ucred}) == 0) | ||
{ | ||
debug ("unix socket mode, reading cgroup from peer pid %d", ucred.pid); | ||
ws_cgroup_fd = open_proc_pid (ucred.pid); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This name is highly misleading: it doesn't have anything to do with cgroups (yet). It's an fd to the /proc/
entry for our peer process.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Renamed to ws_pid_dirfd, similarly my_pid_dirfd.
src/session/client-certificate.c
Outdated
debug ("peer start time: %lu, my start time: %lu", ws_start_time, my_start_time); | ||
close (my_fd); | ||
|
||
/* guard against pid recycling: ws must have started earlier than ourselves */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe write a bit more about the attack scenario here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another thing we could do is alarm()
ourselves to prevent people from keeping long-open connections to us. That's highly suspicious and really only useful for the attack scenario outlined in the PR. The -ws should send the authorize message almost immediately, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe write a bit more about the attack scenario here.
It's in the commit message, but I put it into the comment as well.
alarm()
Ah, we have that on the ws side, but not in session. I'll check that tomorrow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a commit which does the alarm() thing.
1e3eccd
to
24ddd16
Compare
b1ce12f
to
bf54d09
Compare
c6c1d4d
to
9747df2
Compare
The test idles on the login page for more than the 60s authorize timeout. When running cockpit-session via unix socket, this causes some unsightly "session timed out during authentication" messages. This ought to be handled better in cockpit-ws, but for the time being ignore these messages.
Stop failing with "didn't receive expected authorize message", as that's confusing -- the user did nothing wrong. Instead, silently exit successfully, and let cockpit-ws handle the timeout. This is mostly occurring when enabling Negotiate (kerberos) authentication, and thus cockpit-ws always starts *two* cockpit-session attempts (one for Negotiate, one for PAM), like in TestIPA.testQualifiedUsers.
This makes it easier for callers to treat the reply as textual string. The only remaining user is cockpit-session, where we will need this behaviour in the following commit.
We want to parse more fields than just "response" from the authorize message in the message. So remove the very rigidly harcoded parsing in `read_authorize_response()` and let it return the whole JSON string instead. Add a new `get_authorize_key()` function that parses a single value from that, and adjust the callers accordingly. This is a very restricted parser to keep things simple: No spaces in the structure, and no escaping. We can assume all that as cockpit-ws sends very controlled messages, and '"' isn't used in base64 values. In the worst case we get a truncated value, which will just fail authentication. Localize some variables while we are at it.
Drop the duplicated (but not identical!) implementation of `read_authorize_response()` and use the real implementation.
Run socket-activated cockpit-session with correct context. By default it is `init_t`, but that will produce a memfd with the wrong permissions, which the session cannot read. Allow cockpit-ws to connect to /run/cockpit/session. Allow restricted user_t and sysadm_t sessions to communicate (but not connect) to cockpit-ws through the session unix socket. (Covered by TestLogin.testSELinuxRestrictedUser)
…codes cockpit-session directly exits with 5 on authentication failures, including `authentication-unavailable`; or with 127 on authentication timeouts. It also forwards the bridge child's non-zero exit status, which is 1 in the common case of e.g. an unhandled exception. None of these are a reason to mark the unit as failed.
The point of this is really to determine the cgroup of our caller, i.e. the cockpit-ws process. With [email protected] this is different than cockpit-session's own cgroup. Rename the variables to clarify this.
If cockpit-ws directly spawns cockpit-session, it can process the 127 exit code by itself. But there is no exit code if that happens via unix socket. Check this condition explicitly and report `no-cockpit` via the protocol. This triggers a more specific error path in cockpit-ws and the login page, adjust TestConnection.testBasic accordingly.
The various perform_*() functions all assume a non-NULL rhost, as several functions such as `btmp_log()` do unchecked strncpy() on them. Add assertions to (1) make them fail more gracefully and usefully), and (2) document that requirement more explicitly. This is currently guaranteed by having an explicit fallback to `""` if `$COCKPIT_REMOTE_PEER` isn't set.
cockpit_session_launch() doesn't set this env if the remote host is unknown. That is never the case in practice at least in our tests, but callers should still be aware of it.
Passing the remote peer from ws to cockpit-session via the `$COCKPIT_REMOTE_PEER` environment variable does not work in unix socket mode. So make that part of the protocol instead and attach it to the authorize response.
When cockpit-session's stdin is a Unix socket, it is being spawned by cockpit-ws through [email protected]. In that case it doesn't make sense to look at its own cgroup, but we need to check the cgroup of the socket peer (i.e. cockpit-ws). We must guard against PID recycling attacks: 1. Eve logs into cockpit, gets ws pid E, and hacks ws: connect to session, forks, keeps the session fd in a different process, and kills pid E. 3. Eve waits until Alice logs in again and happens to get ws pid E (which can happen with a sufficient number of forks, social engineering, and some luck). cockpit-session checks that pid E is in cgroup /cockpit/alice, and starts an alice session for Eve's ws. (Note: SO_PEERCRED gives you pid/uid/gid at the time connect() was made.) Thus require that the peer (ws) must have started earlier than cockpit-session. This is the same approach that polkit uses as a fallback if pidfds are not available: https://github.com/polkit-org/polkit/blob/main/src/polkit/polkitunixprocess.c Note that pidfds don't help us: There is no API to directly get from a pidfd to a cgroup, startup time, or /proc/<pid> dirfd, this has to happen via `pidfd_getpid()` and opening /proc/pid. But that's exactly what we want to avoid, and thus is pointless (they are also only available since kernel 6.5).
ws times out the authorization attempt after one minute (`cockpit_ws_auth_response_timeout`). Introduce a similar timeout on the session side. This makes PID recycling attacks harder, as their victim now has to log in within one minute.
Just like in cockpit-wsinstance-*.socket
Unless it's otherwise specified in the configuration file, we now spawn cockpit-session by connecting to /run/cockpit/session if that exists. Fall back to calling cockpit-session directly for custom setups. We leave the cockpit_ws_session_program variable in place to allow the tests to override things. Update the unit files for cockpit-ws to ensure that the socket is available when cockpit-ws is running. Adjust TestConnection.testBasic accordingly: When running cockpit-session via unix socket activation, its group permissions are irrelevant. More thoroughly move the binary away and also disable the socket, to fail both of cockpit-ws' session creation attempts. Co-Authored-By: Martin Pitt <[email protected]>
systemd spawns this for us now, so we don't need the setuid bit anymore. Clean up the statoverride in the Debian packaging on upgrades. However, that means that cockpit-ws cannot be run as `cockpit-wsinstance` user outside of the unit any more. Adjust our tests to run it as root instead.
9747df2
to
52da73b
Compare
This is a precondition for our goal of removing the static cockpit users in #16811
A nice side effect of this is that we can now connect to unix sockets from cockpitauth, which is useful for https://github.com/allisonkarlitskaya/cockpit-cloud
read()
call (no streamFILE*
any more)pass memfd with login information from ws to sessionnot required. That information comes from cockpit-session to cockpit-bridge, which is unaffected by this change.$COCKPIT_REMOTE_PEER
from ws to sessioncockpit-session@<id>.service
fails with error 5/NOTINSTALLED initially (e.g.TestTerminal.testBasic
)TestConnection.testBasic
)Fixes #21201