Switch secrets management to cryptic. #25

Draft: wants to merge 1 commit into `main`.

5 changes: 3 additions & 2 deletions .gitignore
@@ -1,2 +1,3 @@
-token.env
-image/secrets.private.key
+agents/token.env
+image/agent.pub
+image/agent.key

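These entries exist to keep key material and the agent token out of the repository, so it can be worth confirming that Git really ignores them before committing. The following is a minimal sketch using `git check-ignore`, assuming it is run from the root of a checkout; `check_secrets_ignored` is a hypothetical helper, not part of this PR.

```shell
#!/usr/bin/env bash
# Hypothetical helper: fail if any secret file named in .gitignore would be committed.
# Assumes the current directory is the root of a git checkout of this repository.
check_secrets_ignored() {
    local path status=0
    for path in agents/token.env image/agent.pub image/agent.key; do
        # `git check-ignore -q` exits 0 when the path is ignored, 1 when it is not
        if git check-ignore -q "$path"; then
            echo "ignored:     $path"
        else
            echo "NOT IGNORED: $path" >&2
            status=1
        fi
    done
    return "$status"
}
```

Because `git check-ignore` does not treat tracked files as ignored by default, this check also flags a secret that has already been committed.
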
53 changes: 22 additions & 31 deletions README.md
@@ -4,8 +4,8 @@ This repository contains resources related to the JuliaGPU Buildkite CI infrastr

 - a custom Ubuntu-based image that can be based off of another image (e.g.
 CUDA's images);
-- support for encrypted environment variables in the pipeline, and an
-environment hook to decode them;
+- integration with [cryptic](https://github.com/staticfloat/cryptic-buildkite-plugin)
+to support encrypted secrets in the pipeline;
 - Docker Compose templates and systemd service files to tie everything together,
 and give each job a safe and reproducible execution environment.

@@ -69,21 +69,34 @@ Next, a BuildKite admin should set-up a pipeline for your repository:
 a. Under general settings, make the pipeline public by clicking the big green button.

 b. Under GitHub settings

 - check the box to `Build pull requests from third-party forked repositories`
 - check the box to `Build tags`
 - set the branch filter to `master v*` (and other branches you want to run CI for,
 e.g., `release-*`)

+For integration with [cryptic](https://github.com/staticfloat/cryptic-buildkite-plugin),
+a repository key should be generated and committed:
+
+```
+$ cd cryptic-buildkite-plugin
+$ ./bin/create_repo_key --public-key=agent.pub \
+    --private-key=agent.key \
+    --repo-root=$REPO_CHECKOUT
+
+$ cd $REPO_CHECKOUT
+$ git add .buildkite && git commit -m "Add cryptic repository keys."
+```
+
+Refer to the cryptic documentation for more details.
+
+

 ### Steps for developers

-Finally, you should create `.buildkite/pipeline.yml` in your repository with the steps to
+You should create `.buildkite/pipeline.yml` in your repository with the steps to
 perform GPU CI. Start from the following template:

 ```yaml
-env:
-  SECRET_CODECOV_TOKEN: "..."
-
 steps:
   - label: "Julia v1"
     plugins:
@@ -106,32 +119,10 @@ by the `julia` plugin) and/or instantiates your project (done by the `julia-test
 If you need to send resources across steps, use
 [artifacts](https://buildkite.com/docs/pipelines/artifacts).

-For coverage submission to Codecov to work, you need to encrypt your `CODECOV_TOKEN` and
-specify it as a global `SECRET_CODECOV_TOKEN` (see below).
+For coverage submission to Codecov to work, you need to encrypt your `CODECOV_TOKEN` (see below).



 ## Using secrets

-During start-up, agents will scan for `SECRET_` environment variables and decrypt their
-contents for use in the rest of the pipeline. If you want to use this mechanism to provide,
-say, a secret `CODECOV_TOKEN`, run the `encrypt` script in this repository and follow its
-prompts:
-
-
-```
-$ ./tools/encrypt
-Variable name: CODECOV_TOKEN
-Secret value:
-
-Use the following snippet in your pipeline.yml:
-
-env:
-  SECRET_CODECOV_TOKEN: "kaIXEN51HinaQ4JGclQcIgxeMMtXDb5uvnP3E2eKrH4Eruf2pKd5QwUGcIVL8+rcWeo5FWj883rNxRQEH3YeCWs6/i7vzs+ORvG51QeCNYQgNqFzPsWRcq5qJYc+JPFbisS7q9nghqWTwr52cnjarD4Xx3ceGorMyS5NvFpCNxMgqHNyGkLvipxcTTJfKZK61bpnbntoIjiIO1XSZKjcxnXFGFnolV9BHCr5v8f7F42n2tUH7X3nDHmTBr1AbO2lFAU9ra/KezHcIf0wg2HcV8LZD0+mj8q/SBPjQZSH7cxwx4Q2eTjT4Sw7xnrBGuySVm8ZPCAV7nRNEHo+VqR+GQ=="
-```
-
-If your version of OpenSSL is too old, the `./tools/encrypt` script may fail.
-In that case, you can run it inside Docker:
-```bash
-docker run --rm -it -v $(pwd):/root ubuntu bash -c 'apt update && apt install -y openssl && /root/tools/encrypt'
-```
+TODO

7 changes: 4 additions & 3 deletions agents/README.md
@@ -40,13 +40,14 @@ multiple GBs of VRAM, and each Julia process also consumes multiple GBs of
 system memory).

 On the agent host, clone this repository and add an appropriate `token.env` and
-`secrets.private.key`(these files are not part of the repository for obvious reasons):
+`agent.pub`/`agent.key` keys to the `agents` and `image` directories, respectively
+(these files are not part of the repository for obvious reasons):

 ```
 # git clone https://github.com/JuliaGPU/buildkite /etc/buildkite
 # ...
-# chown root:root agents/token.env image/secrets.private.key
-# chmod 600 agents/token.env image/secrets.private.key
+# chown root:root agents/token.env image/agent.pub image/agent.key
+# chmod 600 agents/token.env image/agent.pub image/agent.key
 ```

 Make sure a recent version of `docker-compose` is available:

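Because the keys now live in `image/` and are copied into the container at `/secrets/`, a malformed or mismatched pair only shows up when a job starts and the environment hook calls `die`. A minimal sketch of a pre-deployment check follows. It repeats the hook's two `openssl rsa ... -noout` validity checks and, as an extra step not present in the hook, compares the public key derived from `agent.key` with `agent.pub`. The helper name `check_agent_keypair` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical pre-deployment check for the agent keypair.
# Usage: check_agent_keypair image/agent.key image/agent.pub
check_agent_keypair() {
    local key="$1" pub="$2"

    # Same validity checks the environment hook performs at job start-up
    if ! openssl rsa -inform PEM -in "$key" -noout 2>/dev/null; then
        echo "not a valid RSA private key: $key" >&2
        return 1
    fi
    if ! openssl rsa -inform PEM -pubin -in "$pub" -noout 2>/dev/null; then
        echo "not a valid RSA public key: $pub" >&2
        return 1
    fi

    # Extra check (not in the hook): re-encode both public keys in the same PEM form
    # and require them to be byte-identical
    if ! diff -q <(openssl rsa -in "$key" -pubout 2>/dev/null) \
                 <(openssl rsa -pubin -in "$pub" -pubout 2>/dev/null) >/dev/null; then
        echo "public key $pub does not belong to private key $key" >&2
        return 1
    fi
    echo "keypair OK: $key / $pub"
}
```
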
13 changes: 9 additions & 4 deletions image/Dockerfile
@@ -1,10 +1,12 @@
 ARG base=ubuntu:latest
 FROM $base

 # install dependencies for Buildkite and its plugins
 # (notably julia-buildkite-plugin and cryptic-buildkite-plugin)
 RUN apt-get update && \
-    DEBIAN_FRONTEND=noninteractive apt-get install -y curl wget apt-transport-https openssl gnupg2 python3
+    DEBIAN_FRONTEND=noninteractive apt-get install -y \
+        curl wget apt-transport-https gnupg2 python3 python3-pip openssl jq && \
+    pip install --no-cache-dir shyaml

 RUN wget -O- https://keys.openpgp.org/vks/v1/by-fingerprint/32A37959C2FA5C3C99EFBC32A79206696452D198 > \
     /tmp/buildkite-agent-archive-keyring.gpg && \
@@ -15,7 +17,10 @@ RUN wget -O- https://keys.openpgp.org/vks/v1/by-fingerprint/32A37959C2FA5C3C99EF
     apt-get update && \
     DEBIAN_FRONTEND=noninteractive apt-get install -y buildkite-agent

-COPY secrets.private.key /
+RUN rm -rf /var/lib/apt/lists/*
+
+COPY agent.pub agent.key /secrets/

 COPY hooks /hooks

 ENTRYPOINT ["buildkite-agent"]

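The new `apt-get`/`pip` lines exist to provide the helper programs that the environment hook's header comment requires (openssl, shred, shyaml, jq). One way to check a freshly built image is to run a small preflight inside it, for example via `docker run --rm --entrypoint bash <image> -c '...'`, since the image's entrypoint is `buildkite-agent`. The function below is a hypothetical sketch. `curl` is included in its default list because `get_initial_job_id` in the hook calls it.

```shell
#!/usr/bin/env bash
# Hypothetical preflight: report helper programs that are missing from PATH.
# With no arguments, checks the tools named in the hook's header comment, plus curl.
check_tools() {
    local tool missing=0
    local -a tools=("$@")
    if [[ ${#tools[@]} -eq 0 ]]; then
        tools=(openssl shred shyaml jq curl)
    fi
    for tool in "${tools[@]}"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

Passing names explicitly (`check_tools jq shyaml`) checks only those tools, which keeps the function reusable for other images.
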
203 changes: 180 additions & 23 deletions image/hooks/environment
100755 → 100644
@@ -1,33 +1,190 @@
 #!/bin/bash
-set -euo pipefail
+
+# this is the agent environment hook of the cryptic Buildkite plugin
+# https://github.com/staticfloat/cryptic-buildkite-plugin

-# only decrypt secrets if this is a trusted environment (i.e., not a third-party PR)
-if [[ "${BUILDKITE_PULL_REQUEST}" == "false" ||
-      "${BUILDKITE_PULL_REQUEST_REPO}" == "${BUILDKITE_REPO}" ]]; then
-    echo "--- :key: Decrypting secrets"
+set -eou pipefail

-    for value_var in $(set | awk -F "=" '{print $1}' | grep "SECRET_"); do
-        name=${value_var:7}
-        echo "Found secret for $name"
-        secret=${!value_var}
+## A foundational concept for the usage of this hook is that we can deny access to secrets
+## (such as the agent private key or the build token file) by deleting or unmounting the
+## `/secrets` folder. This agent should always be running within some kind of sandbox,
+## whether that be a Docker container or whatever. When the job is finished, the buildkite
+## agent should exit, causing the docker container to restart and restore the deleted files.
+## This gives us the capability to deny access to these files to later steps within the
+## current buildkite job.
+SECRETS_MOUNT_POINT="${BUILDKITE_PLUGIN_CRYPTIC_SECRETS_MOUNT_POINT:-/secrets}"

-        encrypted_key=$(echo "$secret" | cut -d ';' -f 1)
-        keyfile=$(mktemp)
-        echo "$encrypted_key" |
-            openssl base64 -d -A |
-            openssl rsautl -decrypt -inkey /secrets.private.key -out "$keyfile"
+## The secrets that must be contained within:
+## - `agent.{key,pub}`: An RSA private/public keypair (typically generated via the
+##   script `bin/create_agent_keypair`). See the top-level `README.md` for more.
+## - `buildkite-api-token`: A buildkite API token (with `read_builds` permission).

-        encrypted_value=$(echo "$secret" | cut -d ';' -f 2)
-        value=$(echo "$encrypted_value" |
-            openssl base64 -d -A |
-            openssl enc -d -aes-256-cbc -pbkdf2 -iter 100000 -pass "file:$keyfile")
+## The helper programs that must be available on the worker:
+## - openssl v3 (from Homebrew on macOS)
+## - shred (Linux only)
+## - shyaml
+## - jq

-        export "$name"="$value"
-        shred -u "$keyfile"
-    done
+# Helper function
+function die() {
+    echo "ERROR: ${1}" >&2
+    buildkite-agent annotate --style=error "${1}"
+    exit 1
+}
+
+function base64dec() {
+    tr -d '\n' | openssl base64 -d -A
+}
+
+function base64enc() {
+    openssl base64 -e -A
+}
+
+if [[ -n "$(which shred 2>/dev/null)" ]]; then
+    function secure_delete() {
+        shred -u "$*"
+    }
+elif [[ "$(uname)" == "Darwin" ]] || [[ "$(uname)" == *BSD ]]; then
+    function secure_delete() {
+        rm -fP "$*"
+    }
+else
+    # Suboptimal, but what you gonna do?
+    function secure_delete() {
+        rm -f "$*"
+    }
+fi
+
+function cleanup_secrets() {
+    ## Cleanup: Deny access to secrets to future pipeline steps by either unmounting `/secrets`
+    # or just deleting the files inside, if that doesn't work. If neither work, we abort the build.
+    if ! umount "${SECRETS_MOUNT_POINT}" 2>/dev/null; then
+        if ! rm -rf "${SECRETS_MOUNT_POINT}"; then
+            die "Unable to unmount secrets at '${SECRETS_MOUNT_POINT}'! Aborting build!"
+        fi
+    fi
+
+    # don't pollute the global namespace
+    unset SECRETS_MOUNT_POINT BUILDKITE_TOKEN_PATH BUILDKITE_TOKEN AGENT_PRIVATE_KEY_PATH ADHOC_PAIR
+}
+
+# No matter how we exit, make sure we cleanup our secrets
+trap "cleanup_secrets" EXIT
+
+# Set this to wherever your private key lives
+AGENT_PRIVATE_KEY_PATH="${SECRETS_MOUNT_POINT}/agent.key"
+AGENT_PUBLIC_KEY_PATH="${SECRETS_MOUNT_POINT}/agent.pub"
+if [[ ! -f "${AGENT_PRIVATE_KEY_PATH}" ]]; then
+    die "Unable to open agent private key path '${AGENT_PRIVATE_KEY_PATH}'! Make sure your agent has this file deployed within it!"
+else
+    if ! openssl rsa -inform PEM -in "${AGENT_PRIVATE_KEY_PATH}" -noout 2>/dev/null; then
+        die "Secret private key path '${AGENT_PRIVATE_KEY_PATH}' is not a valid private RSA key!"
+    fi
+fi
+if [[ ! -f "${AGENT_PUBLIC_KEY_PATH}" ]]; then
+    die "Unable to open agent public key path '${AGENT_PUBLIC_KEY_PATH}'! Make sure your agent has this file deployed within it!"
 else
-    echo "--- :key: Skipping decryption of secrets"
+    if ! openssl rsa -inform PEM -pubin -in "${AGENT_PUBLIC_KEY_PATH}" -noout 2>/dev/null; then
+        die "Secret public key path '${AGENT_PUBLIC_KEY_PATH}' is not a valid public RSA key!"
+    fi
 fi

-rm /secrets.private.key
+# Create a buildkite token with `read_builds` permissions, paste it in here.
+BUILDKITE_TOKEN_PATH="${SECRETS_MOUNT_POINT}/buildkite-api-token"
+if [[ ! -f "${BUILDKITE_TOKEN_PATH}" ]]; then
+    die "Unable to open buildkite token path '${BUILDKITE_TOKEN_PATH}'! Make sure your agent has this file deployed within it!"
+fi
+BUILDKITE_TOKEN="$(cat "${BUILDKITE_TOKEN_PATH}")"
+if ! [[ "${BUILDKITE_TOKEN}" =~ ^[[:xdigit:]]{40}$ ]]; then
+    die "Buildkite token stored at '${BUILDKITE_TOKEN_PATH}' is not a 40-length hexadecimal hash!"
+fi
+
+function is_uuid() {
+    [[ "${1}" =~ ^[[:xdigit:]]{8}-[[:xdigit:]]{4}-[[:xdigit:]]{4}-[[:xdigit:]]{4}-[[:xdigit:]]{12}$ ]]
+}
+
+# Helper function to get the first job ID from the currently-running build
+function get_initial_job_id() {
+    local TOKEN_HEADER="Authorization: Bearer ${BUILDKITE_TOKEN}"
+    local URL="https://api.buildkite.com/v2/organizations/${BUILDKITE_ORGANIZATION_SLUG}/pipelines/${BUILDKITE_PIPELINE_SLUG}/builds/${BUILDKITE_BUILD_NUMBER}"
+
+    local CURL_OUTPUT=""
+    local CURL_UUID=""
+    for idx in 1 2 3; do
+        CURL_OUTPUT="$(curl -sfL -H "${TOKEN_HEADER}" "${URL}" || true)"
+        CURL_UUID="$(jq '.jobs[0].id' <<<"${CURL_OUTPUT}" | tr -d '"')"
+        if is_uuid "${CURL_UUID}"; then
+            echo -n "${CURL_UUID}"
+            return
+        fi
+        printf 'ERROR: Initial job ID output invalid:\n%s\n' "${CURL_OUTPUT}" >&2
+        echo "Retrying up to $((3 - $idx)) more times before failing out..." >&2
+    done
+    die "Initial job ID does not look like a UUID: '${CURL_UUID}'"
+}
+
+BUILDKITE_INITIAL_JOB_ID="$(get_initial_job_id)"; export BUILDKITE_INITIAL_JOB_ID
+function set_cryptic_privileged() {
+    # The first thing we do is export a base64-encoded form of the keys for later consumption by the cryptic plugin
+    echo "Privileged build detected; unlocking private key"
+    export BUILDKITE_PLUGIN_CRYPTIC_BASE64_AGENT_PRIVATE_KEY_SECRET="$(base64enc < "${AGENT_PRIVATE_KEY_PATH}")"
+    export BUILDKITE_PLUGIN_CRYPTIC_BASE64_AGENT_PUBLIC_KEY_SECRET="$(base64enc < "${AGENT_PUBLIC_KEY_PATH}")"
+    export BUILDKITE_PLUGIN_CRYPTIC_PRIVILEGED=true
+
+    # The next thing we do is search for `CRYPTIC_ADHOC_SECRET_*` variables and decrypt them.
+    # These should only be used for things like SSH keys, which need to be decrypted before we
+    # even have a chance to check out the repository.
+    for LONG_ADHOC_NAME in $(set | cut -d"=" -f 1 | grep -E "^CRYPTIC_ADHOC_SECRET_[^ ]+"); do
+        EXPORTED_NAME="${LONG_ADHOC_NAME:21}"
+        echo " --> Decrypting ad-hoc secret ${EXPORTED_NAME}"
+
+        # No matter what happens, this file dies when we leave
+        local TEMP_KEYFILE=$(mktemp)
+        OLD_TRAP="$(trap -p EXIT)"
+        trap "rm -f ${TEMP_KEYFILE}" EXIT
+
+        # Use `readarray` to split our combined key/value envvar
+        readarray -d';' -t ADHOC_PAIR <<<"${!LONG_ADHOC_NAME}"
+
+        # Take the key, decrypt it with our RSA private key
+        base64dec <<<"${ADHOC_PAIR[0]}" | openssl rsautl -decrypt -inkey "${AGENT_PRIVATE_KEY_PATH}" > "${TEMP_KEYFILE}"
+
+        # Make sure the AES key is the right length
+        if [[ $(wc -c <"${TEMP_KEYFILE}") != "128" ]]; then
+            die "Invalid AES key embedded in ad-hoc secret '${EXPORTED_NAME}', counted '$(wc -c <"${TEMP_KEYFILE}")' bytes instead of 128!"
+        fi
+
+        export "${EXPORTED_NAME}"="$(base64dec <<<"${ADHOC_PAIR[1]}" | openssl enc -d -aes-256-cbc -pbkdf2 -iter 100000 -pass "file:${TEMP_KEYFILE}")"
+
+        # Clean up our keyfile and our trap
+        secure_delete "${TEMP_KEYFILE}"
+        eval "${OLD_TRAP}"
+        unset ADHOC_PAIR
+    done
+}
+
+# Now that we have our keys and our buildkite token, we decide whether the keys should be exported into
+# the environment or not. We only do this if one of two conditions is met:
+#
+# - If we are the first job to run in this build, we are automatically authorized, as the first job is defined
+#   within the WebUI, so it is assumed secure from drive-by pull requests.
+# - If we have an environment variable (`BUILDKITE_PLUGIN_CRYPTIC_BASE64_SIGNED_JOB_ID_SECRET`) and it correctly
+#   verifies as a signature on the initial job ID, we consider ourselves a launched child pipeline.
+
+if [[ "${BUILDKITE_JOB_ID}" == "${BUILDKITE_INITIAL_JOB_ID}" ]]; then
+    set_cryptic_privileged
+elif [[ -v "BUILDKITE_PLUGIN_CRYPTIC_BASE64_SIGNED_JOB_ID_SECRET" ]]; then
+    # Decode the base64-encoded signature and dump it to a file
+    SIGNATURE_FILE="$(mktemp)"
+    OLD_TRAP="$(trap -p EXIT)"
+    trap "rm -f ${SIGNATURE_FILE}" EXIT
+    openssl base64 -d -A <<<"${BUILDKITE_PLUGIN_CRYPTIC_BASE64_SIGNED_JOB_ID_SECRET}" > "${SIGNATURE_FILE}"
+
+    # Verify that the signature is valid; if it is, then unlock the keys!
+    if openssl dgst -sha256 -verify "${AGENT_PUBLIC_KEY_PATH}" -signature "${SIGNATURE_FILE}" <<<"${BUILDKITE_INITIAL_JOB_ID}"; then
+        set_cryptic_privileged
+    fi
+
+    rm -f "${SIGNATURE_FILE}"
+    eval "${OLD_TRAP}"
+fi

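The final branch of the hook unlocks the keys for a job whose `BUILDKITE_PLUGIN_CRYPTIC_BASE64_SIGNED_JOB_ID_SECRET` is a valid SHA-256 RSA signature, made with the agent key, over the initial job ID. The sketch below produces such a value with `openssl dgst -sha256 -sign` and checks it with the same `openssl dgst -sha256 -verify` call the hook uses. It only illustrates the scheme as read from the hook; how the cryptic plugin itself generates the signature is not shown here, and `sign_job_id`/`verify_job_id` are hypothetical names. Because the hook passes the job ID through a `<<<` here-string, which appends a newline, the signer has to sign those same bytes.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the signed-job-ID scheme checked by the environment hook.

# Print a base64-encoded SHA-256 RSA signature over the job ID plus a trailing newline
sign_job_id() {
    local private_key="$1" job_id="$2"
    # `<<<` appends a newline, matching how the hook feeds the job ID to `openssl dgst -verify`
    openssl dgst -sha256 -sign "$private_key" <<<"$job_id" | openssl base64 -e -A
}

# Return 0 if the base64 signature verifies for this job ID under the public key
verify_job_id() {
    local public_key="$1" job_id="$2" b64_signature="$3" sig_file rc=1
    sig_file="$(mktemp)"
    # Decode the signature into a file, as the hook does before calling `openssl dgst -verify`
    printf '%s' "$b64_signature" | openssl base64 -d -A > "$sig_file"
    if openssl dgst -sha256 -verify "$public_key" -signature "$sig_file" <<<"$job_id" >/dev/null; then
        rc=0
    fi
    rm -f "$sig_file"
    return "$rc"
}
```

In the hook, a signature that fails to verify simply leaves the job unprivileged rather than aborting it, since the `openssl dgst -verify` call sits inside an `if`.
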
9 changes: 0 additions & 9 deletions image/secrets.public.key

This file was deleted.

34 changes: 0 additions & 34 deletions tools/encrypt

This file was deleted.
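With `tools/encrypt` deleted, the only values the environment hook itself decrypts are the `CRYPTIC_ADHOC_SECRET_*` variables handled in `set_cryptic_privileged`. Reading that function, such a value has the form `<base64 RSA-encrypted key file>;<base64 AES-256-CBC ciphertext>`, and the decrypted key file must be exactly 128 bytes. The following is a minimal sketch that produces a value of that shape for a given `agent.pub` and decrypts it with the same OpenSSL parameters as the hook. It is inferred from the hook rather than taken from the cryptic plugin, whose own scripts should be preferred where they cover this; `encrypt_adhoc_secret` and `decrypt_adhoc_secret` are hypothetical names.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the "key;value" format decoded by set_cryptic_privileged.

# Print "<base64 RSA(keyfile)>;<base64 AES-256-CBC(value)>" for the given public key
encrypt_adhoc_secret() {
    local public_key="$1" value="$2" keyfile enc_key enc_value
    keyfile="$(mktemp)"
    # 64 random bytes as hex, without a trailing newline: exactly 128 bytes, as the hook requires
    openssl rand -hex 64 | tr -d '\n' > "$keyfile"
    # Wrap the key file with the agent's RSA public key (PKCS#1 v1.5, also rsautl's default)
    enc_key="$(openssl pkeyutl -encrypt -pubin -inkey "$public_key" -in "$keyfile" | openssl base64 -e -A)"
    # Encrypt the value with the cipher and KDF settings the hook uses for decryption
    enc_value="$(printf '%s' "$value" |
        openssl enc -aes-256-cbc -pbkdf2 -iter 100000 -pass "file:$keyfile" |
        openssl base64 -e -A)"
    rm -f "$keyfile"
    printf '%s;%s\n' "$enc_key" "$enc_value"
}

# Reverse the above like the hook: split on ';', RSA-decrypt the key, AES-decrypt the value
decrypt_adhoc_secret() {
    local private_key="$1" secret="$2" keyfile value
    local -a pair
    IFS=';' read -r -a pair <<<"$secret"
    keyfile="$(mktemp)"
    printf '%s' "${pair[0]}" | openssl base64 -d -A |
        openssl rsautl -decrypt -inkey "$private_key" > "$keyfile"
    # Same length check as the hook (tr strips the padding some wc implementations add)
    if [[ "$(wc -c <"$keyfile" | tr -d ' ')" != 128 ]]; then
        echo "decrypted key is not 128 bytes" >&2
        rm -f "$keyfile"
        return 1
    fi
    # Unlike the hook, fail explicitly if the AES decryption step reports an error
    if ! value="$(printf '%s' "${pair[1]}" | openssl base64 -d -A |
        openssl enc -d -aes-256-cbc -pbkdf2 -iter 100000 -pass "file:$keyfile")"; then
        rm -f "$keyfile"
        return 1
    fi
    rm -f "$keyfile"
    printf '%s\n' "$value"
}
```

The ciphertexts are base64-encoded, so they never contain `;`, and the plaintext value itself may contain one without breaking the split.
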