PrivateKube is an extension to the popular Kubernetes datacenter orchestrator that adds privacy as a new type of resource to be managed alongside other traditional compute resources, such as CPU, GPU, and memory. A description of the project can be found on our webpage and in our OSDI'21 paper, titled Privacy Budget Scheduling (PDF locally available here and extended version available on arXiv).
This repository contains the artifact release for the OSDI paper:
- system: The PrivateKube system, which implements the privacy resource and a new scheduling algorithm for it, called Dominant Privacy Fairness (DPF).
- privatekube: A Python client for interaction with the PrivateKube system and performing macrobenchmark evaluation.
- simulator: A simulator for microbenchmarking privacy scheduling algorithms in tightly controlled settings.
- examples: Usage examples for various components; refer to its README for details.
- evaluation: Scripts to reproduce the macrobenchmark and microbenchmark evaluation results from our paper.
This section explains how to install the system and walks through a simple example of interaction with the privacy resource. It should take less than 30 mins to complete.
PrivateKube needs a Kubernetes cluster to run. If you don't have one, you can install a lightweight Microk8s cluster on a decent laptop. Kubeflow requires more resources, but it is not needed in this section.
Below are the instructions to install and configure a lightweight cluster on Ubuntu. For other platforms, see https://microk8s.io/.
sudo snap install microk8s --classic
Check that it is running:
microk8s status --wait-ready
You can add your user to the `microk8s` group if you don't want to type `sudo` for every command (you should log out and log in again after this command):
sudo usermod -a -G microk8s $USER
mkdir ~/.kube
sudo chown -f -R $USER ~/.kube
(You can learn more about how to use Microk8s without sudo here)
You can now start and stop your cluster with:
microk8s start
microk8s stop
Export your configuration:
microk8s config > ~/.kube/config
Declare an alias to use `kubectl` (you can add this line to your `.bash_profile` or equivalent):
alias kubectl=microk8s.kubectl
Check that you can control your cluster:
kubectl get pods -A
Clone this repository on your machine. Our scripts will only affect this repository (datasets, logs, etc.) and your cluster, not the rest of your machine.
git clone https://github.com/columbia/PrivateKube.git
Enter the repository:
cd PrivateKube
All the other instructions in this file have to be run from this `PrivateKube` directory, unless specified otherwise.
Create a new virtual environment to interact with PrivateKube, for instance with:
conda create -n privatekube python=3.8
conda activate privatekube
Install the dependencies:
pip install -r privatekube/requirements.txt
Install the PrivateKube package:
pip install -e privatekube
You can deploy PrivateKube directly in one line by running:
source system/deploy.sh
If you prefer to understand what is going on, you can run the following commands one by one:
First, let's create a clean namespace to separate PrivateKube from the rest of the cluster:
kubectl create ns privatekube
Then, create the custom resources:
kubectl apply -f system/privacyresource/artifacts/privacy-budget-claim.yaml
kubectl apply -f system/privacyresource/artifacts/private-data-block.yaml
You can now interact with the privacy resource like with any other resource (e.g. pods). `pb` is a short name for private data block, and `pbc` stands for privacy budget claim. You can list blocks and see how much budget they have with `kubectl get pb -A`. So far, there are no blocks or claims, but in the next section (1.3.) we will add some.
We already compiled the controllers and the scheduler and prepared a Kubernetes deployment that will pull them from DockerHub. Launch the privacy controllers and the scheduler:
kubectl apply -f system/dpfscheduler/manifests/cluster-role-binding.yaml
kubectl apply -f system/dpfscheduler/manifests/scheduler.yaml
There are additional instructions in the system directory if you want to modify the scheduler or run it locally.
Open a first terminal. We are going to monitor the logs of the scheduler to see it in action. Find the scheduler pod with:
kubectl get pods -A | grep scheduler
Then, in the same terminal, monitor the logs of the scheduler with something similar to:
kubectl logs --follow dpf-scheduler-5fb6886497-w7x49 -n privatekube
(alternatively, you can directly use: `kubectl logs --follow "$(kubectl get pods -n privatekube | grep scheduler | awk -F ' ' '{print $1}')" -n privatekube`)
Open another terminal. We are going to create a block and a claim and see how they are being scheduled.
Create a new namespace for this example:
kubectl create ns privacy-example
Check that there are no datablocks or claims:
kubectl get pb -A
Add a first datablock:
kubectl apply -f examples/privacyresource/dpf-base/add-block.yaml
List the datablocks to see if you can see your new block:
kubectl get pb --namespace=privacy-example
Check the initial budget of your block:
kubectl describe pb/block-1 --namespace=privacy-example
Add a privacy claim:
kubectl apply -f examples/privacyresource/dpf-base/add-claim-1.yaml
Describe the claim:
kubectl describe pbc/claim-1 --namespace=privacy-example
On your first terminal, you should see that the scheduler detected the claim and is trying to allocate it. Wait a bit, and check the status of the claim again to check if it has been allocated. You can also check the status of the block again.
Finally, clean up:
kubectl delete -f examples/privacyresource/dpf-base/add-claim-1.yaml
kubectl delete -f examples/privacyresource/dpf-base/add-block.yaml
kubectl delete namespace privacy-example
We now have a proper abstraction to manage privacy as a native Kubernetes resource. The next section will provide an end-to-end example for how to interact with the privacy resource through a real machine learning pipeline. You can also refer to evaluation/macrobenchmark to reproduce part of our evaluation of this resource and the DPF algorithm we developed for it.
The examples/pipeline directory contains a step-by-step guide to build a DP ML pipeline with PrivateKube.
This simulator is used for prototyping and for microbenchmark evaluation of privacy budget scheduling algorithms. It supports controlled evaluation of DPF against baseline algorithms, including round-robin and first-come-first-served.
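For intuition about what the simulator compares, here is a deliberately simplified, self-contained sketch of the DPF idea: among the pending claims that can still be satisfied, repeatedly grant the one with the smallest "dominant share" (its largest fractional demand on any block). This is an illustration only, not the real implementation — the actual scheduler and simulator handle much more (DP composition, timeouts, dynamically arriving blocks and claims, etc.), and all names below are hypothetical.

```python
def dominant_share(demand, capacity):
    """Largest fraction of any single block's budget that a claim demands."""
    return max(eps / capacity[b] for b, eps in demand.items())

def dpf_allocate(claims, budget, capacity):
    """Grant pending claims smallest-dominant-share first, while budget lasts."""
    granted = []
    pending = list(claims)
    while pending:
        feasible = [c for c in pending
                    if all(budget[b] >= eps for b, eps in c["demand"].items())]
        if not feasible:
            break  # remaining claims cannot be satisfied with what is left
        nxt = min(feasible, key=lambda c: dominant_share(c["demand"], capacity))
        for b, eps in nxt["demand"].items():
            budget[b] -= eps  # consume the blocks' privacy budgets
        granted.append(nxt["name"])
        pending.remove(nxt)
    return granted

# Two blocks with 10 units of privacy budget each (integer units for clarity).
capacity = {"block-1": 10, "block-2": 10}
claims = [
    {"name": "big",   "demand": {"block-1": 8}},                # share 0.8
    {"name": "small", "demand": {"block-1": 2, "block-2": 2}},  # share 0.2
]
print(dpf_allocate(claims, dict(capacity), capacity))  # ['small', 'big']
```

Note how the small claim is granted first even though the big one was submitted earlier; a first-come-first-served baseline would process them in arrival order instead.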
Install Conda, then create and activate an isolated Python environment called "ae":
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda.sh
bash ~/miniconda.sh -b -p $HOME/miniconda
eval "$($HOME/miniconda/bin/conda shell.bash hook)"
conda init
conda create -n ae -c conda-forge pypy3.6 pip python=3.6 seaborn notebook -y
conda activate ae
Install the `dpsched` Python package:
cd ./simulator
pip install -r ./requirements.txt
pip install .[plot]
examples/simulator/minimal_example.py gives a quick start. There are two key concepts in the simulation program:
- The simulation model: This implements how different components in the system behave and interact with each other. One can import it via `from dpsched import Top`.
- The configuration dictionary: a dictionary that specifies many aspects of the simulation behavior. For configuration details, please refer to the comments in minimal_example.py.
Basically, there are two steps in minimal_example.py:
- Preparing the config dictionary
- Calling `simulate(config, Top)`, where `config` is the config dict and `Top` is the simulation model
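Put together, a driver script is essentially a config dictionary plus one call. The sketch below is purely schematic: the dictionary keys are hypothetical placeholders, and the real keys (and the exact import of `simulate`) are documented in minimal_example.py itself.

```python
# Schematic shape of a simulator driver script. The config keys below are
# HYPOTHETICAL placeholders; the real keys are documented in the comments
# of examples/simulator/minimal_example.py.

# Step 1: prepare the config dictionary.
config = {
    "workspace": "./exp_results/demo",  # hypothetical key: where results go
    "policy": "DPF",                    # hypothetical key: policy under test
}

# Step 2: run the simulation model against it (with dpsched installed):
#   from dpsched import Top
#   simulate(config, Top)
```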
To run the minimal example:
cd ./examples/simulator
python ./minimal_example.py
or, replace CPython with PyPy for better performance:
cd ./examples/simulator
pypy ./minimal_example.py
The simulation program saves experiment results in a workspace specified by the config dictionary. By default, they are saved under `./examples/exp_results/some_work_space_name`.
`dpsched.analysis` contains modules for collecting experiment results from the workspace directory and plotting various figures. evaluation/microbenchmark/microbenchmark_figures_single_block.ipynb gives examples of how to use the `dpsched.analysis` module, with detailed comments.
Instructions and code for how to use the simulator to reproduce the microbenchmark results in the PrivateKube paper are in evaluation/microbenchmark/README.md.