Experimental plugin for scikit-learn that enables running (some) estimators on Intel GPUs via `numba-dpex`. Support for GPUs from other vendors is also on the roadmap and depends on the progress of the interoperability features of the `numba-dpex` stack.
This package requires working with the `feature/engine-api` branch of scikit-learn, available at https://github.com/scikit-learn/scikit-learn.
A step-by-step guide is provided in this README for installing `numba-dpex`, the `feature/engine-api` branch of scikit-learn, and this plugin from source.
🚧 TODO: at the moment it is unusually complicated to install this plugin and its dependencies. Once `feature/engine-api` is merged and released in scikit-learn, we aim to make it possible to install `sklearn-numba-dpex` and all its dependencies with a one-liner `conda install` or `pip install` command, or both.
Currently supported:

- `sklearn.cluster.KMeans`, for the standard Lloyd's algorithm on dense data arrays, including `kmeans++` support.
Getting started requires a working environment for using `numba_dpex`. Currently, a conda install and a docker image are available.
⚠⚠⚠ WARNING ⚠⚠⚠: the latest `numba_dpex` releases might have stability issues. If you run into segfaults or wrong outputs, try disabling JIT compilation optimizations by setting the environment variable `NUMBA_DPEX_OPT=0`. If you discover minimal reproducers for such stability issues, please report them at the `numba_dpex` issue tracker.
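For example, to disable the optimizations (`my_script.py` being a placeholder for your own entry point):

```bash
# Disable numba-dpex JIT optimizations for a single invocation
NUMBA_DPEX_OPT=0 python my_script.py

# Or export the variable for the whole shell session
export NUMBA_DPEX_OPT=0
```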
Conda does not currently support installation of the low-level runtime libraries for GPUs, so the first part of the installation guide consists in installing those libraries on the host system. The second part consists in running conda commands that create the environment with all the required packages and configuration. Note that the installation logic is a bit complicated since it mixes packages from several conda channels (`conda-forge`, `dppy/label/dev`, and `intel`), some of which are experimental. Neither the builds nor the channels are maintained by the `sklearn_numba_dpex` team and their level of stability is unknown.
🚧 TODO: update the instructions to install everything from non-dev conda packages on always up-to-date channels as soon as they are available.
At this time, only Intel GPUs are supported.

For Intel GPUs, two backends are available. You might want to install both of those and test which one gives better performance.

🚧 TODO: write a guide on how to select the device and the backend in a Python script. A minimal sketch is given after the list below.
- Intel OpenCL for GPU: the Intel OpenCL runtime can be installed following this link.

  ⚠⚠⚠ WARNING ⚠⚠⚠: for Ubuntu (confirmed for `focal` and `jammy`) the `apt`-based installation is broken, see IntelPython/dpctl#1010. Prefer the upstream `.deb` packages provided at: https://github.com/intel/compute-runtime/releases.

  Recommended installation steps for Ubuntu:

  ⚠ As with any installation of packages outside of official repositories, these workarounds might make your system unstable and are not recommended outside of a containerized environment and/or for expert users.

  To avoid altering the `apt`-based version tree too much and risking other compatibility issues, the recommended workaround consists in identifying the version that is officially supported by your OS (use packages.ubuntu.com), then finding the corresponding build from the Intel release page on GitHub and following the instructions from the release page, e.g. for `jammy`:

  ```bash
  mkdir neo
  cd neo
  wget https://github.com/intel/compute-runtime/releases/download/22.14.22890/intel-gmmlib_22.0.2_amd64.deb
  wget https://github.com/intel/intel-graphics-compiler/releases/download/igc-1.0.10840/intel-igc-core_1.0.10840_amd64.deb
  wget https://github.com/intel/intel-graphics-compiler/releases/download/igc-1.0.10840/intel-igc-opencl_1.0.10840_amd64.deb
  wget https://github.com/intel/compute-runtime/releases/download/22.14.22890/intel-opencl-icd_22.14.22890_amd64.deb
  dpkg -i *.deb                          # requires root permissions
  apt-get install -y ocl-icd-libopencl1  # requires root permissions
  cd ../ && rm -Rf neo
  ```
- oneAPI Level Zero loader: alternatively, or in addition, the oneAPI Level Zero backend can be used. This backend is more experimental, and is sometimes preferred over OpenCL. Source and `deb` archives are available here.
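Pending the TODO above, here is a minimal sketch of device selection with `dpctl`. It relies on `dpctl.SyclDevice` accepting a filter-selector string and on the `print_device_info` method; exact behavior depends on your installed `dpctl` version:

```python
import dpctl

# Filter strings have the form "<backend>:<device_type>:<relative_id>",
# e.g. "opencl:gpu:0" or "level_zero:gpu:0".
device = dpctl.SyclDevice("level_zero:gpu:0")
device.print_device_info()
```

The `SYCL_DEVICE_FILTER` environment variable can also be set before starting Python to restrict which backends and devices the SYCL runtime exposes to the process.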
Non-root users might lack permission to access the GPU device to submit workloads. Add the current user to the `video` and/or `render` groups:

```bash
sudo usermod -a -G video $USER
sudo usermod -a -G render $USER
```
You can set up a `conda` environment and install the dependencies (`numba-dpex` and `intel::dpcpp_linux-64`) distributed on the `conda-forge`, `intel`, and experimental `dppy/label/dev` channels with:

```bash
export CONDA_DPEX_ENV_NAME=my-dpex-env
```

(where you can replace the name of the environment `my-dpex-env` with a name of your liking) followed by:

```bash
conda create --yes --name $CONDA_DPEX_ENV_NAME \
    --channel dppy/label/dev \
    --channel conda-forge \
    --channel intel \
    numba-dpex=0.22.0dev0=py310h776878d_2
```
Note that different versions of `sklearn_numba_dpex` might require pinning different versions, builds, or channels in this last command.
`scikit-learn` must be installed from source using an experimental version available on `feature/engine-api`, a development branch. Be careful to build with compatible `python` and `numpy` versions.
Guide for building `scikit-learn`:
We use a separate conda environment dedicated to building `scikit-learn`. The following sequence of commands will create the appropriate conda environment, build the scikit-learn wheel and install it in the `numba-dpex` environment, then remove the build environment:

```bash
conda activate $CONDA_DPEX_ENV_NAME
export DPEX_PYTHON_VERSION=$(python -c "import platform; print(platform.python_version())")
export DPEX_NUMPY_VERSION=$(python -c "import numpy; print(numpy.__version__)")
conda create --yes --name sklearn-dev \
    --channel conda-forge \
    "python==$DPEX_PYTHON_VERSION" \
    "numpy==$DPEX_NUMPY_VERSION" \
    scipy cython joblib threadpoolctl pytest compilers
conda activate sklearn-dev
git clone https://github.com/scikit-learn/scikit-learn -b "feature/engine-api" --depth 1
cd scikit-learn
git checkout 7d52073b15ee920c6f49208c777e7ce7663ff74b
python setup.py bdist_wheel
conda activate $CONDA_DPEX_ENV_NAME
cd dist/
pip install *.whl
unset DPEX_PYTHON_VERSION
unset DPEX_NUMPY_VERSION
conda env remove --name sklearn-dev --yes
cd ../../
conda deactivate
rm -Rf scikit-learn
```
Finally, activate the environment with the command:

```bash
conda activate my-dpex-env
```
Alternatively, a docker image is available and provides an up-to-date, one-command install environment. You can either build it from the Dockerfile:

```bash
cd docker
DOCKER_BUILDKIT=1 docker build . -t numba_dpex_dev
```

or pull the docker image from this publicly available repository:

```bash
docker pull jjerphan/numba_dpex_dev:latest
```
Run the container in interactive mode with your favorite docker flags, for example:

```bash
docker run --name my_container_name -it -v /my/host/volume/:/mounted/volume --device=/dev/dri jjerphan/numba_dpex_dev:latest
```

or alternatively, replace `jjerphan/numba_dpex_dev:latest` with `numba_dpex_dev` or any tag you used when building the image locally from the provided `Dockerfile`.
⚠ The flag `--device=/dev/dri` is mandatory to enable the GPU within the container. Also, the user starting the `docker run` command must have access to the GPU, see Give permissions to submit GPU workloads.
Unless you used the flag `--rm` when starting the container, you can restart it after it has exited, with the command:

```bash
sudo docker start -a -i my_container_name
```
Once inside the container, follow these instructions to install the `feature/engine-api` branch of scikit-learn:

```bash
git clone https://github.com/scikit-learn/scikit-learn -b "feature/engine-api" --depth 1
cd scikit-learn
git checkout 7d52073b15ee920c6f49208c777e7ce7663ff74b
pip install -e .
cd ..
```
Once the environment you just installed with one of those two methods is activated, you can inspect the available hardware:

```bash
python -c "import dpctl; print(dpctl.get_devices())"
```

This should print a list of available devices, including `cpu` and `gpu` devices, once for each available backend (`opencl`, `level_zero`, ...).
FIXME: currently, non-editable mode installation does not work.
When loaded into your `numba_dpex` + `scikit-learn` environment from the previous steps, run:

```bash
git clone https://github.com/soda-inria/sklearn-numba-dpex
cd sklearn-numba-dpex
pip install -e .
```
See the sklearn_numba_dpex/kmeans/tests
folder for example usage.
🚧 TODO: write some examples here instead.
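In the meantime, here is a minimal sketch of what usage can look like. It assumes the engine is activated with `sklearn.config_context(engine_provider=...)`, as exercised in the test folder; the exact activation API of the experimental `feature/engine-api` branch might differ:

```python
import numpy as np
import sklearn
from sklearn.cluster import KMeans

# float32 data is strongly recommended for GPU execution (see the notes
# on float32 at the end of this README)
X = np.random.default_rng(42).random((10_000, 10), dtype=np.float32)

# Assumption: the engine-api branch exposes an `engine_provider` parameter
# in `sklearn.config_context` that routes supported estimators to the plugin.
with sklearn.config_context(engine_provider="sklearn_numba_dpex"):
    kmeans = KMeans(n_clusters=8, random_state=42).fit(X)

print(kmeans.cluster_centers_)
```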
To run the tests, run the following from the root of the `sklearn_numba_dpex` repository:

```bash
pytest sklearn_numba_dpex
```
To run the `scikit-learn` tests with the `sklearn_numba_dpex` engine you can run the following:

```bash
SKLEARN_NUMBA_DPEX_TESTING_MODE=1 pytest --sklearn-engine-provider sklearn_numba_dpex --pyargs sklearn.cluster.tests.test_k_means
```

(change the `--pyargs` option accordingly to select other test suites).
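For instance, to run the whole `sklearn.cluster` test package instead of a single module (any importable test package should work with `--pyargs`):

```bash
SKLEARN_NUMBA_DPEX_TESTING_MODE=1 pytest --sklearn-engine-provider sklearn_numba_dpex --pyargs sklearn.cluster.tests
```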
The `--sklearn-engine-provider sklearn_numba_dpex` option offered by the sklearn pytest plugin will automatically activate the `sklearn_numba_dpex` engine for all tests.
Tests covering unsupported features (i.e. tests that trigger `sklearn.exceptions.FeatureNotCoveredByPluginError`) will be automatically marked as xfailed.
Repeat the pip installation step exposed in step 3 with the following edit:

```bash
pip install -e .[benchmark]
```

(i.e. adding the `benchmark` extra-require), followed by:

```bash
cd benchmark
python ./kmeans.py
```

to run a benchmark for different k-means implementations and print a short summary of the performance.

The command

```bash
python ./kmeans.py --help
```

will output more information about the available parameters.
In many machine learning applications, operations on single-precision (float32) floating point data require half as much memory as double-precision (float64), are usually faster, and are accurate enough, which makes float32 more suitable for GPU compute. Moreover, most GPUs used in machine learning projects are significantly faster with float32 than with double-precision (float64) floating point data.

To leverage the full potential of GPU execution, it's strongly advised to use a float32 data type.

By default, unless specified otherwise, NumPy arrays are created with the float64 dtype, so be especially careful about the dtype whenever the data loader does not explicitly document the dtype nor expose a dtype option.
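A quick way to check what a loader actually produced (`my_data_loader` being a placeholder for your own data-loading code):

```python
X = my_data_loader()  # placeholder for your own data-loading code
print(X.dtype)        # typically float64 unless the loader produced float32
```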
Converting NumPy arrays from float64 to float32 is possible with `numpy.ndarray.astype`, although loading the data directly as float32 is preferable since `astype` triggers an avoidable data copy. `numpy.ndarray.astype` can be used as follows:

```python
import numpy as np

X = my_data_loader()              # placeholder for your own data-loading code
X_float32 = X.astype(np.float32)  # copies the data into a new float32 array
my_gpu_compute(X_float32)         # placeholder for your own GPU compute
```
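When you control the array creation, you can often request float32 directly and skip the copy; for example, NumPy's `Generator.random` and `numpy.loadtxt` both accept a `dtype` argument (`data.csv` is a hypothetical file name):

```python
import numpy as np

# Generate synthetic data directly in float32, with no float64 intermediate
rng = np.random.default_rng(0)
X = rng.random((10_000, 10), dtype=np.float32)

# Or load text data directly as float32:
# X = np.loadtxt("data.csv", delimiter=",", dtype=np.float32)

assert X.dtype == np.float32
```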