sklearn-numba-dpex

Experimental plugin for scikit-learn that runs (some) estimators on Intel GPUs via numba-dpex. Support for other GPU manufacturers is also on the roadmap and depends on the progress of interoperability features in the numba-dpex stack.

This package requires working with the feature/engine-api branch of scikit-learn.

A step-by-step guide is provided in this README for installing numba-dpex, along with the feature/engine-api branch of scikit-learn and this plugin from source.

🚧 TODO: at the moment it is unusually complicated to install this plugin and its dependencies. Once feature/engine-api is merged and released in scikit-learn, we aim to make it possible to install sklearn-numba-dpex and all its dependencies with a one-line conda install or pip install command, or both.

List of Included Engines

  • sklearn.cluster.KMeans for the standard Lloyd's algorithm on dense data arrays, including k-means++ support.

Getting started:

Step 1: Installing a numba_dpex environment

Getting started requires a working environment for using numba_dpex. Currently a conda install or a docker image are available.

⚠⚠⚠ WARNING ⚠⚠⚠: the latest numba_dpex releases might have stability issues. If you run into segfaults or wrong outputs, try disabling JIT compilation optimizations by setting the environment variable NUMBA_DPEX_OPT=0. If you find minimal reproducers for such stability issues, please report them on the numba_dpex issue tracker.
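When the variable cannot be set in the shell (for instance in a notebook), it can also be set from Python. The following is a minimal sketch, assuming that numba_dpex reads NUMBA_DPEX_OPT when it is first imported, so the variable has to be set beforehand. The helper name `disable_dpex_optimizations` is hypothetical and not part of any library.

```python
import os
import sys
import warnings


def disable_dpex_optimizations():
    """Set NUMBA_DPEX_OPT=0 for the current process (hypothetical helper)."""
    # ASSUMPTION: the variable is read at import time, so setting it after
    # numba_dpex has been imported may have no effect.
    if "numba_dpex" in sys.modules:
        warnings.warn(
            "numba_dpex is already imported; NUMBA_DPEX_OPT may be ignored."
        )
    os.environ["NUMBA_DPEX_OPT"] = "0"


disable_dpex_optimizations()
# Import numba_dpex (or sklearn_numba_dpex) only after this point.
```

Setting the variable in the shell before starting Python (`NUMBA_DPEX_OPT=0 python my_script.py`) achieves the same thing without touching the code.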

Using a conda installation

Conda does not currently support installing the low-level runtime libraries for GPUs, so the first part of the installation guide consists of installing those libraries on the host system.

The second part consists of running conda commands that create the environment with all the required packages and configuration. Note that the installation logic is somewhat complicated because it mixes packages from several conda channels (conda-forge, dppy/label/dev, and intel), some of which are experimental. Neither the builds nor the channels are maintained by the sklearn_numba_dpex team, and their level of stability is unknown.

🚧 TODO: update the instructions to install everything from non-dev conda packages on always up-to-date channels once they are available.

Install low-level runtime libraries for your GPU (1/2)

At this time, only Intel GPUs are supported.

Intel GPU runtime libraries

For Intel GPUs, two backends are available. You might want to install both and test whether one gives better performance.

🚧 TODO: write a guide on how to select the device and the backend in a python script.

  • Intel OpenCL for GPU: the Intel OpenCL runtime can be installed following this link.

    ⚠⚠⚠ WARNING ⚠⚠⚠: for Ubuntu (confirmed for focal and jammy) the apt-based installation is broken, see IntelPython/dpctl#1010. Prefer the upstream .deb packages provided at: https://github.com/intel/compute-runtime/releases.

    Recommended installation steps for Ubuntu:

    ⚠ As with any installation of packages from outside the official repositories, the existing workarounds might make your system unstable and are only recommended within a containerized environment and/or for expert users.

    To avoid altering the apt-based version tree too much and risking other compatibility issues, the recommended workaround is to identify the version that is officially supported by your OS (use packages.ubuntu.com), then find the corresponding build on the Intel release page on GitHub and follow the instructions from the release page, e.g. for jammy:

    mkdir neo
    cd neo
    wget https://github.com/intel/compute-runtime/releases/download/22.14.22890/intel-gmmlib_22.0.2_amd64.deb
    wget https://github.com/intel/intel-graphics-compiler/releases/download/igc-1.0.10840/intel-igc-core_1.0.10840_amd64.deb
    wget https://github.com/intel/intel-graphics-compiler/releases/download/igc-1.0.10840/intel-igc-opencl_1.0.10840_amd64.deb
    wget https://github.com/intel/compute-runtime/releases/download/22.14.22890/intel-opencl-icd_22.14.22890_amd64.deb
    dpkg -i *.deb  # requires root permissions
    apt-get install -y ocl-icd-libopencl1  # requires root permissions
    cd ../ && rm -Rf neo
  • oneAPI level zero loader: alternatively, or in addition, the oneAPI level zero backend can be used. This backend is more experimental, and is sometimes preferred over OpenCL. Source and deb archives are available here.

Give permissions to submit GPU workloads

Non-root users might lack permission to access the GPU device to submit workloads. Add the current user to the video group and/or render group:

sudo usermod -a -G video $USER
sudo usermod -a -G render $USER
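To verify that the group change is visible to the current session, the membership can be inspected from Python. This is a small sketch using only the standard library on Linux; the function name `gpu_group_membership` is hypothetical. Note that changes made with usermod usually only apply to new login sessions, so a logout and login may be needed before the check reports True.

```python
import grp
import os


def gpu_group_membership(group_names=("video", "render")):
    """Report whether the current process belongs to each group.

    Returns a dict mapping group name to True, False, or None when the
    group does not exist on this system.
    """
    # Supplementary groups plus the primary group of the current process.
    current_gids = set(os.getgroups()) | {os.getgid()}
    membership = {}
    for name in group_names:
        try:
            gid = grp.getgrnam(name).gr_gid
        except KeyError:
            membership[name] = None  # group is not defined on this system
        else:
            membership[name] = gid in current_gids
    return membership


print(gpu_group_membership())
```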
Set up a conda environment for numba-dpex (2/2)

You can set up a conda environment and install the dependencies (numba-dpex and intel::dpcpp_linux-64) distributed on the conda-forge, intel and experimental dppy/label/dev channels with:

export CONDA_DPEX_ENV_NAME=my-dpex-env

(where you can replace the name of the environment my-dpex-env with a name of your liking) followed by

conda create --yes --name $CONDA_DPEX_ENV_NAME \
             --channel dppy/label/dev \
             --channel conda-forge \
             --channel intel \
             numba-dpex=0.22.0dev0=py310h776878d_2

Note that different versions of sklearn_numba_dpex may require pinning different versions, builds or channels in this last command.

scikit-learn must be installed from source using an experimental version available on feature/engine-api, a development branch. Be careful to build with compatible python and numpy versions.

Guide for building scikit-learn:

We use a separate conda environment dedicated to building scikit-learn. The following sequence of commands will create the appropriate conda environment, build the scikit-learn binary, then remove the environment:

conda activate $CONDA_DPEX_ENV_NAME
export DPEX_PYTHON_VERSION=$(python -c "import platform; print(platform.python_version())")
export DPEX_NUMPY_VERSION=$(python -c "import numpy; print(numpy.__version__)")
conda create --yes --name sklearn-dev \
                   --channel conda-forge \
                   "python==$DPEX_PYTHON_VERSION" \
                   "numpy==$DPEX_NUMPY_VERSION" \
                   scipy cython joblib threadpoolctl pytest compilers
conda activate sklearn-dev
git clone https://github.com/scikit-learn/scikit-learn -b "feature/engine-api" --depth 1
cd scikit-learn
git checkout 7d52073b15ee920c6f49208c777e7ce7663ff74b
python setup.py bdist_wheel
conda activate $CONDA_DPEX_ENV_NAME
cd dist/
pip install *.whl
unset DPEX_PYTHON_VERSION
unset DPEX_NUMPY_VERSION
conda env remove --name sklearn-dev --yes
cd ../../
conda deactivate
rm -Rf scikit-learn

Finally, activate the environment with the command:

conda activate my-dpex-env

Using the docker image

Alternatively, a docker image is available and provides an up-to-date, one-command install environment. You can either build it from the Dockerfile:

cd docker
DOCKER_BUILDKIT=1 docker build . -t numba_dpex_dev

or pull the docker image from this publicly available repository:

docker pull jjerphan/numba_dpex_dev:latest

Run the container in interactive mode with your favorite docker flags, for example:

docker run --name my_container_name -it -v /my/host/volume/:/mounted/volume --device=/dev/dri jjerphan/numba_dpex_dev:latest

or alternatively, replace jjerphan/numba_dpex_dev:latest by numba_dpex_dev or any tag you used when building the image locally from the provided Dockerfile.

⚠ The flag --device=/dev/dri is mandatory to enable the GPU within the container. The user running the docker run command must also have access to the GPU, see Give permissions to submit GPU workloads.

Unless you used the --rm flag when starting the container, you can restart it after it has exited with the command:

sudo docker start -a -i my_container_name

Once inside the container, follow these instructions to install the feature/engine-api branch of scikit-learn:

git clone https://github.com/scikit-learn/scikit-learn -b "feature/engine-api" --depth 1
cd scikit-learn
git checkout 7d52073b15ee920c6f49208c777e7ce7663ff74b
pip install -e .
cd ..

Step 2: Check that the installation of the environment was successful

Once the environment you just installed with one of those two methods is activated, you can inspect the available hardware:

python -c "import dpctl; print(dpctl.get_devices())"

This should print a list of available devices, including cpu and gpu devices, once for each available backend (opencl, level_zero, ...).
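For a more readable overview, the device list can be grouped by backend. The sketch below assumes that each object returned by dpctl.get_devices() exposes `backend`, `device_type` and `name` attributes; the helper names `group_devices_by_backend` and `has_gpu` are hypothetical. The helpers only rely on those three attributes, so they can be exercised without a GPU.

```python
from collections import defaultdict


def _label(value):
    """Return the enum member name if there is one, else the string form."""
    return getattr(value, "name", str(value))


def group_devices_by_backend(devices):
    """Map each backend name to a list of (device type, device name)."""
    grouped = defaultdict(list)
    for device in devices:
        grouped[_label(device.backend)].append(
            (_label(device.device_type), device.name)
        )
    return dict(grouped)


def has_gpu(devices):
    """Return True if at least one listed device is a GPU."""
    return any(_label(d.device_type).lower() == "gpu" for d in devices)


if __name__ == "__main__":
    try:
        import dpctl
    except ImportError:
        print("dpctl is not importable: activate the numba_dpex environment.")
    else:
        devices = dpctl.get_devices()
        for backend, entries in group_devices_by_backend(devices).items():
            print(backend, entries)
        if not has_gpu(devices):
            print("No GPU found: check the runtime libraries and permissions.")
```

If no GPU shows up, the usual suspects are the runtime libraries from Step 1 and the video/render group permissions.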

Step 3: Install this plugin

FIXME: currently, non-editable mode installation does not work.

From within the numba_dpex + scikit-learn environment set up in the previous steps, run:

git clone https://github.com/soda-inria/sklearn-numba-dpex
cd sklearn-numba-dpex
pip install -e .

Intended usage

See the sklearn_numba_dpex/kmeans/tests folder for example usage.

🚧 TODO: write some examples here instead.
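In the meantime, here is a rough sketch of what usage might look like. It assumes that the feature/engine-api branch lets you select the plugin through an `engine_provider` option of sklearn.config_context, mirroring the --sklearn-engine-provider pytest option; this README does not confirm that option name, so treat it as an assumption and refer to the tests folder for authoritative usage. The data and the helper `fit_kmeans` are hypothetical.

```python
import numpy as np
import sklearn
from sklearn.cluster import KMeans

# Hypothetical toy data, created directly as float32 to avoid a later copy.
rng = np.random.default_rng(0)
X = rng.random((1000, 8), dtype=np.float32)


def fit_kmeans(X, engine_provider=None):
    """Fit KMeans, optionally inside an engine-provider config context."""
    kmeans = KMeans(n_clusters=4, n_init=1, random_state=0)
    if engine_provider is None:
        return kmeans.fit(X)
    # ASSUMPTION: `engine_provider` is a config option of the
    # feature/engine-api branch; released scikit-learn does not accept it.
    with sklearn.config_context(engine_provider=engine_provider):
        return kmeans.fit(X)


try:
    model = fit_kmeans(X, engine_provider="sklearn_numba_dpex")
except TypeError:
    # Released scikit-learn rejects the unknown option: fall back to the
    # default implementation so that the sketch still runs.
    model = fit_kmeans(X)

print(model.cluster_centers_.shape)  # (4, 8)
```

The fallback branch is only there so the sketch runs on a stock scikit-learn install; with the plugin set up as described above it should not be needed.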

Running the tests

To run the tests, run the following from the root of the sklearn_numba_dpex repository:

pytest sklearn_numba_dpex

To run the scikit-learn tests with the sklearn_numba_dpex engine you can run the following:

SKLEARN_NUMBA_DPEX_TESTING_MODE=1 pytest --sklearn-engine-provider sklearn_numba_dpex --pyargs sklearn.cluster.tests.test_k_means

(change the --pyargs option accordingly to select other test suites).

The --sklearn-engine-provider sklearn_numba_dpex option offered by the sklearn pytest plugin will automatically activate the sklearn_numba_dpex engine for all tests.

Tests covering unsupported features (that trigger sklearn.exceptions.FeatureNotCoveredByPluginError) will be automatically marked as xfailed.

Running the benchmarks

Repeat the pip installation step described in Step 3 with the following edit:

pip install -e .[benchmark]

(i.e. adding the benchmark extra), followed by:

cd benchmark
python ./kmeans.py

to run a benchmark for different k-means implementations and print a short summary of the performance.

The command

python ./kmeans.py --help

will output more information about the available parameters.

Notes about the preferred floating point precision (float32)

In many machine learning applications, operations on single-precision (float32) floating point data require half as much memory as on double-precision (float64) data, and are regarded as faster, accurate enough and more suitable for GPU compute. Besides, most GPUs used in machine learning projects are significantly faster with float32 than with float64 data.

To leverage the full potential of GPU execution, it is strongly advised to use a float32 data type.

By default, unless specified otherwise, NumPy arrays are created with type float64, so pay particular attention to the type whenever the loader neither explicitly documents the type nor exposes a type option.

Converting NumPy arrays from float64 to float32 is also possible using numpy.ndarray.astype, although this is less recommended because it incurs an avoidable data copy. numpy.ndarray.astype can be used as follows:

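import numpy as np

X = my_data_loader()  # placeholder for your own loading function
X_float32 = X.astype(np.float32)  # copies the data into a new float32 array
my_gpu_compute(X_float32)  # placeholder for your own GPU computation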
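The memory saving and the copy behaviour can be checked directly with NumPy. This is a small illustrative sketch on random data: it creates one array with the default dtype and one directly as float32, compares their sizes, and uses numpy.asarray to convert only when a conversion is actually needed.

```python
import numpy as np

rng = np.random.default_rng(0)

X64 = rng.random((1000, 16))                    # default dtype: float64
X32 = rng.random((1000, 16), dtype=np.float32)  # allocated as float32

print(X64.dtype, X64.nbytes)  # float64 128000
print(X32.dtype, X32.nbytes)  # float32 64000

# np.asarray returns the input unchanged when the dtype already matches,
# so no copy is made for data that is already float32.
X_checked = np.asarray(X32, dtype=np.float32)
print(X_checked is X32)  # True

# Converting float64 data allocates a new float32 array (one extra copy).
X_converted = np.asarray(X64, dtype=np.float32)
print(np.shares_memory(X_converted, X64))  # False
```

Requesting float32 at creation or load time, when the loader allows it, avoids holding both the float64 and float32 versions in memory at once.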