This is the code used for benchmarking different feature sets, including musif. Please cite us:
Simonetta F., Llorens A., Serrano M., García-Portugués E., Torrente A., "Optimizing Feature Extraction for Symbolic Music", ISMIR 2023.
- Python 3.10 (e.g. via conda or pyenv)
- pdm; you basically have three options:
  - pipx install pdm (requires pipx, recommended)
  - pip install pdm (environment specific)
  - see https://pdm.fming.dev/latest/ for other alternatives
- Run pdm sync to create the environment and install the Python packages
- As an alternative to pdm, see cluster.md for a bare venv approach
- MuseScore: download the AppImage (4.0.1 has a bug; use 3.6.2 instead)
- Java: install using your OS package manager and check that the java command is available in the PATH
- jSymbolic 2.2: download and unzip
- GCC and make: install using your OS package manager
- humdrum:
    git submodule update
    cd humdrum-tools
    make update
    make
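
The check that the required commands are on the PATH can be automated. The following is a minimal sketch, not part of the repository: the list of commands is inferred from the setup steps above, and the function name `missing_commands` is hypothetical.

```python
import shutil

# Commands that the setup steps above expect to find on the PATH.
# MuseScore and jSymbolic are not listed here because their locations
# are configured separately in symbolic_features/settings.py.
REQUIRED_COMMANDS = ["java", "gcc", "make", "git", "pdm"]


def missing_commands(commands, which=shutil.which):
    """Return the commands that cannot be found on the PATH.

    `which` is injectable so the function can be tested without
    depending on what is installed on the current machine.
    """
    return [cmd for cmd in commands if which(cmd) is None]


def report(commands=REQUIRED_COMMANDS):
    """Print a short summary of the missing commands, if any."""
    missing = missing_commands(commands)
    if missing:
        print("Missing from PATH:", ", ".join(missing))
    else:
        print("All required commands were found.")
```

Calling `report()` before running `pdm sync` gives an early warning about tools that still need to be installed.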
In symbolic_features/settings.py, set the paths to the MuseScore and jSymbolic executables.
Download the following datasets and set the paths to the root of each one in symbolic_features/settings.py:
- Josquin - La Rue
- ASAP
- Didone
- EWLD
- String quartets:
  - Haydn
  - Mozart
  - Beethoven
  - unzip the above three zips into one directory, e.g.: quartets/haydn, quartets/mozart, quartets/beethoven
Fix invalid file names: pdm fix_names. This fixes names containing commas (,) and semicolons (;), which cause errors in CSV files.
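
To illustrate the kind of renaming involved, here is a minimal sketch. It is not the implementation behind pdm fix_names: the replacement character `_`, the function names, and the dry-run behaviour are all assumptions made for the example, and only file names (not directory names) are handled.

```python
from pathlib import Path

# Characters named in the text above as breaking CSV files.
BAD_CHARS = ",;"


def sanitized_name(name, replacement="_"):
    """Return `name` with every comma and semicolon replaced."""
    return "".join(replacement if ch in BAD_CHARS else ch for ch in name)


def fix_names(root, dry_run=True):
    """Rename files under `root` whose names contain ',' or ';'.

    Returns a list of (old_path, new_path) pairs. With dry_run=True
    the renames are only planned and nothing on disk is touched.
    """
    renames = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        new_name = sanitized_name(path.name)
        if new_name == path.name:
            continue
        target = path.with_name(new_name)
        if target.exists():
            # Refuse to overwrite an existing file silently.
            raise FileExistsError(f"{target} already exists")
        renames.append((path, target))
        if not dry_run:
            path.rename(target)
    return renames
```

Running with `dry_run=True` first shows which files would change before anything is renamed.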
Convert any file to MIDI: pdm convert2midi. If you are running without a display (e.g. in a remote ssh session), you will first need to run Xvfb :99 & export DISPLAY=:99
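
When the conversion is launched from a script rather than an interactive shell, the same headless setup can be prepared in Python. This is a sketch under assumptions: the helper names `headless_env` and `xvfb_command` are hypothetical, and the display number `:99` is taken from the command above.

```python
import os


def headless_env(environ=None, display=":99"):
    """Return a copy of the environment with DISPLAY set if it is missing.

    An existing, non-empty DISPLAY is left untouched so that sessions
    with a real display keep using it.
    """
    env = dict(os.environ if environ is None else environ)
    if not env.get("DISPLAY"):
        env["DISPLAY"] = display
    return env


def xvfb_command(display=":99"):
    """Return the argument list that starts the virtual X server."""
    return ["Xvfb", display]


# Possible usage (not executed here, since it starts external processes):
#   import subprocess
#   xvfb = subprocess.Popen(xvfb_command())
#   subprocess.run(["pdm", "convert2midi"], env=headless_env(), check=True)
#   xvfb.terminate()
```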
Reproduce experiments: ./extract_all.sh
Detailed commands:
- jSymbolic:
    pdm extract --jsymbolic --extension .mid
- musif:
    pdm extract --musif --extension .mid
    pdm extract --musif --extension .xml
    pdm extract --musif --extension .krn
- music21:
    pdm extract --music21 --extension .mid
    pdm extract --music21 --extension .xml
    pdm extract --music21 --extension .krn
Reproduce experiments: pdm validation
Detailed commands:
- pdm classification: run all experiments with original features
- pdm classification --use_first_10_pc: run all experiments with the first 10 Principal Components from each task (where a task is a combination of dataset, feature set, and extension)
- pdm plot: plot the AutoML optimization score across time
- pdm classification --featureset='music21' --dataset='EWLD' --extension='mid' --automl_time=60: run an experiment on a single task for 60 seconds
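
Since a task is a combination of dataset, feature set, and extension, the single-task command can be generated for every combination. The sketch below assumes the feature set and extension pairs match the extraction commands listed earlier. Only `music21`, `EWLD`, and `mid` are confirmed values of the flags; the spellings `jsymbolic` and `musif` for `--featureset` are assumptions, and a given dataset may not provide every extension.

```python
# Feature set -> extensions, mirroring the extraction commands above.
# The keys "jsymbolic" and "musif" are assumed spellings of the
# --featureset values; only "music21" appears in the README.
FEATURESET_EXTENSIONS = {
    "jsymbolic": ["mid"],
    "musif": ["mid", "xml", "krn"],
    "music21": ["mid", "xml", "krn"],
}


def classification_commands(datasets, automl_time=60):
    """Build one single-task command per (dataset, feature set, extension)."""
    commands = []
    for dataset in datasets:
        for featureset, extensions in FEATURESET_EXTENSIONS.items():
            for extension in extensions:
                commands.append(
                    f"pdm classification --featureset='{featureset}' "
                    f"--dataset='{dataset}' --extension='{extension}' "
                    f"--automl_time={automl_time}"
                )
    return commands


def print_commands(datasets=("EWLD",), automl_time=60):
    """Print the generated commands, one per line."""
    for command in classification_commands(datasets, automl_time):
        print(command)
```

With these assumptions, each dataset yields seven tasks: one for jSymbolic and three each for musif and music21.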