Code for the paper:
María Alfaro-Contreras, Jose J. Valero-Mas, Jose M. Iñesta, Jorge Calvo-Zaragoza
Late multimodal fusion for image and audio music transcription
Expert Systems with Applications, 216, 119491, 2023
Dataset used: Camera-PrIMuS. The partitions used can be found in the 5-crossval.tgz file.
Citation
@article{alfaro2023late,
author = {Alfaro-Contreras, Mar{\'i}a and Valero-Mas, Jose J. and I{\~n}esta, Jose M. and Calvo-Zaragoza, Jorge},
title = {{Late multimodal fusion for image and audio music transcription}},
journal = {Expert Systems with Applications},
volume = {216},
pages = {119491},
year = {2023},
issn = {0957-4174}
}
Requirements
tensorflow-gpu==2.3.1
pandas==1.3.0
numpy==1.18.5
opencv-python==4.5.3.56
swalign==0.3.6
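
The pins above are exact versions, so a mismatched environment is a likely source of import or runtime errors. The following is a minimal sketch, not part of the original repository, of one way to compare the installed packages against these pins. It assumes Python 3.8 or later (for `importlib.metadata`); the helper names `parse_pins` and `check_pins` are hypothetical and chosen only for illustration.

```python
from importlib import metadata

# Pinned versions copied from the Requirements list above.
PINNED = """\
tensorflow-gpu==2.3.1
pandas==1.3.0
numpy==1.18.5
opencv-python==4.5.3.56
swalign==0.3.6
"""


def parse_pins(text):
    """Turn 'name==version' lines into a {name: version} dict."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, _, version = line.partition("==")
        pins[name.strip()] = version.strip()
    return pins


def check_pins(pins):
    """Return {name: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, expected in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    for name, (expected, installed) in check_pins(parse_pins(PINNED)).items():
        print(f"{name}: expected {expected}, found {installed or 'not installed'}")
```

Running the script prints one line per package whose installed version differs from the pin, and prints nothing when the environment matches.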