Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement beatnet #1478

Open
wants to merge 6 commits into
base: master
Choose a base branch
from
Open

Implement beatnet #1478

wants to merge 6 commits into from

Conversation

kobakos
Copy link
Contributor

@kobakos kobakos commented May 22, 2024

@kobakos kobakos requested a review from kyakuno May 22, 2024 09:01
@kyakuno
Copy link
Collaborator

kyakuno commented Aug 9, 2024

@kobakos beatnetのonnxをいただくことは可能でしょうか?

Fix parser explanation and typo
@kyakuno
Copy link
Collaborator

kyakuno commented Aug 9, 2024

モデルアップロード済み。
https://storage.googleapis.com/ailia-models/beatnet/beatnet_1.onnx

@kyakuno
Copy link
Collaborator

kyakuno commented Aug 10, 2024

madmomで下記のエラーになる。

Traceback (most recent call last):
  File "/Users/kyakuno/Desktop/ailia/ailia-models-ax/audio_processing/beatnet/beatnet.py", line 4, in <module>
    from madmom.features import DBNDownBeatTrackingProcessor
  File "/usr/local/lib/python3.11/site-packages/madmom/__init__.py", line 24, in <module>
    from . import audio, evaluation, features, io, ml, models, processors, utils
  File "/usr/local/lib/python3.11/site-packages/madmom/audio/__init__.py", line 27, in <module>
    from . import comb_filters, filters, signal, spectrogram, stft
  File "madmom/audio/comb_filters.pyx", line 15, in init madmom.audio.comb_filters
  File "/usr/local/lib/python3.11/site-packages/madmom/processors.py", line 23, in <module>
    from collections import MutableSequence
ImportError: cannot import name 'MutableSequence' from 'collections' (/usr/local/Homebrew/Cellar/[email protected]/3.11.6_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/collections/__init__.py)

@kyakuno
Copy link
Collaborator

kyakuno commented Aug 10, 2024

これが原因みたい。
CPJKU/madmom#535

@kyakuno
Copy link
Collaborator

kyakuno commented Aug 10, 2024

@kobakos madmomをlibrosaのstftなどに置き換え可能でしょうか?

@kyakuno
Copy link
Collaborator

kyakuno commented Aug 10, 2024

madmom

from madmom.audio.signal import SignalProcessor, FramedSignalProcessor
from madmom.audio.stft import ShortTimeFourierTransformProcessor
from madmom.audio.spectrogram import (
    FilteredSpectrogramProcessor, LogarithmicSpectrogramProcessor,
    SpectrogramDifferenceProcessor)
from madmom.processors import ParallelProcessor, SequentialProcessor


# feature extractor that extracts magnitude spectrogram and its differences  

class LOG_SPECT(FeatureModule):
    def __init__(self, num_channels=1, sample_rate=22050, win_length=2048, hop_size=512, n_bands=[12], mode='online'):
        sig = SignalProcessor(num_channels=num_channels, win_length=win_length, sample_rate=sample_rate)
        self.sample_rate = sample_rate
        self.hop_length = hop_size
        self.num_channels = num_channels
        multi = ParallelProcessor([])
        frame_sizes = [win_length]  
        num_bands = n_bands  
        for frame_size, num_bands in zip(frame_sizes, num_bands):
            if mode == 'online' or mode == 'offline':
                frames = FramedSignalProcessor(frame_size=frame_size, hop_size=hop_size) 
            else:   # for real-time and streaming modes 
                frames = FramedSignalProcessor(frame_size=frame_size, hop_size=hop_size, num_frames=4) 
            stft = ShortTimeFourierTransformProcessor()  # caching FFT window
            filt = FilteredSpectrogramProcessor(
                num_bands=num_bands, fmin=30, fmax=17000, norm_filters=True)
            spec = LogarithmicSpectrogramProcessor(mul=1, add=1)
            diff = SpectrogramDifferenceProcessor(
                diff_ratio=0.5, positive_diffs=True, stack_diffs=np.hstack)
            # process each frame size with spec and diff sequentially
            multi.append(SequentialProcessor((frames, stft, filt, spec, diff)))
        # stack the features and process everything sequentially
        self.pipe = SequentialProcessor((sig, multi, np.hstack))

    def process_audio(self, audio):
        feats = self.pipe(audio)
        return feats.T

chatgptでlibrosaにしてもらったもの(動作未確認)

import numpy as np
import librosa
import librosa.display

class LOG_SPECT:
    def __init__(self, sample_rate=22050, win_length=2048, hop_length=512, n_mels=128):
        self.sample_rate = sample_rate
        self.win_length = win_length
        self.hop_length = hop_length
        self.n_mels = n_mels

    def process_audio(self, audio_path):
        y, sr = librosa.load(audio_path, sr=self.sample_rate)

        # Calculate the short-time Fourier transform (STFT)
        stft = librosa.stft(y, n_fft=self.win_length, hop_length=self.hop_length)
        
        # Get the magnitude spectrogram
        mag_spec = np.abs(stft)

        # Convert to mel scale
        mel_spec = librosa.feature.melspectrogram(S=mag_spec, sr=sr, n_mels=self.n_mels)

        # Convert to log scale
        log_mel_spec = librosa.power_to_db(mel_spec)

        # Compute the first order difference (delta)
        delta = librosa.feature.delta(log_mel_spec)

        # Combine the log mel spectrogram and its delta
        log_mel_spec_with_delta = np.vstack([log_mel_spec, delta])

        return log_mel_spec_with_delta.T

# 使用例
# feature_extractor = LOG_SPECT()
# features = feature_extractor.process_audio('path/to/audio/file.wav')

@kobakos
Copy link
Contributor Author

kobakos commented Aug 22, 2024

エラーのもとになっているfeatures.pyはLOG_SPECTだけじゃなくてDBNDownBeatTrackingProcessorやparticle_filtering_cascadeを使うときにも必要なので、LOG_SPECTの修正だけではDBNと粒子フィルタどちらの推論モードでも依然エラーは出てしまいます。必要なコードだけmadmomからとってきて同じディレクトリに入れることができないかとも思いましたがCythonが使われているのでこれも簡単ではなさそう。
GitHubにある実装ではインポートの問題も解消されているのですが、pipのやつは古いままになっているのでrequirements.txtによる対応は難しそうです。

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants