IVA4Cocktail

Speech density estimation for multichannel convolutive speech/music separation. I use independent vector analysis (IVA) as the separation framework. Please check the report for details.

Please use the archived old code to reproduce the results reported here. I rewrote the code to make it more organized and useful.

Unlike the popular end-to-end supervised speech separation methods, the goal here is to learn a neural network density model for unsupervised separation. The resulting density model can be used for, e.g., online or batch separation, separation of different numbers of sources, and separation of artificial or realistic mixtures, without retraining a separate supervised separation model for each case.

On the PyTorch training code

artificial_mixture_generator.py: the actual mixing matrix is inv(a_FIR_system) * (another_FIR_system) since we change the mixing matrix constantly and natural gradient descent works on the combined separation-mixing matrix.
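As a rough illustration of the combined matrix above, the sketch below evaluates two small FIR systems at one frequency and forms inv(A) * B for that bin. It is only a 2x2, pure-Python toy: the helper names (`fir_matrix_response`, `inv2`, `matmul2`, `effective_mixing`) and the tap values are hypothetical, and the real generator may draw and apply its filters differently.

```python
import cmath

def fir_matrix_response(fir_system, omega):
    """Evaluate a 2x2 matrix of FIR filters at angular frequency omega (rad/sample).

    fir_system[i][j] is the list of taps of the filter from input j to output i.
    """
    return [[sum(h * cmath.exp(-1j * omega * n) for n, h in enumerate(taps))
             for taps in row]
            for row in fir_system]

def inv2(m):
    """Inverse of a 2x2 complex matrix (assumes a nonzero determinant)."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def matmul2(x, y):
    """Product of two 2x2 matrices."""
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def effective_mixing(fir_a, fir_b, omega):
    """Per-frequency combined matrix inv(A(omega)) * B(omega)."""
    return matmul2(inv2(fir_matrix_response(fir_a, omega)),
                   fir_matrix_response(fir_b, omega))

# Hypothetical two-tap systems, diagonally dominant so every bin is invertible.
fir_a = [[[1.0, 0.5], [0.2, 0.1]], [[0.3, -0.1], [1.0, 0.4]]]
fir_b = [[[1.0, -0.3], [0.4, 0.2]], [[-0.2, 0.1], [1.0, 0.6]]]
H = effective_mixing(fir_a, fir_b, omega=cmath.pi / 4)
```

When both systems are the same, the combined matrix reduces to the identity at every frequency, which is a quick sanity check for this kind of construction.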

dnn_source_priors.py: simple circular and noncircular source models are defined. If one wants to recover the phase of each bin as well, a noncircular model must be used. Recovering the phase (up to a certain global rotation ambiguity) is nontrivial, since doing so deconvolves/dereverberates the speech. This is achieved by also forcing the speech reconstructed from the estimated phases to be coherent with the original source. Still, a light memoryless circular model seems to be good enough most of the time.
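To see why a circular model cannot recover per-bin phase, consider the multivariate Laplace prior, a classic circular IVA source model whose negative log density is (up to a constant) the L2 norm across frequency bins. The sketch below is not taken from dnn_source_priors.py; the function names are hypothetical. It only shows that such a density is unchanged when every bin is rotated by an arbitrary phase.

```python
import cmath
import math

def laplace_neg_log_density(s):
    """-log p(s) up to a constant for a multivariate Laplace prior.

    s holds one complex value per frequency bin of a single time frame.
    """
    return math.sqrt(sum(abs(v) ** 2 for v in s))

def rotate_bins(s, phases):
    """Rotate each bin by its own phase (radians)."""
    return [v * cmath.exp(1j * p) for v, p in zip(s, phases)]

frame = [1 + 2j, -0.5 + 0.3j, 0.1 - 0.7j]
rotated = rotate_bins(frame, [0.4, -1.3, 2.0])

# A circular model assigns both frames the same density, so it carries
# no information about the phase of any bin.
same_density = math.isclose(laplace_neg_log_density(frame),
                            laplace_neg_log_density(rotated))
```

A noncircular model, by contrast, must depend on the phases themselves, which is what makes phase recovery possible.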

losses.py: in addition to the standard coherence loss, a symmetric Itakura–Saito distance loss can be used to recover the amplitude of the speech as well (of course, up to a certain global scaling ambiguity). Pre-emphasizing the high frequencies can make the amplitude modeling easier (set the LPC in artificial_mixture_generator.py properly).
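For reference, here is a minimal sketch of the two quantities using their textbook definitions; the exact forms in losses.py may differ. Coherence is taken as the squared normalized inner product of two complex sequences, and the symmetric Itakura–Saito distance as the average of the two directed divergences, which simplifies to 0.5 * (p/q + q/p) - 1 per bin. The function names and the `eps` guard are assumptions made for illustration.

```python
def coherence(x, y):
    """Squared magnitude coherence of two complex sequences, in [0, 1]."""
    cross = sum(a * b.conjugate() for a, b in zip(x, y))
    power_x = sum(abs(a) ** 2 for a in x)
    power_y = sum(abs(b) ** 2 for b in y)
    return abs(cross) ** 2 / (power_x * power_y)

def symmetric_is_distance(p, q, eps=1e-12):
    """Mean symmetric Itakura-Saito distance between two power spectra.

    Per bin: 0.5 * (d_IS(p, q) + d_IS(q, p)) = 0.5 * (p/q + q/p) - 1,
    because the logarithmic terms of the two directed divergences cancel.
    """
    total = 0.0
    for a, b in zip(p, q):
        ratio = (a + eps) / (b + eps)  # eps guards against empty bins
        total += 0.5 * (ratio + 1.0 / ratio) - 1.0
    return total / len(p)

x = [1 + 2j, 3 - 1j, 0.5j]
y = [(2 - 1j) * v for v in x]      # same signal, different complex gain
c = coherence(x, y)                 # close to 1: coherence ignores the gain
d = symmetric_is_distance([abs(v) ** 2 for v in x],
                          [abs(v) ** 2 for v in y])  # nonzero: amplitudes differ
```

The coherence is blind to the complex gain between `x` and `y`, whereas the symmetric Itakura–Saito distance reacts to the amplitude mismatch, which is why the two are complementary.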

short_time_Fourier_transform.py: this should work with PyTorch's old (torch version 1.7) and new (version 1.8) FFT APIs. I still use the old view_as_real format for complex numbers, i.e., real tensors whose last dimension of size 2 holds the real and imaginary parts.

preconditioned_stochastic_gradient_descent.py: this is a second-order optimizer. I use it mainly to reduce the hyperparameter tuning effort.

Lastly, demo.m is a MATLAB/Octave file showing how to use a trained circular density model with the default settings in config.py. There are also some pre-designed window functions obtained by this method.

Some sample separation results for subjective comparison (using the density model in the archived old code)

These are some typical mixtures with simulated RIRs and the separation results for 10 sources, each 10 seconds long. The neural network density models always give better subjective separation performance, even for the first set of sample results, where the signal-to-interference ratio (SIR) of the multivariate Laplace model is 0.3 dB higher than that of the neural network one. One reason is that SIR is not very sensitive to errors such as permutations between the low-pass and high-pass bands, since most speech energy is located in the low-frequency band, while human ears are picky.
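The insensitivity of SIR to band permutations can be seen from a back-of-the-envelope calculation. The sketch below uses the usual definition of SIR as 10 * log10 of target energy over interference energy; the 95%/5% split of speech energy between the low and high bands is an invented number for illustration, not a measurement from these experiments.

```python
import math

def sir_db(target_energy, interference_energy):
    """Signal-to-interference ratio in dB from the two energy totals."""
    return 10.0 * math.log10(target_energy / interference_energy)

# Illustrative assumption: each speaker has 95% of its energy in the low band
# and 5% in the high band.
low_band_energy = 0.95
high_band_energy = 0.05

# Band permutation error: the output keeps the target's low band but takes
# the entire high band from the wrong speaker.
permuted_sir = sir_db(low_band_energy, high_band_energy)
```

Under these assumed numbers the output still scores roughly 12.8 dB even though its whole high band belongs to another speaker, an error that is easy to hear.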

