crank audio samples

About crank

crank is open-source software for non-parallel voice conversion based on a vector-quantized variational autoencoder (VQVAE) with adversarial learning. This repository presents converted audio samples generated by crank.

K. Kobayashi, W-C. Huang, Y-C. Wu, P.L. Tobing, T. Hayashi, T. Toda, 
"crank: an open-source software for nonparallel voice conversion based on vector-quantized variational autoencoder", 
Proc. ICASSP, 2021. (accepted)

Voice Conversion Challenge 2018 dataset

The following audio samples were generated by crank (ver 0.3.0), and the objective results reported in the paper were calculated from these waveforms. You can download all converted samples from the following URL.

Method

Baseline VQVAE
- Three-stacked hierarchical VQVAE
CycleVQVAE
- Baseline VQVAE with cyclic architecture
VQVAEGAN
- Baseline VQVAE with GAN
CycleVQVAEGAN
- Baseline VQVAE with cyclic architecture and GAN
CycleVQVAEGAN w/ STFTLoss
- Baseline VQVAE with cyclic architecture and GAN with STFT loss
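
All of the methods above build on vector quantization, so a minimal sketch of that one step may help when reading the list. It is not crank's code: the function name `quantize`, the toy codebook, and the frame values are all hypothetical, and it shows only the generic nearest-codebook lookup of a VQVAE, without the hierarchical, cyclic, or GAN parts.

```python
from typing import List, Sequence, Tuple


def quantize(
    frames: Sequence[Sequence[float]],
    codebook: Sequence[Sequence[float]],
) -> Tuple[List[int], List[List[float]]]:
    """Map each frame to its nearest codebook vector (squared Euclidean distance)."""
    indices: List[int] = []
    quantized: List[List[float]] = []
    for frame in frames:
        # Distance from this frame to every codebook entry.
        distances = [
            sum((x - c) ** 2 for x, c in zip(frame, code)) for code in codebook
        ]
        # Index of the closest entry; ties resolve to the lowest index.
        best = min(range(len(codebook)), key=distances.__getitem__)
        indices.append(best)
        quantized.append(list(codebook[best]))
    return indices, quantized


# Toy 2-dimensional codebook with three entries (illustrative values only).
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 2.0]]
# Toy encoder outputs, one vector per frame (illustrative values only).
frames = [[0.1, -0.2], [0.9, 1.2], [-0.8, 1.7]]

indices, quantized = quantize(frames, codebook)
print(indices)  # [0, 1, 2]
```

In a trained model the codebook is learned jointly with the encoder and decoder, and the decoder reconstructs speech features from the quantized vectors together with speaker information. The sketch covers only the lookup.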