About crank
crank is open-source software for non-parallel voice conversion based on a vector-quantized variational autoencoder (VQVAE) with adversarial learning. This repository presents converted audio samples generated by crank.
K. Kobayashi, W-C. Huang, Y-C. Wu, P.L. Tobing, T. Hayashi, T. Toda,
"crank: an open-source software for nonparallel voice conversion based on vector-quantized variational autoencoder",
Proc. ICASSP, 2021. (accepted)
Voice Conversion Challenge 2018 dataset
The following audio samples were generated by crank (ver. 0.3.0), and the objective results reported in the paper were calculated from these waveforms. All converted samples can be downloaded from the following URL.
Method
- Baseline VQVAE: three-stacked hierarchical VQVAE
- CycleVQVAE: baseline VQVAE with cyclic architecture
- VQVAEGAN: baseline VQVAE with GAN
- CycleVQVAEGAN: baseline VQVAE with cyclic architecture and GAN
- CycleVQVAEGAN w/ STFTLoss: baseline VQVAE with cyclic architecture, GAN, and STFT loss
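All of the methods above share a VQVAE backbone. As a rough illustration only, and not crank's actual implementation, the standard VQ-VAE quantization step replaces each encoder output frame with its nearest entry in a learned codebook. This minimal sketch assumes plain Python lists and squared Euclidean distance; the names `vector_quantize` and `squared_distance` are hypothetical, and training-time details (straight-through gradients, codebook and commitment losses, the hierarchical stacking) are omitted.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def vector_quantize(
    frames: Sequence[Vector], codebook: Sequence[Vector]
) -> Tuple[List[int], List[List[float]]]:
    """Hypothetical sketch: map each encoder frame to its nearest codeword.

    Returns the selected codebook indices and the quantized frames.
    """
    if not codebook:
        raise ValueError("codebook must not be empty")
    indices: List[int] = []
    quantized: List[List[float]] = []
    for frame in frames:
        # Choose the codeword with the smallest squared distance to this frame.
        best = min(
            range(len(codebook)),
            key=lambda k: squared_distance(frame, codebook[k]),
        )
        indices.append(best)
        # The quantized frame is a copy of the selected codeword.
        quantized.append(list(codebook[best]))
    return indices, quantized
```

For example, with `codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 2.0]]` and `frames = [[0.1, -0.2], [0.9, 1.2], [-0.8, 1.7]]`, `vector_quantize(frames, codebook)` selects indices `[0, 1, 2]`. The decoder then sees only these discrete codes, which is what allows a VQVAE to separate speaker-independent content from the speaker identity supplied to the decoder.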
SM1-TM1 (Male-to-male)
- source | target
- Baseline VQVAE | CycleVQVAE
- VQVAEGAN | CycleVQVAEGAN
- CycleVQVAEGAN w/ STFTLoss
SF1-TF1 (Female-to-female)
- source | target
- Baseline VQVAE | CycleVQVAE
- VQVAEGAN | CycleVQVAEGAN
- CycleVQVAEGAN w/ STFTLoss
SM1-TF1 (Male-to-female)
- source | target
- Baseline VQVAE | CycleVQVAE
- VQVAEGAN | CycleVQVAEGAN
- CycleVQVAEGAN w/ STFTLoss
SF1-TM1 (Female-to-male)
- source | target
- Baseline VQVAE | CycleVQVAE
- VQVAEGAN | CycleVQVAEGAN
- CycleVQVAEGAN w/ STFTLoss