About crank

crank is open-source software for non-parallel voice conversion based on a vector-quantized variational autoencoder (VQVAE) with adversarial learning. This repository presents converted audio samples generated by crank.

K. Kobayashi, W-C. Huang, Y-C. Wu, P.L. Tobing, T. Hayashi, T. Toda, 
"crank: an open-source software for nonparallel voice conversion based on vector-quantized variational autoencoder", 
Proc. ICASSP, 2021. (accepted)

Voice Conversion Challenge 2018 dataset

The following audio samples were generated by crank (ver 0.3.0), and the objective results reported in the paper were calculated from these waveforms. All converted samples can be downloaded from the following URL.

Method

  • Baseline VQVAE
    • Three-stacked hierarchical VQVAE
  • CycleVQVAE
    • Baseline VQVAE with cyclic architecture
  • VQVAEGAN
    • Baseline VQVAE with GAN
  • CycleVQVAEGAN
    • Baseline VQVAE with cyclic architecture and GAN
  • CycleVQVAEGAN w/ STFTLoss
    • Baseline VQVAE with cyclic architecture and GAN with STFT loss
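
All of the methods above build on a VQVAE, whose defining step is vector quantization: each encoded frame is replaced by its nearest entry in a learned codebook. The following is a minimal pure-Python sketch of that nearest-neighbor lookup, included only as an illustration. It is not taken from crank's source code; the function name `quantize`, the toy codebook, and the toy frames are all hypothetical, and a real model would learn the codebook and operate on tensors.

```python
# Illustrative sketch only; not crank's implementation.
from typing import List, Sequence, Tuple


def quantize(
    frames: Sequence[Sequence[float]],
    codebook: Sequence[Sequence[float]],
) -> Tuple[List[int], List[List[float]]]:
    """Map each frame to its nearest codebook vector (squared Euclidean distance)."""
    indices: List[int] = []
    quantized: List[List[float]] = []
    for frame in frames:
        # Squared distance from this frame to every codebook entry.
        distances = [
            sum((x - c) ** 2 for x, c in zip(frame, code)) for code in codebook
        ]
        # Index of the closest codebook entry.
        best = min(range(len(codebook)), key=distances.__getitem__)
        indices.append(best)
        quantized.append(list(codebook[best]))
    return indices, quantized


# Toy data (hypothetical): three 2-dimensional codebook entries and three frames.
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 2.0]]
frames = [[0.1, -0.2], [0.9, 1.3], [-0.8, 1.7]]
indices, quantized = quantize(frames, codebook)
print(indices)  # [0, 1, 2]
```

In this toy case each frame lies closest to a different codebook entry, so the discrete indices are 0, 1, and 2, and the quantized output is the codebook itself.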

SM1-TM1 (Male-to-male)

  • source | target

  • Baseline VQVAE | CycleVQVAE

  • VQVAEGAN | CycleVQVAEGAN

  • CycleVQVAEGAN w/ STFTLoss

SF1-TF1 (Female-to-female)

  • source | target

  • Baseline VQVAE | CycleVQVAE

  • VQVAEGAN | CycleVQVAEGAN

  • CycleVQVAEGAN w/ STFTLoss

SM1-TF1 (Male-to-female)

  • source | target

  • Baseline VQVAE | CycleVQVAE

  • VQVAEGAN | CycleVQVAEGAN

  • CycleVQVAEGAN w/ STFTLoss

SF1-TM1 (Female-to-male)

  • Baseline VQVAE | CycleVQVAE

  • VQVAEGAN | CycleVQVAEGAN

  • CycleVQVAEGAN w/ STFTLoss