Voice Conversion Challenge 2018
Compare different voice conversion systems and approaches using the same voice data!
Tasks of the 2nd Challenge
The objective is speaker conversion, which is a well-known basic problem in voice conversion. We have prepared two tasks:
Other voices of the same source speakers will be provided later as test data, consisting of around 50 sentences for each speaker. Each participant will generate converted voices from this test data using the 16 developed conversion systems.
The resulting 16 converted voice sets will be evaluated in terms of perceived naturalness and similarity through listening tests.
- Hub task (main task): parallel training
- We will provide voices of 4 source and 4 target speakers (consisting of both female and male speakers) from fixed corpora as training data. Each speaker utters the same sentence set consisting of around 80 sentences.
- Using these parallel data sets, voice conversion systems for all speaker-pair combinations (16 speaker-pairs in total) will be developed by each participant.
- Spoke task (optional task): nonparallel training
- We will also provide voices of 4 additional source speakers (consisting of both female and male speakers) from fixed corpora as training data. Each speaker utters a different sentence set consisting of around 80 sentences. The target speakers are the same as in the hub task; therefore, the sentence set of the source speakers differs from that of the target speakers.
- Using these nonparallel data sets, voice conversion systems for all speaker-pair combinations (16 speaker-pairs in total) will be developed by each participant.
We focus on 22.05 kHz speech and signal-to-signal conversion strategies. No transcriptions will be provided for the test set, and the use of manual annotations is NOT allowed (see the rules section for details). Participants are free to use additional data for training purposes.
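Since all submissions must be 22.05 kHz audio, it is worth verifying the sampling rate of converted files before submitting. As a minimal illustration (the helper names `check_wav_format` and `write_tone` are our own, not part of any challenge toolkit), the standard-library `wave` module suffices for the check:

```python
import math
import struct
import wave

def check_wav_format(path, expected_rate=22050):
    """Return True if the WAV file at `path` uses the expected sampling rate."""
    with wave.open(path, "rb") as wf:
        return wf.getframerate() == expected_rate

def write_tone(path, rate=22050, seconds=0.1, freq=440.0):
    """Write a short mono 16-bit PCM sine tone, for testing the check above."""
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)        # 16-bit samples
        wf.setframerate(rate)
        n = int(rate * seconds)
        frames = b"".join(
            struct.pack("<h", int(32767 * 0.5 * math.sin(2 * math.pi * freq * i / rate)))
            for i in range(n)
        )
        wf.writeframes(frames)
```

Files at other rates (e.g. 16 kHz) would need resampling with an external tool before submission; this sketch only detects the mismatch, it does not convert.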
<< Important changes compared to the 1st challenge >>
There are some important changes regarding rules and listening tests compared to the 2016 challenge:
- In the 2018 challenge you are allowed to mix and combine different source speakers' data to train speaker-independent models.
- In the 2018 challenge you may use orthographic transcriptions of the released training data to train your voice conversion systems. Note that we will not provide orthographic transcriptions of speech data in the evaluation set.
- In the 2018 challenge you may perform manual annotations of the released training data. However, we will not allow you to perform manual annotations of speech data in the evaluation set.
- In the 2018 challenge, listening tests will use natural speech at a 22.05 kHz sampling frequency as the reference signal.
The tentative schedule is as follows:
- October 1st: release of training data
- December 1st: release of evaluation data
- December 8th: deadline for submitting converted audio
- January 26th: notification of results
How to Participate?
There is no fee for registration. Please register your team at the following page by September 29th if you want to participate in the challenge.
- Registration page (closed)
Participants must strictly follow the Challenge rules. Please read the following page carefully:
- Overview and results
- J. Lorenzo-Trueba, J. Yamagishi, T. Toda, D. Saito, F. Villavicencio, T. Kinnunen, Z. Ling, "The voice conversion challenge 2018: promoting development of parallel and nonparallel methods," Proc. Odyssey 2018, pp. 195-202, 2018.
- T. Kinnunen, J. Lorenzo-Trueba, J. Yamagishi, T. Toda, D. Saito, F. Villavicencio, Z. Ling, "A spoofing benchmark for the 2018 voice conversion challenge: leveraging from spoofing countermeasures for speech artifact assessment," Proc. Odyssey 2018, pp. 187-194, 2018.
- Freely available materials
- J. Lorenzo-Trueba, J. Yamagishi, T. Toda, D. Saito, F. Villavicencio, T. Kinnunen, Z. Ling, "The Voice Conversion Challenge 2018: database and results," The Centre for Speech Technology Research, The University of Edinburgh, UK, 2018.
This work was supported in part by
- iFLYTEK (http://www.iflytek.com/en/)
- JSPS KAKENHI Grant Number JP17H06101
- MEXT KAKENHI Grant Numbers (15H01686, 16H06302, 17H04687)
Organizers:
- Junichi Yamagishi & Jaime Lorenzo-Trueba (National Institute of Informatics)
- Tomoki Toda (Nagoya University)
- Daisuke Saito (The University of Tokyo)
- Fernando Villavicencio (ObEN)
- Tomi Kinnunen (University of Eastern Finland)
- Zhenhua Ling (University of Science and Technology of China)
Contact information: vcc2018__at__vc-challenge.org