Voice Conversion Challenge 2018 Rules

Registration

Please register on the following page if you want to participate in the challenge.
- Registration page
There is no registration fee.

Voice Data Provided in the Challenge

For the hub task, the organizers will provide a data set consisting of the voices of 4 source and 4 target speakers. Each speaker utters the same 81 sentences, so the data set contains a total of 648 utterances.
For the spoke task, the organizers will provide another data set consisting of the voices of 4 other source speakers. Each of these source speakers utters 81 sentences, for a total of 324 utterances. For the target speakers, the same 81 utterances as in the hub task will be used.
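The utterance totals above follow directly from the speaker and sentence counts. The short Python sketch below only reproduces that arithmetic; the variable names are illustrative and are not part of any official tooling.

```python
# Reproduce the training-set sizes stated in the rules (illustrative only).
SENTENCES_PER_SPEAKER = 81

hub_source_speakers = 4
hub_target_speakers = 4
spoke_source_speakers = 4  # the spoke task reuses the hub task's target speakers

# Hub data set: every source and target speaker utters the same 81 sentences.
hub_total = (hub_source_speakers + hub_target_speakers) * SENTENCES_PER_SPEAKER
# Spoke data set: only the 4 additional source speakers are new.
spoke_total = spoke_source_speakers * SENTENCES_PER_SPEAKER

print(f"hub: {hub_total} utterances")      # hub: 648 utterances
print(f"spoke: {spoke_total} utterances")  # spoke: 324 utterances
```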
Both data sets include not only the waveform files but also manual transcriptions of the utterances.
After registration, a password for downloading the data sets will be issued.
A text file (README) with more detailed information is also included in each data set. Please read it carefully.

Tasks of the Challenge

The speaker conversion tasks of this challenge are:
- Hub task (main task): parallel training
- Spoke task (optional task): non-parallel training
Note that participants must take part in the hub task in order to submit a system for the spoke task.
Training step for the hub task
- For the hub task, each participant needs to develop voice conversion systems for all source and target speaker pairs using up to 81 parallel utterance pairs for each speaker pair as training data.
- In total, 16 conversion systems (i.e., 4 sources by 4 targets) will be developed.
Training step for the spoke task
- For the spoke task, each participant needs to develop voice conversion systems for all source and target speaker pairs using up to 81 non-parallel utterances for each speaker pair as training data.
- In total, 16 conversion systems (i.e., 4 sources by 4 targets) will be developed.
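In both tasks, one conversion system is built for every combination of a source speaker and a target speaker. The sketch below enumerates those combinations with `itertools.product`; the labels `S1`..`S4` and `T1`..`T4` are hypothetical placeholders, not the official speaker IDs.

```python
from itertools import product

# Hypothetical placeholder labels; the actual speaker IDs come with the data sets.
sources = ["S1", "S2", "S3", "S4"]
targets = ["T1", "T2", "T3", "T4"]

# One conversion system per (source, target) pair: 4 sources x 4 targets.
pairs = list(product(sources, targets))

print(len(pairs))  # 16
print(pairs[0])    # ('S1', 'T1')
```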
Conversion step (for both hub and spoke tasks)
- Another voice data set from the same 8 source speakers (i.e., the 4 source speakers of the hub task and the 4 other source speakers of the spoke task) will be provided later. It consists of 54 utterances per source speaker, for a total of 432 utterances.
- Each participant needs to convert these source speakers' voice samples into the voice of each target speaker using the 16 developed conversion systems.
- In total, 864 converted voice samples (54 utterances times 16 speaker pairs) will be generated for each task.
- These converted voice samples will be submitted to the organizers and then evaluated in listening tests in terms of naturalness and speaker similarity.
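The sizes of the evaluation set and of each submission can be checked with the same kind of arithmetic. This Python sketch only restates the counts given in the conversion step; the constant names are illustrative assumptions.

```python
# Evaluation-set and submission sizes from the conversion step (illustrative only).
EVAL_UTTERANCES_PER_SOURCE = 54
SOURCES_PER_TASK = 4
TARGETS_PER_TASK = 4
TASKS = 2  # hub and spoke

# All 8 source speakers (4 hub + 4 spoke) contribute 54 utterances each.
eval_set_size = EVAL_UTTERANCES_PER_SOURCE * SOURCES_PER_TASK * TASKS

# Each source utterance is converted to every target speaker within a task.
pairs_per_task = SOURCES_PER_TASK * TARGETS_PER_TASK
converted_per_task = EVAL_UTTERANCES_PER_SOURCE * pairs_per_task

print(eval_set_size)       # 432
print(pairs_per_task)      # 16
print(converted_per_task)  # 864
```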
Instructions
- No manual editing or modification is allowed in the conversion step. Participants may manually optimize individual conversion systems in the training step, but not in the conversion step (e.g., even manual tuning of the system parameters is NOT allowed in the conversion step).
- The use of manual annotations (such as phoneme information, phoneme boundaries, linguistic information, etc.) on the evaluation data sets is NOT allowed. Automatic speech recognition systems may be used to generate automatic transcriptions. On the other hand, manual annotations CAN be used for the training data sets.
- Any acoustic features, including suprasegmental and duration features, may be transformed.
- Participants are free to use additional data for training purposes. All speakers' voices in the data sets provided by the organizers can also be used to develop a conversion system for a certain speaker pair. However, the use of the original DAPS dataset and the dataset of Voice Conversion Challenge 2016 is NOT allowed.
- Participants are also free to discard some utterances from the data set in the training step.
- Each participant may submit only one entry per task, because multiple entries would make the listening tests unmanageable. Participants involved in joint projects or consortia who wish to submit multiple systems should ask the organizers in advance for confirmation.
- Participants need to complete a form giving the general technical specification of their conversion system to facilitate cross-system comparisons (e.g., Is it a GMM-based system? Does it convert prosodic features?).
- If you have any doubt about how to apply these rules, please contact the organizers (vcc2018__at__vc-challenge.org) immediately.

Expert Listeners for Listening Tests

Each participant needs to recruit several volunteer listeners as expert listeners for each of the evaluation tests (naturalness and speaker similarity). Native speakers are preferred but not required.
The organizers would also appreciate assistance in advertising the Challenge as widely as possible (e.g., to your students or colleagues).

Retention of Submitted Voice Samples

Any voice samples that you submit for evaluation will be retained by the Voice Conversion Challenge 2018 organizers for future use.
When participants submit the converted voices, they will be asked to give the organizers permission to publicly distribute the submitted voices and the corresponding listening test results in anonymized form. We would greatly appreciate it if all participants approved this consent agreement!

Paper Submissions

We would like to ask each participant to submit a paper describing their entry. We are working to arrange an opportunity for participants to present their papers. More information will be provided later.

Contact information: vcc2018__at__vc-challenge.org