Singing Voice Conversion Challenge 2023 Rules
- Please register from the following page if you want to participate in the challenge.
- There is no registration fee.
Data Provided in the Challenge
- For the 1st task, the organizers will provide a dataset consisting of 2 target singers' singing voices. Around 130 to 170 sentences are sung by each singer. No training data for the source singers will be provided.
- For the 2nd task, the organizers will provide another dataset consisting of speech from 2 other target speakers. Around 130 to 170 sentences are spoken by each target speaker. No training data for the source singers will be provided.
- Both datasets include not only waveform files but also manual transcriptions of the utterances.
- After registration, a link for downloading the datasets will be issued.
- A text file (README) describing more detailed information will also be included in the dataset. Please read it carefully.
Tasks of the Challenge
- The singer conversion tasks of this challenge are:
- 1st task: Any-to-one, in-domain singing voice conversion
- 2nd task: Any-to-one, cross-domain singing voice conversion
Participants may take part in either task or in both.
- Training step
- For both tasks 1 and 2, each participant needs to develop singing voice conversion systems for all target singers using the provided dataset.
- Conversion step (for both tasks 1 and 2)
- Another voice data set of 2 source singers will be provided later, which consists of 24 singing utterances for each source singer, for a total of 48 utterances.
- Each participant needs to convert these source singers' voice samples into each target speaker's voice with the developed conversion systems, keeping the contents unchanged.
- In total, 96 converted voice samples (24 utterances times 4 speaker pairs) will be generated for the 1st task, and 96 converted voice samples (24 utterances times 4 speaker pairs) will be generated for the 2nd task.
- These converted voice samples will be submitted to the organizers and evaluated in listening tests in terms of naturalness and speaker similarity. They will also be evaluated with some objective evaluation measures.
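The sample count for one task can be checked with a minimal sketch like the one below. The singer IDs are hypothetical placeholders (the actual names are given in the dataset README), and the function name is only illustrative; it enumerates every source/target pair for each of the 24 source utterances.

```python
from itertools import product

# Hypothetical placeholder IDs; the real singer names come from the dataset README.
SOURCE_SINGERS = ["source_1", "source_2"]
TARGET_SINGERS = ["target_1", "target_2"]
UTTERANCES_PER_SOURCE = 24


def expected_conversions(sources, targets, n_utterances):
    """List every (source, target, utterance index) triple one task requires."""
    return [
        (src, tgt, idx)
        for src, tgt in product(sources, targets)
        for idx in range(1, n_utterances + 1)
    ]


conversions = expected_conversions(
    SOURCE_SINGERS, TARGET_SINGERS, UTTERANCES_PER_SOURCE
)
# 2 sources x 2 targets = 4 speaker pairs; 4 pairs x 24 utterances = 96 samples
print(len(conversions))
```

A list like this could also serve as a checklist before submission, to confirm that no source/target/utterance combination is missing for a task.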
- Instructions
- No manual editing or modification is allowed in the conversion step. Participants can manually optimize individual conversion systems in the training step, but they cannot do so in the conversion step (e.g., even manual tuning of the system parameters is NOT allowed in the conversion step).
- The use of manual annotations (such as phoneme information, phoneme boundary, linguistic information, etc.) on the evaluation data sets is NOT allowed. Automatic speech recognition systems may be used to generate automatic transcriptions. On the other hand, manual annotations CAN be used for the training data sets.
- Any acoustic features including suprasegmental and duration features may be transformed.
- All speakers' voices in the data sets provided by the organizers can also be used to develop a conversion system for a certain speaker. However, the use of the original dataset (NHSS) is NOT allowed.
- Participants are also free to discard some utterances from the data set in the training step.
- A single participant may not submit multiple entries in a task, because the listening test would become unmanageable. Participants involved in joint projects or consortia who wish to submit multiple systems should ask the organisers in advance for confirmation.
- Participants need to complete a form giving the general technical specification of their developed conversion system to facilitate easy cross-system comparisons (e.g. is it an end-to-end system? what acoustic features are being used? etc).
- Special rules on additional datasets
In past voice conversion challenges, we allowed participants to use any additional data for training purposes. However, using private, in-house datasets makes reproducing the results difficult for other researchers. Starting this year, the organizers aim to encourage reproducible research. With that in mind, the following new rules will be added:
- Additional datasets used for training need to be publicly available, as listed below.
- If the dataset is not publicly available, then the model checkpoint needs to be open-sourced and made publicly available.
Allowed additional datasets include, but are not limited to, the following list:
Note that this is a constantly updated list. If you plan to use any publicly available dataset that is not in the above-mentioned list, please send a request by email. The organizers will review the request and update this list. The deadline for requests is Apr. 21st, 2023, the date of the release of the evaluation data.
Retention of Submitted Voice Samples
- The submitted voice samples for evaluation will be retained by the Singing Voice Conversion Challenge 2023 organizers for future use.
- SVCC 2023 is held in collaboration with the VoiceMOS Challenge 2023, a challenge on quality assessment of synthesized speech. The submitted voice samples will be provided to that challenge for its use, including training and evaluation.
- When participants submit the converted voices, they will be asked to give the organizers permission to publicly distribute the submitted voices and the corresponding listening test results in an anonymized form. We would really appreciate it if all participants approved this consent agreement!
Paper Submissions
- We would like to ask each participant to submit a paper describing their entry. We are working to arrange an opportunity for participants to present their papers, and will share more information later.
Contact information: