
Singing Voice Conversion Challenge 2023

Thank you for participating in the first Singing Voice Conversion Challenge (SVCC)!


The challenge has ended.

Voice conversion (VC) refers to the digital cloning of a person's voice; it can be used to modify an audio waveform so that it appears to be spoken by someone (the target) other than the original speaker (the source). The Voice Conversion Challenge (VCC) series aims to advance and compare different approaches to the core VC technology using a common dataset, metrics, and baseline systems provided by the organizers. With the rapid progress in the essential modules of a VC system (including acoustic modeling and waveform synthesis), the top system in the latest VCC showed impressive performance, generating speech samples very close to the human voice in terms of naturalness and similarity. We feel it is time to move our focus from fundamental technologies to more sophisticated applications.

Therefore, we are pleased to announce the first Singing Voice Conversion Challenge (SVCC). Singing voice conversion (SVC), extending the definition of normal VC, aims at converting the singing voice of a source singer to that of a target singer without changing the contents. The main applications of SVC lie in entertainment: new tools for virtual YouTubers, singing voice beautification in karaoke, or even singing aids for the disabled. SVC is considered more challenging than VC, as singing voice is generally harder to model than speech, and data collection is more difficult. Moreover, during conversion, while the music score is considered part of the contents that must not be changed, certain singing styles such as vibrato can be considered singer-dependent. Each of these prosody-related factors needs to be modeled properly. From the community point of view, SVC lies at the intersection of speech processing and music processing. We hope to attract attention from researchers in both communities to facilitate interdisciplinary research.

The previous VCCs can be accessed below:


Tasks of this Challenge

The objective is singer conversion. We plan to prepare two tasks:

We focus on 24 kHz singing voice and signal-to-signal conversion strategies. No transcriptions will be provided for the test set, and the use of manual annotations is NOT allowed. Please note that for this challenge, to facilitate reproducible research, any additional data used for training must be publicly available. Please use only datasets listed in the curated list maintained by the organizers.

Please check the rules section for more detailed information.

Timeline

The tentative schedule is as follows:

Baseline Systems

We provide baseline systems. Participants who are new to the singing voice conversion field are welcome to use the open-source starter kit for this challenge. We have prepared a few sets of converted samples generated using these baselines to help participants develop their systems.


Evaluation

Following previous VCCs, the main evaluation campaign will be a large-scale subjective evaluation conducted by recruiting human listeners to assess the quality of all the submitted systems. We will be evaluating the naturalness and similarity of the converted samples.


Challenge special session at ASRU 2023

SVCC 2023 is a challenge special session at ASRU 2023. Please attend ASRU 2023 and come to our poster to hear the challenge summary. Participating teams whose papers were accepted will also present their work there.


Organizers


Contact information: svcc2023__at__vc-challenge.org