The Singing Voice Conversion Challenge 2025

Introduction to the Voice Conversion Challenge

Voice conversion (VC) refers to the digital cloning of a person's voice; it can be used to modify an audio waveform so that it appears to have been spoken by someone (the target) other than the original speaker (the source). The Voice Conversion Challenge (VCC) series aims to advance and compare different approaches to core VC technology using a common dataset, common metrics, and baseline systems provided by the organizers. Rather than focusing on developing the best-performing system, the core motivation of the VCC series has always been to provide researchers with information about which methods are currently state of the art, through reproducible systems and experiments.


The latest VCC extended the task to singing voices with the Singing Voice Conversion Challenge (SVCC), where the top systems showed impressive naturalness. However, similarity scores were not as high as expected, because singing voices are much more complex to evaluate: the same singer can sing in several different singing styles.


SVCC 2025

With this motivation in mind, we are pleased to announce SVCC 2025, which aims to further advance the state of the art in this research field. This year, we focus on singing style conversion (SSC). Compared to singing voice conversion (SVC), which only converts singer identity, SSC focuses on converting how the singer sings the song: it changes the singing style without changing the linguistic content or the identity of the source singer. SSC is more challenging than VC and SVC, as there are many ways to sing a song in different styles, yet the result still needs to follow music theory so that the converted singing voice remains pleasant to listen to. From the research community's point of view, SSC lies at the intersection of speech processing and music processing. SSC is a new and challenging research field, and we hope to attract attention from researchers in both communities to facilitate interdisciplinary research.

How to Participate

Registration is free. To participate, please complete the registration form:

We will send the training data and instructions only to registered participants.

Please make sure to read the challenge rules before participating.

Challenge Tasks

Task 1: In-Domain Singing Style Conversion

  • Convert source singer A's singing style from style 1 to style 2
  • Source singer A is in the training dataset
  • Reference singing voice in style 2 from singer A is provided in the training dataset

Task 2: Zero-Shot Singing Style Conversion

  • Convert source singer B's singing style from style 1 to style 2
  • Source singer B is NOT in the training dataset
  • Reference singing voice in style 2 from singer B will not be provided
  • Participants will need to use a reference singing voice in style 2 from a different singer in the training dataset to complete the task

Task   | Source               | Reference                       | Conversion
Task 1 | Singer A, in style 1 | Singer A, in style 2            | Singer A, in style 2
Task 2 | Singer B, in style 1 | Any singer except B, in style 2 | Singer B, in style 2

Training data

  • Includes training data for the Task 1 singer A (~4.5 hours, in all 7 singing styles).
  • No training data of the Task 2 singer B will be provided.
  • Other singers in the training dataset (~70 hours, in all 7 singing styles) will be provided as additional data.
  • It is up to participants how they choose the reference for the target style.
  • Datasets include waveform files and annotated labels (aligned phoneme and MIDI, global and local style labels, transcriptions).
  • The SVCC 2025 dataset is a subset of the GTSinger dataset. Thus, participants will NOT be allowed to use the GTSinger dataset for training. Please refer to the challenge rules for more details.

Test set details

  • Participants will be provided with a test set in which each phrase contains 4 source singing styles.
  • Participants will then have to convert each phrase into the singing styles specified for that phrase.
  • Participants will only be provided with waveform files and NOT the annotated labels.

Provided singing styles

  • The challenge will focus on 7 singing styles: Breathy, Falsetto, Mixed Voice, Pharyngeal, Glissando, Vibrato, and a Control style.

Subjective evaluation details

  • Naturalness: mean opinion score (MOS) on a 5-point scale.
  • Singer identity similarity: AB test on a 4-point scale. Please refer to the SVCC 2023 paper for more details.
  • Singing style similarity: XAB test on a 4-point scale. Please refer to the Baseline 1 paper for more details.
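
For readers unfamiliar with mean opinion scores, the aggregation behind a 5-point naturalness score can be sketched as below. This is a generic illustration, not the organizers' evaluation script: the function name is hypothetical, and the normal-approximation 95% confidence interval is an assumption about how uncertainty might be reported.

```python
import math
import statistics
from typing import Sequence, Tuple


def mean_opinion_score(ratings: Sequence[int]) -> Tuple[float, float]:
    """Return (MOS, half-width of an approximate 95% confidence interval).

    Ratings are listener scores on a 1-5 scale. The interval uses the
    normal approximation (1.96 * standard error), which is an assumption
    made for this sketch.
    """
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must be on the 1-5 scale")
    mos = float(statistics.mean(ratings))
    if len(ratings) < 2:
        # A single rating gives no estimate of spread.
        return mos, float("nan")
    half_width = 1.96 * statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mos, half_width


# Made-up ratings from five listeners for one converted sample.
mos, ci = mean_opinion_score([4, 5, 3, 4, 4])
print(f"MOS = {mos:.2f} +/- {ci:.2f}")
```

With only a handful of ratings the interval is wide, which is one reason listening tests typically collect many ratings per system.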

Baseline Systems

To facilitate the challenge, we will provide participants with two baseline systems, both with fully open-source code:

Baseline 1: Serenade [Paper] [Open-sourced code]

Baseline 2: Vevo 1.5 [Original paper] [Technical Blog] [Open-sourced code]

Timeline

Apr. 7th, 2025: Challenge tasks and description released

Apr. 14th, 2025: Baseline 2 code and technical paper for SVCC released

Apr. 28th, 2025: Training data release

Jun. 23rd, 2025: Evaluation data release

Jun. 30th, 2025: Converted waveforms submission deadline

Jul. 14th, 2025: System description submission deadline

Aug. 25th, 2025: Results notification

To be confirmed: Conference workshop paper submission deadline

Organizers

Lester Phillip Violeta, Wen-Chin Huang, and Tomoki Toda (Nagoya University, Japan)

Xueyao Zhang, Zhizheng Wu (The Chinese University of Hong Kong (Shenzhen), China)

Jiatong Shi (Carnegie Mellon University, USA)

Yusuke Yasuda (National Institute of Informatics, Japan)

Previous VCCs

SVCC 2023: Challenge website

VCC 2020: Challenge website

VCC 2018: Challenge website

VCC 2016: Challenge website

Contact

svcc2025__at__vc-challenge.org