Summary of VCC 2016

Details of VCC 2016 are described in the following papers:

Overview of VCC 2016
- T. Toda, L.-H. Chen, D. Saito, F. Villavicencio, M. Wester, Z. Wu, J. Yamagishi, "The Voice Conversion Challenge 2016," Proc. INTERSPEECH, pp. 1632-1636, 2016.
  [Paper] and [Slides]
Analysis of VCC 2016 results
- M. Wester, Z. Wu, J. Yamagishi, "Analysis of the Voice Conversion Challenge 2016 Evaluation Results," Proc. INTERSPEECH, pp. 1637-1641, 2016.
  [Paper] and [Slides]
- M. Wester, Z. Wu, J. Yamagishi, "Multidimensional scaling of systems in the Voice Conversion Challenge 2016," Proc. SSW9, pp. 40-45, 2016.
  [Paper]

The following materials are freely available:

Dataset & results (raw scores) of listening tests
http://dx.doi.org/10.7488/ds/1430

VCC 2016 Dataset

The VCC 2016 dataset was developed from the DAPS (Device and Produced Speech) dataset.

10 speakers were selected: 5 female and 5 male.
The recordings were manually segmented into 216 utterances for each speaker.
The audio was down-sampled to 16 kHz.
Freely available: http://dx.doi.org/10.7488/ds/1430

The experimental conditions of VCC 2016 were as follows.

Source speakers: 3 females and 2 males
Target speakers: 2 females and 3 males
Training data: 162 utterance pairs from the source and target speakers
Evaluation data: the remaining 54 utterances
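The split above can be sketched in a few lines of Python. This is only an illustration: the speaker labels and utterance IDs are hypothetical placeholders, and both the "first 162 utterances for training" rule and the pairing of every source with every target are assumptions, not something stated on this page.

```python
from itertools import product

# Hypothetical speaker labels (3 female + 2 male sources, 2 female + 3 male targets).
SOURCES = ["src_f1", "src_f2", "src_f3", "src_m1", "src_m2"]
TARGETS = ["tgt_f1", "tgt_f2", "tgt_m1", "tgt_m2", "tgt_m3"]


def split_utterances(utt_ids, n_train=162):
    """Split one speaker's utterance list into training and evaluation parts.

    Assumption: the first n_train utterances are used for training.
    """
    return utt_ids[:n_train], utt_ids[n_train:]


# Hypothetical utterance IDs for the 216 utterances of one speaker.
utt_ids = [f"utt{i:03d}" for i in range(1, 217)]
train, evaluation = split_utterances(utt_ids)

# Assumption: every source speaker is converted to every target speaker.
pairs = list(product(SOURCES, TARGETS))

print(len(train), len(evaluation), len(pairs))
```

Under these assumptions each speaker contributes 162 training and 54 evaluation utterances, and there are 25 source-target pairs.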

VC Systems

Baseline System

The baseline system was developed using freely available software: VCtools within FestVox.

Analysis methods
- F0 extraction with Edinburgh Speech Tools (EST)
- Spectral analysis with Signal Processing Toolkit (SPTK)
Converted parameters and conversion methods
- Mel-cepstrum (MCEP):
  - Joint probability density modeling with a Gaussian mixture model (GMM) with 64 mixture components
  - Trajectory-wise conversion with maximum likelihood parameter generation (MLPG), taking global variance (GV) into account
- Log-scaled F0 (LF0):
  - Global linear transformation based on mean and variance (M&V)
Synthesis methods
- Simple pulse/noise excitation
- Mel log spectrum approximation (MLSA) filter
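As an illustration of the log-F0 conversion listed above, the following is a minimal sketch of a global mean and variance transformation. It is not the VCtools implementation: the function names are hypothetical, F0 values of 0 are assumed to mark unvoiced frames, and the toy contours are made up.

```python
import math
from statistics import mean, pstdev


def logf0_stats(f0_frames):
    """Mean and standard deviation of log F0 over voiced frames (F0 > 0)."""
    logf0 = [math.log(f) for f in f0_frames if f > 0]
    return mean(logf0), pstdev(logf0)


def convert_f0(src_f0, src_stats, tgt_stats):
    """Shift and scale source log F0 to match the target mean and variance.

    Unvoiced frames (F0 == 0) are passed through unchanged.
    """
    mu_x, sd_x = src_stats
    mu_y, sd_y = tgt_stats
    converted = []
    for f in src_f0:
        if f > 0:
            lf0 = (math.log(f) - mu_x) / sd_x * sd_y + mu_y
            converted.append(math.exp(lf0))
        else:
            converted.append(0.0)
    return converted


# Toy F0 contours in Hz (made-up values; 0 marks an unvoiced frame).
src_train = [100.0, 110.0, 0.0, 120.0, 130.0]
tgt_train = [200.0, 220.0, 0.0, 240.0, 260.0]

converted = convert_f0(
    [0.0, 100.0, 130.0], logf0_stats(src_train), logf0_stats(tgt_train)
)
print(converted)
```

In this toy case the target contour is exactly twice the source contour, so the transformation doubles every voiced F0 value and leaves the unvoiced frame at 0.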

Submitted VC Systems

17 teams developed their own VC systems as shown here.

(NOTE: This table may contain errors, and some parts may be updated.)

Voice Samples

You can listen to several voice samples converted by the individual systems.
If you want to listen to more samples, such as intra-gender conversion, please go to this page.
(NOTE: The page may take some time to open because it contains many voice samples.)

Examples of male-to-female conversion:

Source

Target

Baseline

System A

System B

System C

System D

System E

System F

System G

System H

System I

System J

System K

System L

System M

System N

System O

System P

System Q

Overall Results of Listening Tests

The results of listening tests are shown below.

Most systems outperform the baseline system.
Performance of the VC systems: MOS below 3.5 and correct rate below 75%
There is a large gap between the target natural voices and converted voices.

Towards Next Challenge

We plan to improve the baseline system.
Our immediate goal in this task is to develop a VC system that achieves
both a MOS above 4 and a correct rate above 80%.
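
To make the two measures concrete, here is a small sketch of how a system could be checked against this goal. It assumes that MOS is the mean of naturalness ratings on a 1 to 5 scale and that each similarity trial reduces to a correct or incorrect judgement; the function names and the sample scores are hypothetical and do not reflect the format of the released raw scores.

```python
from statistics import mean


def mean_opinion_score(ratings):
    """Average of naturalness ratings, assumed to lie on a 1-5 scale."""
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must lie between 1 and 5")
    return mean(ratings)


def correct_rate(judgements):
    """Percentage of similarity trials judged correctly (True = correct)."""
    return 100.0 * sum(judgements) / len(judgements)


def meets_goal(mos, rate, mos_goal=4.0, rate_goal=80.0):
    """True only if both thresholds are strictly exceeded."""
    return mos > mos_goal and rate > rate_goal


# Made-up listening test scores for one hypothetical system.
ratings = [3, 4, 3, 2, 4]
judgements = [True, True, False, True]

mos = mean_opinion_score(ratings)
rate = correct_rate(judgements)
print(mos, rate, meets_goal(mos, rate))
```

With these made-up scores the MOS is 3.2 and the correct rate is 75%, so the goal is not met.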

[back to Voice Conversion Challenge page]
Contact information: vcc2016__at__vc-challenge.org
