Summary of VCC 2016
Details of VCC 2016 are described in the following papers. The papers and slides listed below are freely available:
- Overview of VCC 2016
- T. Toda, L.-H. Chen, D. Saito, F. Villavicencio, M. Wester, Z. Wu, J. Yamagishi, "The Voice Conversion Challenge 2016," Proc. INTERSPEECH, pp. 1632-1636, 2016.
[Paper] and [Slides]
- Analysis of VCC 2016 results
- M. Wester, Z. Wu, J. Yamagishi, "Analysis of the Voice Conversion Challenge 2016 Evaluation Results," Proc. INTERSPEECH, pp. 1637-1641, 2016.
[Paper] and [Slides]
- M. Wester, Z. Wu, J. Yamagishi, "Multidimensional scaling of systems in the Voice Conversion Challenge 2016," Proc. SSW9, pp. 40-45, 2016.
VCC 2016 Dataset
The VCC 2016 dataset was developed from DAPS (Device and Produced Speech).
The experimental conditions of VCC 2016 are as follows:
- 10 speakers selected: 5 female and 5 male
- Manually segmented into 216 utterances per speaker
- Down-sampled to 16 kHz
- Freely available: http://dx.doi.org/10.7488/ds/1430
- Source speakers: 3 females and 2 males
- Target speakers: 2 females and 3 males
- Training data: 162 parallel utterances per speaker (source-target utterance pairs)
- Evaluation data: the remaining 54 utterances per speaker
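The split above is simple arithmetic: 162 training plus 54 evaluation utterances make up the 216 utterances of each speaker, and 5 source speakers combined with 5 target speakers give 25 speaker pairs. The following is a minimal Python sketch of that bookkeeping. It assumes that each speaker's utterance IDs are ordered so that the first 162 form the training set and that every source speaker is paired with every target speaker. The speaker labels and the ID format are hypothetical.

```python
from itertools import product

# Hypothetical speaker labels: S/T = source/target, F/M = female/male.
SOURCE_SPEAKERS = ["SF1", "SF2", "SF3", "SM1", "SM2"]  # 3 female, 2 male
TARGET_SPEAKERS = ["TF1", "TF2", "TM1", "TM2", "TM3"]  # 2 female, 3 male

UTTS_PER_SPEAKER = 216
N_TRAIN = 162  # training utterances per speaker; the remaining 54 are for evaluation


def split_utterances(utt_ids):
    """Split one speaker's utterance IDs into (training, evaluation) lists.

    Assumes the IDs are ordered so that the first 162 form the training set.
    """
    if len(utt_ids) != UTTS_PER_SPEAKER:
        raise ValueError(f"expected {UTTS_PER_SPEAKER} utterances, got {len(utt_ids)}")
    return utt_ids[:N_TRAIN], utt_ids[N_TRAIN:]


def conversion_pairs():
    """All (source, target) speaker combinations, labeled intra- or cross-gender."""
    pairs = []
    for src, tgt in product(SOURCE_SPEAKERS, TARGET_SPEAKERS):
        # The second character of each hypothetical label encodes gender (F/M).
        kind = "intra-gender" if src[1] == tgt[1] else "cross-gender"
        pairs.append((src, tgt, kind))
    return pairs


if __name__ == "__main__":
    utt_ids = [f"{i:06d}" for i in range(1, UTTS_PER_SPEAKER + 1)]  # hypothetical IDs
    train, evaluation = split_utterances(utt_ids)
    print(len(train), len(evaluation))
    pairs = conversion_pairs()
    n_intra = sum(1 for _, _, kind in pairs if kind == "intra-gender")
    print(len(pairs), n_intra)
```

Under these assumptions, 12 of the 25 pairs are intra-gender (3 × 2 female plus 2 × 3 male) and 13 are cross-gender.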
The baseline system was developed using freely available software: VCtools within FestVox.
- Analysis methods
- Converted parameters and conversion methods
- Mel-cepstrum (MCEP):
- Joint probability density modeling with a 64-component Gaussian mixture model (GMM)
- Trajectory-wise conversion by maximum likelihood parameter generation (MLPG), considering global variance (GV)
- Log-scaled F0 (LF0):
- Global linear transformation based on mean and variance (M&V)
- Synthesis methods
- Simple pulse/noise excitation
- Mel log spectrum approximation (MLSA) filter
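The two conversion steps in the baseline list can be illustrated with a minimal NumPy sketch. This is not the VCtools/FestVox implementation. For mel-cepstra it performs frame-wise minimum mean-square-error conversion with a joint-density GMM over stacked source/target vectors [x; y], and it assumes the GMM parameters have already been trained. It omits MLPG and GV. For log-F0 it applies the global mean-and-variance transform. The function names and the NaN convention for unvoiced frames are hypothetical.

```python
import numpy as np


def log_gauss(x, mean, cov):
    """Log density of N(mean, cov) evaluated at each row of x (shape T x D)."""
    d = x.shape[1]
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    # Mahalanobis term via a linear solve rather than an explicit inverse.
    maha = np.sum(diff * np.linalg.solve(cov, diff.T).T, axis=1)
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + maha)


def convert_mcep_framewise(x, weights, means, covs):
    """Frame-wise MMSE conversion with a joint-density GMM (no MLPG, no GV).

    x:       (T, D) source mel-cepstra
    weights: (M,) mixture weights
    means:   (M, 2D) joint means [mu_x, mu_y]
    covs:    (M, 2D, 2D) joint covariances [[S_xx, S_xy], [S_yx, S_yy]]
    """
    x = np.asarray(x, dtype=float)
    means = np.asarray(means, dtype=float)
    covs = np.asarray(covs, dtype=float)
    T, D = x.shape
    M = len(weights)
    log_post = np.empty((T, M))
    cond_means = np.empty((M, T, D))
    for m in range(M):
        mu_x, mu_y = means[m, :D], means[m, D:]
        s_xx, s_yx = covs[m, :D, :D], covs[m, D:, :D]
        # Unnormalized log posterior of component m given each source frame.
        log_post[:, m] = np.log(weights[m]) + log_gauss(x, mu_x, s_xx)
        # Conditional mean E[y | x, m] = mu_y + S_yx S_xx^{-1} (x - mu_x).
        cond_means[m] = mu_y + np.linalg.solve(s_xx, (x - mu_x).T).T @ s_yx.T
    # Normalize posteriors in a numerically stable way (log-sum-exp trick).
    log_post -= log_post.max(axis=1, keepdims=True)
    post = np.exp(log_post)
    post /= post.sum(axis=1, keepdims=True)
    # Posterior-weighted sum of the component-wise conditional means.
    return np.einsum("tm,mtd->td", post, cond_means)


def convert_lf0(lf0, mu_src, sigma_src, mu_tgt, sigma_tgt):
    """Global mean & variance (M&V) transform of log-F0.

    Unvoiced frames are assumed to be NaN; NaN passes through unchanged.
    """
    lf0 = np.asarray(lf0, dtype=float)
    # Normalize with source statistics, then rescale with target statistics.
    return (lf0 - mu_src) * (sigma_tgt / sigma_src) + mu_tgt
```

In trajectory-wise conversion, the frame-wise estimates would instead come from MLPG over static and dynamic features, with a GV term to counter over-smoothing of the converted trajectory.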
Submitted VC Systems
17 teams developed and submitted their own VC systems, which are summarized in the table.
(NOTE: This table may contain errors, and some entries may be updated.)
You can listen to several voice samples converted by the individual systems.
If you want to listen to more samples, such as intra-gender conversions, please go to this page.
(NOTE: The page may take some time to load because it contains many voice samples.)
Examples of male-to-female conversion:
Overall Results of Listening Tests
The results of the listening tests are summarized below.
- Most systems outperform the baseline system.
- Performance of the VC systems remains limited: naturalness MOS < 3.5 and speaker-similarity correct rate < 75%.
- There is a large gap between the target natural voices and converted voices.
Towards Next Challenge
- We plan to improve the baseline system.
- Our immediate goal is to develop VC systems that achieve both MOS > 4 and a correct rate > 80%.
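The two figures of merit used in these results and goals can be computed from raw listener responses. The following is a minimal Python sketch. It assumes that naturalness ratings are integers on a 1-5 scale and that each similarity trial has already been reduced to a binary "same"/"different" answer, where "same" counts as correct. The response format and the example data are hypothetical, not VCC 2016 data.

```python
def mos(ratings):
    """Mean opinion score: arithmetic mean of 1-5 naturalness ratings."""
    if not ratings:
        raise ValueError("no ratings given")
    if any(not 1 <= r <= 5 for r in ratings):
        raise ValueError("ratings must lie on the 1-5 scale")
    return sum(ratings) / len(ratings)


def correct_rate(judgments):
    """Fraction of similarity trials judged "same" as the target speaker.

    Assumes each trial has already been reduced to "same" or "different".
    """
    if not judgments:
        raise ValueError("no judgments given")
    return sum(j == "same" for j in judgments) / len(judgments)


def meets_goal(mos_value, rate, mos_min=4.0, rate_min=0.80):
    """True if both MOS > mos_min and correct rate > rate_min (strict inequalities)."""
    return mos_value > mos_min and rate > rate_min


if __name__ == "__main__":
    # Made-up responses for one hypothetical system, for illustration only.
    ratings = [3, 4, 3, 2, 4, 3]
    judgments = ["same", "same", "different", "same"]
    m, r = mos(ratings), correct_rate(judgments)
    print(f"MOS = {m:.2f}, correct rate = {r:.0%}, goal met: {meets_goal(m, r)}")
```

The strict inequalities in `meets_goal` mirror the "MOS > 4 and correct rate > 80%" wording, so a system scoring exactly 4.0 would not count as reaching the goal.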
Contact information: email@example.com