mandarin-tts

Mandarin Chinese text-to-speech (TTS) based on FastSpeech 2, implemented in PyTorch, using WaveGlow as the vocoder and trained on the Biaobei and AISHELL-3 datasets.


Audio samples

Below are audio samples from step 300k, using pinyin + hanzi input with a UNet as the postnet.

Loading all the audio clips in the browser may take some time.

Multi-speaker samples

Samples from training loop

Synthesized Ground truth

Novel samples

The following samples were generated by running:

./scripts/hz_synth.sh 1.0 500000

NOTE: there are many failure cases that are not shown here.

Regular sentences (note the pauses between characters)

Text Normal Fast Slow
1
2
3
4
5
6

Tongue twisters

Text Normal Fast Slow
1
2
9

Erhua (rhotacized finals)

Erhua is detected automatically when converting hanzi to pinyin.

Text Normal Fast Slow
1
2
9

The following is obsolete. The examples below are from an older version and are for reference only.

Novel synthesized samples

Input text

  1. 黑化肥发灰会挥发,灰化肥挥发会发黑
  2. 红鲤鱼绿鲤鱼与驴
  3. 前方路口左转,然后在下一个路口右转
  4. 请输入您的卡号.您输入的卡号是六二二六,三八七六,零三四七,六九一五
  5. 天青色等烟雨,而我在等你,炊烟袅袅升起,隔江千万里
  6. 我听不清你在说什么,请大声一点
  7. 如果觉得这个项目好,请手动加个星吧,感谢
  8. 小毛豆你好,很高兴认识你.寒假马上要结束了.我觉得你最近有进步,继续加油.晚上要早点睡觉,因为明天要开学了.
  9. 有个小孩叫小杜,上街打醋又买布.买了布打了醋,回头看见鹰抓兔.放下布搁下醋,上前去追鹰和兔.飞了鹰跑了兔,洒了醋湿了布.

Audio samples

To generate audio samples, first download the checkpoint from Google Drive and untar it into mandarin_tts/.

The audio samples are then generated with the following commands:
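
As a rough sketch of the extraction step: the helper below assumes the download is a tar archive. The name checkpoint.tar is hypothetical, since the actual file name depends on the download, and the helper name extract_checkpoint is also only illustrative. Whether the archive carries its own top-level directory is not stated here, so check the result after extracting.

```shell
# Sketch only: extract a downloaded checkpoint archive into a target directory.
# Assumption: the archive is a tar file; its real name is not given in this page.
extract_checkpoint() {
  archive="$1"   # e.g. checkpoint.tar (hypothetical name)
  dest="$2"      # e.g. mandarin_tts
  if [ ! -f "$archive" ]; then
    echo "archive not found: $archive" >&2
    return 1
  fi
  mkdir -p "$dest"
  # tar detects common compression formats automatically when extracting
  tar -xf "$archive" -C "$dest"
}

# Usage (not run here): extract_checkpoint checkpoint.tar mandarin_tts
```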

./scripts/hz_synth.sh 1.0 300000
./scripts/hz_synth.sh 0.9 300000 
./scripts/hz_synth.sh 1.1 300000 

You can also use pure pinyin + unet model, as follows:

./scripts/py_synth.sh 1.0 300000 
./scripts/py_synth.sh 0.9 300000 
./scripts/py_synth.sh 1.1 300000 
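
The three invocations per script can be wrapped in a small loop. This is a minimal sketch under two assumptions not confirmed here: the first argument is a duration (speed) multiplier and the second is the checkpoint step. The order of the commands relative to the Normal, Fast and Slow columns suggests 1.0, 0.9 and 1.1 respectively. The function name run_all_rates and the DRY_RUN variable are illustrative and not part of the repository.

```shell
# Sketch only: call a synthesis script once per rate.
# Assumption: arg 1 is a duration multiplier, arg 2 is the checkpoint step.
run_all_rates() {
  script="$1"   # ./scripts/hz_synth.sh or ./scripts/py_synth.sh
  step="$2"     # e.g. 300000
  for rate in 1.0 0.9 1.1; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      # Print the command instead of running it
      echo "$script $rate $step"
    else
      "$script" "$rate" "$step" || return 1
    fi
  done
}

# Usage (not run here): run_all_rates ./scripts/hz_synth.sh 300000
```

Setting DRY_RUN=1 before the call prints the commands instead of executing them, which is useful for checking the arguments before a long synthesis run.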

For other text inputs, edit the text in input.txt. Some hints:

Text Normal Fast Slow
1
2
3
4
5
6
7
8
9