Abstract:
Compared with the phase spectrum, the magnitude spectrum carries most of the information in speech. Many speech processing tasks therefore concentrate on manipulating the magnitude spectrum and synthesize the waveform from imperfect vocoder parameters or a mismatched phase spectrum, which noticeably degrades speech quality. To address this problem, a modified WaveNet model fused with phase information is proposed to synthesize speech of higher quality. In this model, the original or processed phase spectrum of the speech and the enhanced magnitude spectrum are concatenated into a fusion feature that serves as the condition input, and the predicted speech waveform is generated directly from this input. The proposed method makes effective use of phase information and is verified on two tasks: voice conversion (VC) and bone-conducted speech enhancement (BSE). Two kinds of phase spectrum, the modified group delay (MGD) spectrum and the instantaneous frequency deviation spectrum, are compared comprehensively in simulation experiments, and the influence of the fusion feature on the bandwidth extension WaveNet model and the teacher-student WaveNet model is also explored. In the VC experiments, an A/B test shows that speech generated with the teacher-student WaveNet model is much better than speech generated with the STRAIGHT vocoder. In the BSE experiments, the results show that, with the bandwidth extension WaveNet model conditioned on the feature fused with the MGD spectrum, the mean opinion score (MOS) of the enhanced speech is 54.3% higher than that of the original bone-conducted speech. All the results demonstrate that the phase-fused condition input effectively supplements the magnitude spectrum alone and helps the WaveNet vocoder achieve a promising improvement in the quality of the synthesized speech.
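The fusion of magnitude and phase features described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, frame length, and hop size are hypothetical, and a plain group delay spectrum stands in for the modified group delay (MGD) spectrum, which additionally uses a smoothed spectrum and compression exponents. The WaveNet model that would consume the fused feature is not shown.

```python
import numpy as np

def frame_signal(x, frame_len=512, hop=128):
    """Slice a 1-D signal into overlapping, Hann-windowed frames."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx] * np.hanning(frame_len)

def fused_condition_features(x, frame_len=512, hop=128, eps=1e-8):
    """Concatenate log-magnitude and group delay spectra frame by frame.

    Assumption: plain group delay is used here as a simple phase-derived
    feature; the MGD spectrum named in the abstract is a refinement of it.
    """
    frames = frame_signal(x, frame_len, hop)
    n = np.arange(frame_len)
    X = np.fft.rfft(frames, axis=1)       # spectrum of x[n]
    Y = np.fft.rfft(frames * n, axis=1)   # spectrum of n * x[n]
    log_mag = np.log(np.abs(X) + eps)
    # Group delay computed without phase unwrapping:
    # tau = (X_R * Y_R + X_I * Y_I) / |X|^2
    group_delay = (X.real * Y.real + X.imag * Y.imag) / (np.abs(X) ** 2 + eps)
    # Fusion feature: magnitude part and phase part side by side.
    return np.concatenate([log_mag, group_delay], axis=1)
```

Each row of the returned array is one frame of conditioning features: the first half holds the log-magnitude bins and the second half the group delay bins. In a setup like the one the abstract describes, such a matrix would be upsampled to the waveform rate and fed to the vocoder as its condition input.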
Article information
Title: Improving the performance of speech waveform synthesis using WaveNet fused with phase information
Source journal: 声学学报(英文版)
Year, volume (issue): 2022, (1)
Page range: 1-19
Page count: 19
Language: English
Journal information
声学学报(英文版)
Quarterly
ISSN: 0217-9776
CN: 11-2066/O3
16mo format
Beijing
1981
eng
Articles published: 832
Total downloads: 0