To address the weakness of the existing fixed-value mapping method (method_F), a vocal tract system conversion method based on the universal background model (UBM) is proposed to improve the performance of speech conversion from Chinese whispered speech to normal speech. Because the UBM contains numerous components, the errors produced by the acoustic probability density statistical model cannot be ignored. Thus, an effective method for choosing Gaussian mixture components, based on the posterior probability summation of the minimum spectral distortion, is developed to optimize system performance. The proposed method (method_U) is analyzed and compared using a performance index (PI) based on the Itakura-Saito spectral distortion measure. Experiments show that method_U is more stable across different speakers and different phonemes than method_F, and that its average PI is better. By selecting effective Gaussian mixture components, the PI of method_U can be further improved by 5.11%. Subjective auditory tests also show that the proposed method improves the clarity and intelligibility of the converted speech.
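The component-selection step can be illustrated with a minimal sketch: keep only the most probable UBM components for a frame, discarding the long tail whose statistical errors the abstract says cannot be ignored. The mass threshold and function names here are hypothetical; the paper's actual criterion sums posteriors under a minimum-spectral-distortion constraint, which is not reproduced.

```python
import numpy as np

def select_components(posteriors, mass=0.9):
    """Keep the smallest set of UBM components whose summed posterior
    probability reaches `mass` (hypothetical threshold; the paper's
    criterion additionally minimizes spectral distortion)."""
    order = np.argsort(posteriors)[::-1]        # most probable first
    cumulative = np.cumsum(posteriors[order])
    k = int(np.searchsorted(cumulative, mass)) + 1
    return np.sort(order[:k])                   # indices of kept components

# Toy posteriors over 6 mixture components (values exact in binary)
p = np.array([0.5, 0.25, 0.125, 0.0625, 0.03125, 0.03125])
kept = select_components(p, mass=0.9)           # -> components 0..3
```

Only the kept components would then contribute to the conversion function, suppressing the error introduced by low-posterior mixtures.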
A method of conversion from whispered speech to normal speech using an extended bilinear transformation is proposed. Because the whisper's formants deviate to different degrees in different frequency bands, the spectrum of the whispered speech is processed in separate partitions. On the basis of this partitioned spectrum, a conversion function is established that can effectively convert whispered speech to normal speech. Because the whisper's offset relative to normal speech is non-linear, an expansion factor is introduced into the bilinear transform function, making it correspond more closely to the actual demands of whispered-to-normal speech conversion. This factor takes the non-linear shift of the spectrum and the compression of the formant bandwidth into consideration, thus effectively reducing the spectral distortion in the conversion. Experimental results show that the conversion presented here effectively improves both the sound quality and the intelligibility of whispered speech.
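The underlying operation is a bilinear (first-order all-pass) frequency warping. A minimal sketch of the classic form follows; the paper's band-dependent expansion factor is an extension of this and is not reproduced here.

```python
import numpy as np

def bilinear_warp(omega, alpha):
    """First-order all-pass (bilinear) frequency warping.
    omega: normalized angular frequency in [0, pi].
    alpha: warping factor in (-1, 1); alpha = 0 leaves omega unchanged,
    alpha > 0 shifts the mid-band upward (roughly how whispered formants
    sit above their normal-speech counterparts)."""
    return omega + 2.0 * np.arctan(
        alpha * np.sin(omega) / (1.0 - alpha * np.cos(omega)))

w = np.linspace(0.0, np.pi, 5)
warped = bilinear_warp(w, alpha=0.2)   # endpoints 0 and pi are fixed
```

The warp is a bijection of [0, pi] onto itself, so applying it to a spectral envelope moves formant positions without creating or destroying frequency content; the paper's expansion factor additionally adjusts formant bandwidths per partition.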
TAO Zhi, ZHAO Heming, TAN Xuedan, GU Jihua, ZHANG Xiaojun, WU Di
To increase the recognition rate for short-duration whispered speakers under variable channel conditions, a hybrid compensation in the model and feature domains is proposed. In the training stage, the method is based on joint factor analysis: it extracts the speaker factor and eliminates the channel factor by estimating the speaker and channel spaces of the training speech. In the test stage, the channel factor of the test speech is projected into the feature space for feature compensation, so that channel information is removed in both the model and feature domains to improve the recognition rate. Experimental results show that the hybrid compensation obtains similar recognition rates under three different training channel conditions, and that the method is more effective than joint factor analysis alone on short whispered test speech.
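The test-stage feature compensation amounts to subtracting the channel contribution from every feature frame, where the channel offset is the eigenchannel matrix times the estimated channel factor. The dimensions and names below are hypothetical; estimating U and x (the JFA training step) is outside this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 10-dim features, rank-2 channel (eigenchannel) subspace.
U = rng.standard_normal((10, 2))         # channel subspace, from JFA training
x = rng.standard_normal(2)               # channel factor of the test utterance
frames = rng.standard_normal((50, 10))   # test-utterance feature frames

# Feature-domain compensation: remove the channel offset U @ x
# from every frame before scoring against the speaker model.
compensated = frames - U @ x
```

Because the same offset is removed from all frames, speaker-dependent variation within the utterance is preserved while the utterance-level channel shift is cancelled.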
Whispered speech enhancement using an auditory masking model in a modified Mel domain and speech absence probability (SAP) is proposed. In light of the phonation characteristics of whisper, the Mel-frequency scaling model is modified, and the whispered speech is filtered by the proposed model. Meanwhile, the masking threshold of each frequency band is dynamically determined by the speech absence probability. Whispered speech enhancement is then conducted by adaptively adjusting the spectral subtraction coefficients according to the different masking threshold values. Objective and subjective tests on the enhanced whispered signal show that, compared with other methods, the proposed method enhances the whispered signal with better subjective auditory quality and less distortion, reducing the musical noise and the background noise that lie under the masking threshold.
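The adaptive step can be sketched as spectral subtraction whose over-subtraction factor depends on the per-band masking threshold: where the threshold is high, residual noise is inaudible and subtraction is gentle; where it is low, subtraction is aggressive. The linear mapping and parameter names below are hypothetical; the paper derives its coefficients from a modified-Mel masking model and SAP.

```python
import numpy as np

def masked_spectral_subtraction(power, noise_power, mask_thresh,
                                alpha_max=4.0, alpha_min=1.0, floor=0.01):
    """Per-band spectral subtraction with a masking-adaptive
    over-subtraction factor (hypothetical linear mapping)."""
    t = mask_thresh / (mask_thresh.max() + 1e-12)    # normalize to [0, 1]
    alpha = alpha_max - (alpha_max - alpha_min) * t  # high mask -> gentle subtraction
    cleaned = power - alpha * noise_power
    return np.maximum(cleaned, floor * power)        # spectral floor vs. musical noise

bands = 8
power = np.full(bands, 10.0)               # toy band powers
noise = np.full(bands, 1.0)                # toy noise estimate
mask = np.linspace(0.1, 1.0, bands)        # toy masking thresholds
out = masked_spectral_subtraction(power, noise, mask)
```

The spectral floor is the usual guard against the isolated negative-power bins that produce musical noise in plain spectral subtraction.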
TAO Zhi (1,2), ZHAO Heming (2), WU Di (1), CHEN Daqing (1), ZHANG Xiaojun (1) (1 School of Physical Science and Technology, Soochow University, Suzhou 215006; 2 School of Electronics and Information Engineering, Soochow University, Suzhou 215006)