MSpeC-Net

Abstract

In this paper, we present a multi-domain speech conversion technique by proposing the Multi-domain Speech Conversion Network (MSpeC-Net) architecture for the less-explored task of Non-Audible Murmur-to-SPeeCH (NAM2SPCH) conversion. The murmur produced by the speaker and captured by the NAM microphone suffers from degraded speech quality. Hence, NAM2SPCH conversion is a necessary and challenging task for improving the intelligibility of the NAM signal. MSpeC-Net contains three domain-specific autoencoders. The multiple encoder-decoders are aligned using a latent consistency loss in such a way that the desired conversion is achieved using only the source encoder and the target decoder. We performed zero-pair NAM2SPCH conversion using the interaction between the source encoder and the target decoder. We evaluated the proposed method using both objective and subjective evaluations. With an average Mean Opinion Score (MOS) of 3.26 for direct NAM2SPCH conversion and 3.12 for indirect NAM2SPCH (i.e., NAM-to-whisper-to-speech) conversion, MSpeC-Net achieves a perceptually significant improvement for the NAM2SPCH conversion system.
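The encoder/decoder arrangement described in the abstract can be illustrated with a small toy sketch. It is not the authors' implementation. The linear maps stand in for the real networks, the squared-error form of the latent consistency loss is an assumption, and every name in it (`encoders`, `decoders`, `convert`, the toy dimensions and feature frames) is hypothetical. The three domain labels are taken from the conversion types listed on this page (NAM, WHSP, SPCH).

```python
import random

DOMAINS = ("NAM", "WHSP", "SPCH")
FEAT_DIM, LATENT_DIM = 4, 2  # toy sizes, chosen only for illustration


def make_linear(rows, cols, rng):
    """Return a rows x cols weight matrix with small random entries."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def apply_linear(weights, vec):
    """Multiply a weight matrix by a feature vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


rng = random.Random(0)
# One encoder and one decoder per domain (linear stand-ins for the real networks).
encoders = {d: make_linear(LATENT_DIM, FEAT_DIM, rng) for d in DOMAINS}
decoders = {d: make_linear(FEAT_DIM, LATENT_DIM, rng) for d in DOMAINS}


def reconstruction_loss(domain, x):
    """Autoencoder loss: decode a domain's own latent back to its input."""
    latent = apply_linear(encoders[domain], x)
    return mse(apply_linear(decoders[domain], latent), x)


def latent_consistency_loss(src, x_src, tgt, x_tgt):
    """Assumed squared-error penalty on the gap between latents of paired frames."""
    return mse(apply_linear(encoders[src], x_src), apply_linear(encoders[tgt], x_tgt))


def convert(src, tgt, x_src):
    """Cross-domain conversion: source encoder followed by target decoder."""
    return apply_linear(decoders[tgt], apply_linear(encoders[src], x_src))


# Made-up feature frames, used only to exercise the functions.
nam_frame = [0.1, -0.2, 0.3, 0.05]
whsp_frame = [0.2, -0.1, 0.25, 0.0]

direct = convert("NAM", "SPCH", nam_frame)  # direct NAM2SPCH
indirect = convert("WHSP", "SPCH", convert("NAM", "WHSP", nam_frame))  # NAM-to-whisper-to-speech
```

In this sketch, a training objective would sum `reconstruction_loss` over the domains and `latent_consistency_loss` over the paired frames that are available. Once the latents are aligned, `convert` needs only the source encoder and the target decoder, which is what makes the direct NAM2SPCH path possible without a dedicated NAM-to-speech model.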

Index	Conversion Type	Input	MMSE-DiscoGAN (Baseline)	MSpeC-Net (Proposed)
(1)	NAM2WHSP
(2)	WHSP2SPCH
(3)	NAM2SPCH

MSpeC-Net: Multi-Domain Speech Conversion Network

Harshit Malaviya, Jui Shah, Maitreya Patel, Jalansh Munshi, Hemant A. Patil

Links

MSpeC-Net

GitHub

Citation

H. Malaviya, J. Shah, M. Patel, J. Munshi and H. A. Patil, "Mspec-Net : Multi-Domain Speech Conversion Network," ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain, 2020, pp. 7764-7768.

BibTeX