Mspec-Net : Multi-Domain Speech Conversion Network

Harshit Malaviya, Jui Shah, Maitreya Patel, Jalansh Munshi, Hemant A. Patil


In this paper, we present a multi-domain speech conversion technique by proposing a Multi-domain Speech Conversion Network (MSpeC-Net) architecture for the less-explored task of Non-Audible Murmur-to-SPeeCH (NAM2SPCH) conversion. The murmur produced by the speaker and captured by the NAM microphone suffers from degraded speech quality. Hence, NAM2SPCH conversion is a necessary and challenging task for improving the intelligibility of the NAM signal. MSpeC-Net contains three domain-specific autoencoders. The multiple encoder-decoder pairs are aligned using a latent consistency loss in such a way that the desired conversion is achieved using only the source encoder and the target decoder. We perform zero-pair NAM2SPCH conversion through the interaction between the source encoder and the target decoder. We evaluate the proposed method using both objective and subjective measures. With average Mean Opinion Scores of 3.26 and 3.12 for direct NAM2SPCH and indirect NAM2SPCH (i.e., NAM-to-whisper-to-speech) conversion, respectively, MSpeC-Net achieves a perceptually significant improvement for the NAM2SPCH conversion system.
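The conversion scheme described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the random linear maps stand in for trained encoders and decoders, the feature and latent sizes are arbitrary, the mean-squared form of the latent consistency loss is an assumption, and all function names are hypothetical. It only shows how one encoder-decoder pair per domain, a shared latent space, and a source-encoder/target-decoder path fit together.

```python
import random

DOMAINS = ("nam", "whisper", "speech")
FEAT_DIM, LATENT_DIM = 4, 2  # illustrative sizes, not taken from the paper


def make_linear(rows, cols, rng):
    """Random weight matrix standing in for a trained network (assumption)."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def apply(weights, vec):
    """Plain matrix-vector product."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


def build_autoencoders(seed=0):
    """One encoder/decoder pair per domain."""
    rng = random.Random(seed)
    return {
        d: {
            "enc": make_linear(LATENT_DIM, FEAT_DIM, rng),
            "dec": make_linear(FEAT_DIM, LATENT_DIM, rng),
        }
        for d in DOMAINS
    }


def mse(a, b):
    """Mean squared difference between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def latent_consistency_loss(models, dom_a, feat_a, dom_b, feat_b):
    """Penalise disagreement between the latents of paired frames.

    The squared-error form is an assumption made for this sketch.
    """
    z_a = apply(models[dom_a]["enc"], feat_a)
    z_b = apply(models[dom_b]["enc"], feat_b)
    return mse(z_a, z_b)


def convert(models, source, target, feat):
    """Conversion path: source encoder followed by target decoder."""
    latent = apply(models[source]["enc"], feat)
    return apply(models[target]["dec"], latent)


models = build_autoencoders()
nam_frame = [0.1, -0.2, 0.3, 0.05]  # made-up feature frame

# Direct NAM2SPCH: NAM encoder -> speech decoder.
direct = convert(models, "nam", "speech", nam_frame)

# Indirect NAM2SPCH: NAM -> whisper -> speech.
whisper_frame = convert(models, "nam", "whisper", nam_frame)
indirect = convert(models, "whisper", "speech", whisper_frame)
```

Minimising a latent consistency term during training pulls the encoders toward a common latent space, which is what would allow a source encoder to be combined with a target decoder it was never directly paired with.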


Index | Conversion Type | Input | MMSE-DiscoGAN (Baseline) | MSpeC-Net (Proposed)



H. Malaviya, J. Shah, M. Patel, J. Munshi and H. A. Patil, "Mspec-Net : Multi-Domain Speech Conversion Network," ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain, 2020, pp. 7764-7768.
            author={H. {Malaviya} and J. {Shah} and M. {Patel} and J. {Munshi} and H. A. {Patil}},
            booktitle={ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
            title={Mspec-Net : Multi-Domain Speech Conversion Network},