Introducing Translatotron: An End-to-End Speech-to-Speech Translation Model

Most existing speech-to-speech translation systems are cascades of three components: speech recognition (transcribing the audio into text), machine translation, and speech synthesis.
Translatotron works differently. It translates speech in one language directly into speech in another, with no intermediate text representation. This approach offers faster inference, is less error-prone (because errors do not compound from one stage to the next), allows the speaker's voice to be preserved, and better handles words that do not need translation, such as proper names.
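The point about compounding errors can be illustrated with a little arithmetic. The sketch below is only an illustration, not a measurement of any real system: the per-stage success rates are made-up numbers, the function name is hypothetical, and the stages are assumed to fail independently of one another.

```python
from math import prod


def cascade_success_rate(stage_success_rates):
    """Chance that every stage of a cascade succeeds.

    Assumes the stages fail independently, which is a simplification.
    """
    return prod(stage_success_rates)


# Illustrative (made-up) success rates for the three cascade stages.
stages = {
    "speech_recognition": 0.9,
    "machine_translation": 0.9,
    "speech_synthesis": 0.9,
}

overall = cascade_success_rate(stages.values())
print(f"Cascade of {len(stages)} stages: {overall:.3f}")  # 0.729
```

Under these assumptions, three stages that are each right 90% of the time produce a fully correct result only about 73% of the time. A direct model has no hand-off between stages where such errors can accumulate, although that alone says nothing about how accurate the direct model itself is.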