Baidu's neural network can already simulate your voice
Baidu Research is developing Deep Voice, a neural network that simulates people's voices. All it needs to work is a very short recording of the original voice.
You can listen to sample voices right here. The first recording is a real human voice; the rest were generated by the neural network from it. You can hear the quality improve as more samples are used.
In just a year of work on the neural network, the company has made significant progress. In 2017, this kind of voice cloning required 30 minutes of recordings; now the network needs only a few seconds of source audio. It can even change the speaker's accent: record a British man or woman and, if necessary, make them sound American.
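Baidu's paper on this work, "Neural Voice Cloning with a Few Samples", describes two approaches: fine-tuning a multi-speaker model on the new voice (speaker adaptation) and predicting a speaker embedding directly from a few clips (speaker encoding). Below is a minimal Python sketch of the second idea; every class and method here is a hypothetical stand-in for the trained networks, not Baidu's actual code.

```python
import numpy as np

class SpeakerEncoder:
    """Maps a few short clips of a voice to one fixed-size embedding."""

    def embed(self, clips: list) -> np.ndarray:
        # A real encoder is a trained network; as a stand-in we average
        # each clip's spectrogram over time, then average across clips.
        per_clip = [clip.mean(axis=0) for clip in clips]
        return np.mean(per_clip, axis=0)

class MultiSpeakerTTS:
    """A text-to-speech model conditioned on a speaker embedding."""

    def synthesize(self, text: str, speaker: np.ndarray) -> np.ndarray:
        # A trained model would generate a waveform here; this stub
        # returns one second of silence at 16 kHz.
        return np.zeros(16000)

# A few seconds of the target voice (dummy mel spectrograms,
# shape: time frames x 80 mel bands) stand in for real recordings.
clips = [np.random.rand(100, 80) for _ in range(3)]

encoder = SpeakerEncoder()
tts = MultiSpeakerTTS()

voice = encoder.embed(clips)             # one embedding from seconds of audio
audio = tts.synthesize("Hello!", voice)  # speak any text in that voice
```

The appeal of this design is that cloning a new voice requires no retraining at all: the text-to-speech model stays fixed, and only a small embedding is computed per speaker.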
As Baidu puts it, the main goal of the research is quite simple: the scientists want to prove that machines, like people, can work with a limited amount of data.
What for?
The obvious answer: to imitate the human voice.
Other answers are, in fact, harder to find. But here is one example: suppose we have a video recorded back when Stephen Hawking could still speak. We feed that recording to the neural network, and the scientist's computer could then speak not with a robotic voice but with Stephen's own.
Such neural networks will also be useful for voicing large volumes of text. Entrust the job to a professional voice actor, and the most he can squeeze out of his vocal cords is about three hours of recording a day, and the usable material may be 10 to 20 minutes less than that. A neural network will turn out the finished audio much faster, and without mistakes. In short, some people may soon be out of a job...
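To make the throughput comparison concrete, here is a hedged sketch of how a cloned voice could narrate an arbitrarily long text: split it into sentences, synthesize each one, and join the audio. The `synthesize` function is a hypothetical stand-in for a trained model, not a real API.

```python
import numpy as np

def synthesize(text: str, sample_rate: int = 16000) -> np.ndarray:
    # Hypothetical stand-in for a cloned-voice TTS model: a real model
    # would return speech; this returns silence proportional to length.
    return np.zeros(int(sample_rate * 0.06 * len(text)))

def narrate(book_text: str) -> np.ndarray:
    # Synthesize sentence by sentence and concatenate the results;
    # unlike a human narrator, the model never tires or misreads.
    sentences = [s.strip() for s in book_text.split(".") if s.strip()]
    return np.concatenate([synthesize(s) for s in sentences])

audio = narrate("The first sentence. The second sentence. The third.")
```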
Ethical issues
Things get more complicated from an ethical point of view. Thanks to such neural networks, for example, there could be many more albums by Amy Winehouse or Tupac, or even a solo album by Yegor Letov. Some people might like these ideas, while others will feel nothing but righteous anger. And, admittedly, you can understand them.
Who else is working on this?
In November 2016, Adobe presented its VoCo project. The presentation showed the tool reading text aloud quite realistically. Here is a demonstration of how it works.
Since then there has been no news about VoCo, except that its authors ran into questions about the ethics of creating and using such recordings. Apparently, back in November 2016 they did not even suspect that in little more than a year neural networks would be swapping faces better than Hollywood artists.
Voices generated by Deep Voice still sound rather mechanical. But keep in mind that this technology has been in development for little more than a year. I doubt that in a few years we will still be able to tell which recordings come from a person and which from a machine.