The Technology section is published with the support of Favbet Tech
OpenAI has presented the results of Voice Engine, a tool that synthesizes a realistic voice from a 15-second audio sample and a text input, and that took about two years to develop. There is no public access to it, however, because of the company's understandable security concerns.
“We hope to start a dialogue about the responsible use of synthetic voices and how society can adapt to these new possibilities. Based on these conversations and the results of these small tests, we will make a more informed decision about whether and how to deploy this technology at scale,” OpenAI said in a blog post.
The generative AI model that powers Voice Engine has been hiding in plain sight for some time. It underlies the voice and read-aloud capabilities of ChatGPT, as well as the preset voices available in OpenAI's text-to-speech API. Spotify has also been using it since early September to dub podcasts in different languages.
The company sees several ways to use the technology: helping people who for some reason cannot read, translation, providing voice services to remote communities, supporting people with voice disorders, and helping with voice restoration. The blog post also includes example applications with audio samples in several languages.
TechCrunch asked company spokesman Jeff Harris what materials Voice Engine was trained on. He responded that the Voice Engine model was trained on a mixture of licensed and publicly available data. The details of training AI models can represent both a competitive advantage and a source of legal problems, so the lack of detail is not surprising. Voice Engine uses user data very carefully:
“We take a small sample of audio and text and create realistic speech that matches the original speaker,” Harris says. “The audio used is deleted after the request is completed.”
According to TechCrunch, the future service will be priced aggressively. OpenAI has removed Voice Engine pricing from its marketing materials, but documents reviewed by TechCrunch list a cost of $15 per million characters, or about 162,500 words in English. That is a little longer than Dickens' Oliver Twist and corresponds to roughly 18 hours of audio, which works out to slightly less than $1 per hour.
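The per-hour figure can be checked with a few lines of Python. This is only a back-of-the-envelope sketch using the numbers quoted above; the constant and function names are illustrative and do not come from OpenAI.

```python
# Figures quoted in the article (illustrative names, not an OpenAI API).
PRICE_PER_MILLION_CHARS_USD = 15.0
WORDS_PER_MILLION_CHARS = 162_500
AUDIO_HOURS_PER_MILLION_CHARS = 18


def cost_per_audio_hour(price_usd: float, audio_hours: float) -> float:
    """Price of one hour of generated audio."""
    return price_usd / audio_hours


def implied_words_per_minute(words: float, audio_hours: float) -> float:
    """Speaking rate implied by the word count and the audio duration."""
    return words / (audio_hours * 60)


hourly = cost_per_audio_hour(PRICE_PER_MILLION_CHARS_USD,
                             AUDIO_HOURS_PER_MILLION_CHARS)
wpm = implied_words_per_minute(WORDS_PER_MILLION_CHARS,
                               AUDIO_HOURS_PER_MILLION_CHARS)

print(f"~${hourly:.2f} per hour of audio")  # ~$0.83 per hour of audio
print(f"~{wpm:.0f} words per minute")       # ~150 words per minute
```

The implied rate of about 150 words per minute is a typical pace for read-aloud speech, so the 18-hour estimate looks plausible.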
That is cheaper than one of its most popular competitors, ElevenLabs, which charges $11 for 100,000 characters per month. Interestingly, the HD quality option costs twice as much, yet an OpenAI spokesperson told TechCrunch that there is no difference between HD and non-HD voices, a statement that can be read either way. Voice Engine also does not offer controls for tone, pitch, or other characteristics of the voice.
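To put the two prices on the same scale, both can be converted to dollars per million characters. The sketch below uses only the figures quoted in the article and assumes the ElevenLabs monthly allowance is used in full, so it is a rough comparison rather than an exact one; the function name is illustrative.

```python
def usd_per_million_chars(price_usd: float, chars: int) -> float:
    """Normalize a price for `chars` characters to a per-million-characters rate."""
    return price_usd * 1_000_000 / chars


voice_engine = usd_per_million_chars(15, 1_000_000)  # standard quality
voice_engine_hd = 2 * voice_engine                   # HD is quoted at twice the price
elevenlabs = usd_per_million_chars(11, 100_000)      # assumes the full monthly quota is used

print(f"Voice Engine:    ${voice_engine:.0f} per million characters")
print(f"Voice Engine HD: ${voice_engine_hd:.0f} per million characters")
print(f"ElevenLabs:      ${elevenlabs:.0f} per million characters")
print(f"ElevenLabs is about {elevenlabs / voice_engine:.1f}x the standard price")
```

On these numbers, even the HD option remains several times cheaper than the quoted ElevenLabs rate.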
Voice actor rates on ZipRecruiter range from $12 to $79 per hour, far more than Voice Engine costs. Actors with agents command much higher pay. There is also the problem of deepfakes, which is why the company is moving very cautiously for now, as the limited set of use cases shows.