OpenAI introduces Voice Engine, a sample-based voice generation model, and it turns out many users have already heard it

by alex April 2, 2024


The Technology section is published with the support of Favbet Tech


OpenAI has presented early results from Voice Engine, a tool for realistic voice synthesis that works from a 15-second audio sample and a text, and that took about two years to develop. There is no public access to it yet, because of the company's evident security concerns.

“We hope to start a dialogue about the responsible use of synthetic voices and how society can adapt to these new possibilities. Based on these conversations and the results of these small tests, we will make a more informed decision about whether and how to deploy this technology at scale,” OpenAI said in a blog post.

The generative AI model that powers Voice Engine has been hiding in plain sight for some time. It underlies the voice and read-aloud capabilities of ChatGPT, as well as the preset voices available in OpenAI's text-to-speech API. Spotify has also been using it since early September to dub podcasts into different languages.

The company sees several uses for the technology: helping people who for some reason cannot read, translation, providing voice services to remote communities, supporting people with voice disorders, and helping with voice restoration. The blog post also presents example applications with samples in several languages.

TechCrunch asked company spokesman Jeff Harris what data Voice Engine was trained on. He responded that the model was trained on a mixture of licensed and publicly available data. The details of training AI models can be both a competitive advantage and a source of legal problems, so the lack of specifics is not surprising. Voice Engine handles user data very carefully:

“We take a small sample of audio and text and create realistic speech that matches the original speaker,” Harris says. “The audio that is used is deleted after the request is completed.”

According to the site, the price of the future service will be “biting”. OpenAI has removed Voice Engine pricing from its marketing materials, but documents reviewed by TechCrunch list a cost of $15 per million characters, or roughly 162,500 words in English. That is slightly more than the length of Dickens' Oliver Twist. It corresponds to approximately 18 hours of audio, which works out to slightly less than $1 per hour.
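The arithmetic behind these figures can be checked with a short sketch. It uses only the numbers quoted above ($15, 162,500 words, about 18 hours); the function names are illustrative, and the implied speaking rate is a derived estimate rather than a figure from OpenAI.

```python
# Figures quoted in the article (from documents reviewed by TechCrunch).
PRICE_PER_MILLION_CHARS_USD = 15.0
WORDS_PER_MILLION_CHARS = 162_500
AUDIO_HOURS_PER_MILLION_CHARS = 18.0  # approximate, as stated in the article


def cost_per_audio_hour(price_usd: float, hours: float) -> float:
    """Return the price of one hour of generated audio in USD."""
    return price_usd / hours


def implied_speaking_rate_wpm(words: float, hours: float) -> float:
    """Return the speaking rate (words per minute) implied by the figures."""
    return words / (hours * 60)


if __name__ == "__main__":
    hourly = cost_per_audio_hour(
        PRICE_PER_MILLION_CHARS_USD, AUDIO_HOURS_PER_MILLION_CHARS
    )
    wpm = implied_speaking_rate_wpm(
        WORDS_PER_MILLION_CHARS, AUDIO_HOURS_PER_MILLION_CHARS
    )
    print(f"Cost per hour of audio: ${hourly:.2f}")
    print(f"Implied speaking rate: {wpm:.0f} words per minute")
```

Under these assumptions the hourly cost comes to about $0.83, consistent with "slightly less than $1 per hour", and the implied pace of about 150 words per minute is in the range of ordinary narration.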


That is cheaper than one of its most popular competitors, ElevenLabs, which charges $11 for 100,000 characters per month. Interestingly, the HD quality option costs twice as much, yet an OpenAI spokesperson told TechCrunch that there is no difference between HD and non-HD voices, a statement that can be interpreted either way. Voice Engine also offers no controls for tone, pitch, or other characteristics of the voice.
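To compare the two price points on the same scale, both can be normalized to one million characters. The sketch below is a rough comparison only: the ElevenLabs figure is a monthly plan quota rather than a pay-as-you-go rate, and the helper name is illustrative.

```python
def price_per_million_chars(price_usd: float, chars: int) -> float:
    """Scale a price quoted for `chars` characters to one million characters."""
    return price_usd * 1_000_000 / chars


# Figures quoted in the article.
voice_engine = price_per_million_chars(15.0, 1_000_000)
voice_engine_hd = 2 * voice_engine  # the HD option "costs twice as much"
elevenlabs = price_per_million_chars(11.0, 100_000)

if __name__ == "__main__":
    print(f"Voice Engine:      ${voice_engine:.0f} per million characters")
    print(f"Voice Engine (HD): ${voice_engine_hd:.0f} per million characters")
    print(f"ElevenLabs:        ${elevenlabs:.0f} per million characters")
    print(f"ElevenLabs / Voice Engine: {elevenlabs / voice_engine:.1f}x")
```

On these numbers the ElevenLabs plan works out to $110 per million characters, roughly seven times the standard Voice Engine price quoted above.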

Voice actor rates on ZipRecruiter range from $12 to $79 per hour, far more than Voice Engine costs, and actors with agents are paid considerably more. There is also the problem of deepfakes. For now, therefore, the company is moving very cautiously, as the limited set of use cases it has presented suggests.

