And how much is needed for GPT-5?
We recently reported that companies building large generative language models face a shortage of high-quality data to train their AI. OpenAI reportedly solved the problem in part thanks to YouTube.
To train GPT-4, the company created the Whisper audio transcription model and ultimately transcribed more than a million hours of YouTube videos into text. According to The New York Times, OpenAI was well aware that this was a legal gray area, but went ahead with it anyway. It is quite possible that the company is now using the same method to train GPT-5.
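For context, the open-source release of Whisper can be driven in a few lines of Python. A minimal sketch, assuming the openai-whisper package is installed and an audio track has already been extracted from a video (the file name is a placeholder; at the scale described, this would run as a massive batch job rather than a single call):

```python
# Requires: pip install openai-whisper
import whisper

# Load the smallest general-purpose checkpoint; larger ones
# ("small", "medium", "large") trade speed for accuracy.
model = whisper.load_model("base")

# Transcribe one audio track into plain text.
result = model.transcribe("video_audio.mp3")
print(result["text"])
```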
According to the source, OpenAI had in fact exhausted its reserves of high-quality training data as early as 2021, while GPT-4 was still being built.