And how much is needed for GPT-5?
We recently reported that companies building large generative language models face a shortage of high-quality data to train their AI. OpenAI reportedly solved the problem in part thanks to YouTube.
To train GPT-4, the company created the Whisper audio transcription model and ultimately transcribed more than a million hours of YouTube videos into text. According to The New York Times, OpenAI was well aware that this was a legal gray area, but went ahead with it anyway. It is quite possible that the company is now using the same method to train GPT-5.
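For context, the open-source release of Whisper can be driven in a few lines of Python. A minimal sketch, assuming the openai-whisper package is installed and an audio track has already been extracted from a video (the file name is a placeholder; at the scale described, this would run as a massive batch job rather than a single call):

```python
# Requires: pip install openai-whisper
import whisper

# Load the smallest general-purpose checkpoint; larger ones
# ("small", "medium", "large") trade speed for accuracy.
model = whisper.load_model("base")

# Transcribe one audio track into plain text.
result = model.transcribe("video_audio.mp3")
print(result["text"])
```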
According to the source, OpenAI had in fact exhausted its reserves of high-quality training data as early as 2021, while GPT-4 was still being built.