OpenAI destroyed 100,000 books used to train GPT-3. The people involved have vanished too

The Technology section is published with the support of Favbet Tech

OpenAI deleted two huge datasets, “books1” and “books2”, which were used to train the GPT-3 model.

This was reported by Business Insider, citing materials from the Authors Guild lawsuit.

The essence of the claim

Authors Guild lawyers said the GPT-3 datasets likely contained “more than 100,000 published books”. On that basis, they argue, OpenAI used copyrighted materials to train its AI models.

Background. The Authors Guild is the oldest (founded in 1912) and most authoritative professional organization of writers in the United States. It is dedicated to protecting freedom of speech and copyright.

For months, the Authors Guild has been asking OpenAI for information about the datasets it used. At first the company refused, citing confidentiality provisions. Then it turned out that it had deleted all copies of the data.

High-quality training data is an important part of powerful AI models. To build these models, OpenAI and other companies use data from the Internet, including books.

Many of the companies that created this content want to be paid for supplying it to these new AI products. Tech companies do not want to be forced to pay. The dispute is currently being fought out in court in several lawsuits.

100,000 books – 16% of GPT-3 training data

In a 2020 white paper, OpenAI described the books1 and books2 datasets as a “corpus of books from the Internet” and stated that together they made up 16% of the training data used to create GPT-3.

The document also states that “books1” and “books2” together contained 67 billion tokens, or approximately 50 billion words.

OpenAI stopped using “books1” and “books2” to train models at the end of 2021. They were deleted in mid-2022 due to “unsuitability” for use.

The documents also indicate that two of the researchers who created the “books1” and “books2” datasets are no longer employed by OpenAI. OpenAI refuses to disclose information about them, although the Authors Guild insists on it.

OpenAI asked the court to keep the employees' names, as well as information about the datasets, confidential.

“The models that power ChatGPT and our API today were not built using these datasets,” OpenAI said in a statement Tuesday.

In an earlier, separate case, AI researcher and former Amazon manager Vivian Ghaderi accused her former employer of violating copyright requirements.

In March, her team director set out to find out why Amazon was not meeting its Alexa search quality goals. In the conversation, he recommended ignoring the copyright policy to improve results. The director pointed to competitors, saying that “everyone does it.”
