More than 1,000 images of child abuse found in the dataset Stable Diffusion was trained on

by alex December 21, 2023

The LAION-5B dataset contains more than 5 billion images and serves as a training base for many neural networks, such as Stable Diffusion.

According to a recent study by the Stanford Internet Observatory, the dataset also contains thousands of child abuse images, which could help image generators produce dangerous realistic content.

A spokesperson for the organization behind LAION-5B said it has a “zero tolerance policy” for illegal content and is temporarily taking the dataset down to make sure it is safe before republishing it.

“This report focuses on the LAION-5B dataset as a whole. Stability’s AI models were trained on a filtered subset of it,” said Stability AI, the British artificial intelligence startup that funded and popularized Stable Diffusion.

LAION-5B or a subset of it was used to create several versions of Stable Diffusion. A newer one, Stable Diffusion 2.0, was trained on data from which “dangerous” material had been largely filtered out, making it much harder for users to create explicit images. But Stable Diffusion 1.5 does generate sexual content and is still in use online.

The company spokesperson also said that Stable Diffusion 1.5 was not released by Stability AI at all, but by Runway, the AI video startup that helped create the original version of Stable Diffusion (this is a somewhat ironic situation, since when releasing this version, Stability AI did not mention Runway, taking all the credit for itself).

“We've added filters to intercept dangerous requests or dangerous results, and we've also invested in content tagging features to help identify images created on our platform. These levels of mitigation make it more difficult for attackers to misuse artificial intelligence,” the company added.

LAION-5B was released in 2022. It was built from raw HTML code collected by a California-based nonprofit, which was used to find images online and link them to their descriptions. For months, rumors had circulated on discussion forums and social media that the dataset contained illegal images.

“To our knowledge, this is the first attempt to actually quantify and confirm the concerns,” said David Thiel, chief technologist at the Stanford Internet Observatory.

Researchers had previously found that generative AI image models can create child sexual abuse material (CSAM), but only by combining two “concepts”, such as children and sexual activity. Thiel said the new research suggests that these models may also generate such illegal images because of some of the underlying data they were trained on.
