Google DeepMind and Stanford have developed an AI fact-checking system that is correct in 76% of disputed cases

by alex

The Technology section is published with the support of Favbet Tech

One of the biggest drawbacks of AI-based chatbots is so-called “hallucinations”: the AI invents false information, that is, it effectively lies. Some experts argue that this is actually an interesting feature of AI, and it can be useful for generative models that create images and videos. But not for language models that answer questions from users who expect accurate data.

Google DeepMind and Stanford University appear to have found a way to mitigate the problem. The researchers developed a verification system for large language models: the Search-Augmented Factuality Evaluator, or SAFE, checks long answers generated by AI chatbots. Their research is available as a preprint on arXiv, along with all experimental code and datasets.

The system analyzes, processes, and evaluates a response in four steps to check its accuracy and relevance. SAFE first breaks the answer down into individual facts, revises each one, checks its relevance to the original query, and compares it with Google Search results.
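The pipeline described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the sentence-level fact splitting is a naive stand-in for the LLM step, and the relevance and search checks are stubbed callables (the real SAFE uses an LLM for each step and live Google Search queries).

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class FactVerdict:
    fact: str
    relevant: bool
    supported: bool

def split_into_facts(answer: str) -> List[str]:
    # Step 1 (naive stand-in): split a long answer into individual
    # factual claims — here, simply one claim per sentence.
    return [s.strip() for s in answer.split(".") if s.strip()]

def evaluate_answer(
    answer: str,
    query: str,
    is_relevant: Callable[[str, str], bool],
    search_supports: Callable[[str], bool],
) -> List[FactVerdict]:
    # Steps 2-4: take each extracted fact, check whether it is
    # relevant to the original query, and, if so, compare it
    # against (stubbed) search results.
    verdicts = []
    for fact in split_into_facts(answer):
        rel = is_relevant(query, fact)
        sup = search_supports(fact) if rel else False
        verdicts.append(FactVerdict(fact, rel, sup))
    return verdicts

# Toy stand-ins for the LLM relevance check and the search lookup.
known_facts = {"Paris is the capital of France"}
verdicts = evaluate_answer(
    "Paris is the capital of France. Paris has 90 million residents",
    "Tell me about Paris",
    is_relevant=lambda q, f: "Paris" in f,
    search_supports=lambda f: f in known_facts,
)
supported = sum(v.supported for v in verdicts)
print(f"{supported}/{len(verdicts)} facts supported")  # → 1/2 facts supported
```

Passing the checks in as callables keeps the skeleton testable: in a real system they would wrap LLM prompts and a search API, while here they can be replaced with simple lambdas.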

To evaluate SAFE's performance, the researchers created LongFact, a dataset of approximately 16,000 facts, and tested the system on 13 large language models from four families (Claude, Gemini, GPT, PaLM-2). SAFE agreed with human annotators in 72% of cases. Where SAFE and the human raters disagreed, SAFE turned out to be correct 76% of the time.


The researchers say that using SAFE is 20 times cheaper than human annotation, making the solution economically viable and scalable. Existing approaches to assessing the factual accuracy of model-generated content typically rely on direct human evaluation. While valuable, that process is limited by the subjectivity and variability of human judgment and by the difficulty of scaling human labor to large datasets.



