Google DeepMind and Stanford have developed an AI fact-checking system that is correct in 76% of disputed cases

by alex

The Technology section is published with the support of Favbet Tech

One of the biggest drawbacks of AI-based chatbots is so-called “hallucinations”: the AI invents false information, that is, it effectively lies. Some experts argue that this is actually an interesting feature of AI, and it can be useful for generative models that create images and videos. But not for language models that answer questions from users who expect accurate data.

Google DeepMind and Stanford University appear to have found a way to mitigate the problem. The researchers developed a verification system for large language models: the Search-Augmented Factuality Evaluator, or SAFE, checks long answers generated by AI chatbots. Their research is available as a preprint on arXiv, along with all experimental code and datasets.

The system analyzes, processes, and evaluates a response in four steps to check its accuracy and relevance. SAFE first breaks the answer down into individual facts, revises each one, checks its relevance to the original query, and compares it with Google Search results.
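The pipeline described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the sentence-level fact splitting is a naive stand-in for the LLM step, and the relevance and search checks are stubbed callables (the real SAFE uses an LLM for each step and live Google Search queries).

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class FactVerdict:
    fact: str
    relevant: bool
    supported: bool

def split_into_facts(answer: str) -> List[str]:
    # Step 1 (naive stand-in): split a long answer into individual
    # factual claims — here, simply one claim per sentence.
    return [s.strip() for s in answer.split(".") if s.strip()]

def evaluate_answer(
    answer: str,
    query: str,
    is_relevant: Callable[[str, str], bool],
    search_supports: Callable[[str], bool],
) -> List[FactVerdict]:
    # Steps 2-4: take each extracted fact, check whether it is
    # relevant to the original query, and, if so, compare it
    # against (stubbed) search results.
    verdicts = []
    for fact in split_into_facts(answer):
        rel = is_relevant(query, fact)
        sup = search_supports(fact) if rel else False
        verdicts.append(FactVerdict(fact, rel, sup))
    return verdicts

# Toy stand-ins for the LLM relevance check and the search lookup.
known_facts = {"Paris is the capital of France"}
verdicts = evaluate_answer(
    "Paris is the capital of France. Paris has 90 million residents",
    "Tell me about Paris",
    is_relevant=lambda q, f: "Paris" in f,
    search_supports=lambda f: f in known_facts,
)
supported = sum(v.supported for v in verdicts)
print(f"{supported}/{len(verdicts)} facts supported")  # → 1/2 facts supported
```

Passing the checks in as callables keeps the skeleton testable: in a real system they would wrap LLM prompts and a search API, while here they can be replaced with simple lambdas.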

To evaluate SAFE's performance, the researchers created LongFact, a dataset of approximately 16,000 facts, and tested the system on 13 large language models from four families (Claude, Gemini, GPT, PaLM-2). SAFE agreed with human annotators in 72% of cases. Where SAFE and the human raters disagreed, SAFE turned out to be correct 76% of the time.


The researchers say that using SAFE is 20 times cheaper than human annotation, making the solution economically viable and scalable. Existing approaches to assessing the factual accuracy of model-generated content typically rely on direct human evaluation. While valuable, that process is limited by the subjectivity and variability of human judgment and by the difficulty of scaling human labor to large datasets.



