DOOM is once again a testing ground for experiments. This time the shooter wasn't launched on a toaster; instead, an AI was let loose in its levels.
DOOM keeps getting ported to all sorts of platforms (gaming and otherwise), but people also simply experiment with the legendary shooter. Researcher Adrian de Wynter decided to see whether GPT-4 could play DOOM, testing how well a large language model copes with a video game.
He did not rely on the standard OpenAI GPT-4 alone, since it cannot run DOOM because of limits on the amount of input data it accepts. Instead, the choice fell on the multimodal version, GPT-4V, which can take images as input.
The researcher did not give the AI any special training for DOOM, but he still had some work to do. He built a Vision component that takes screenshots from the game engine and returns structured descriptions of the game state, and “combined this with an agent model that calls GPT-4 to make decisions based on visual input and previous history. The agent model was told to translate its responses into commands that were meaningful to the game engine.”
In other words: the input is a picture -> it is “turned” into a text description of what is in front of the player -> GPT-4 analyzes it and makes a decision -> that decision is translated into a command that is sent to DOOM.
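The loop described above can be sketched in a few lines of Python. This is a toy illustration of the data flow only, not the researcher's code: `describe_frame`, `decide`, and the `COMMANDS` table are hypothetical stand-ins. A real setup would send pixels to GPT-4V and the text description to GPT-4; here a "frame" is assumed to be a dict of already-detected objects, and the "model" is a trivial rule-based policy.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    # Running log of past observations and decisions ("previous history").
    history: list[str] = field(default_factory=list)


def describe_frame(frame: dict) -> str:
    """Hypothetical Vision step: turn a 'screenshot' into a text description.
    Assumption: the frame is already a dict of detected objects, not pixels."""
    parts = [f"{name} at {pos}" for name, pos in sorted(frame.items())]
    return "; ".join(parts) if parts else "nothing visible"


def decide(description: str, history: list[str]) -> str:
    """Hypothetical stand-in for the GPT-4 call: a trivial rule-based policy.
    A real agent would send the description plus history to the model."""
    if "imp at" in description:
        return "shoot the imp"
    if "door at" in description:
        return "open the door"
    return "move forward"


# Hypothetical mapping from natural-language decisions to engine commands.
COMMANDS = {
    "shoot the imp": "ATTACK",
    "open the door": "USE",
    "move forward": "MOVE_FORWARD",
}


def step(agent: Agent, frame: dict) -> str:
    """One pass of picture -> description -> decision -> game command."""
    description = describe_frame(frame)
    decision = decide(description, agent.history)
    agent.history.append(f"saw: {description} -> did: {decision}")
    return COMMANDS[decision]


if __name__ == "__main__":
    agent = Agent()
    print(step(agent, {"door": (3, 0)}))  # USE
    print(step(agent, {"imp": (5, 2)}))   # ATTACK
    print(step(agent, {}))                # MOVE_FORWARD
```

Keeping the description as plain text mirrors the design in the article: the decision-maker never sees pixels, only the Vision component's summary, which is also why anything missing from that summary is invisible to it.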
This GPT-4-based setup can move around a level, open doors, shoot, and fight enemies. However, it is still not a “full-fledged player” and has certain shortcomings:
If an enemy leaves the screen, the model “forgets” it exists (even though the enemy is still alive and can keep dealing damage).
The AI’s spatial orientation is poor; GPT-4 sometimes got stuck.
GPT-4 also struggles with reasoning: when the researcher asked it to explain its decisions, the explanations were poor and included “hallucinations” (incorrect information).
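To make the first shortcoming concrete, here is a minimal Python sketch, assuming the model only ever receives what is visible in the current frame. `visible_state` shows that stateless view; `EnemyMemory` is a purely hypothetical contrast that keeps an enemy around for a few frames (`ttl`) after it leaves the screen. The article does not say the researcher built anything like it.

```python
def visible_state(frame: dict[str, tuple[int, int]]) -> set[str]:
    """Stateless view: only enemies on screen in this frame reach the model."""
    return set(frame)


class EnemyMemory:
    """Hypothetical short-term memory, not the researcher's code:
    remembers each enemy for up to `ttl` frames after it was last seen."""

    def __init__(self, ttl: int = 3):
        self.ttl = ttl
        self.t = 0                               # current frame index
        self.last_seen: dict[str, int] = {}      # enemy name -> frame last seen

    def update(self, frame: dict[str, tuple[int, int]]) -> set[str]:
        self.t += 1
        for name in frame:
            self.last_seen[name] = self.t
        # Forget enemies that have been off screen for more than `ttl` frames.
        self.last_seen = {
            name: seen for name, seen in self.last_seen.items()
            if self.t - seen <= self.ttl
        }
        return set(self.last_seen)


if __name__ == "__main__":
    # An imp is visible in the first frame, then steps out of view.
    frames = [{"imp": (5, 2)}, {}, {}]
    print([visible_state(f) for f in frames])  # [{'imp'}, set(), set()]
    memory = EnemyMemory(ttl=3)
    print([memory.update(f) for f in frames])  # [{'imp'}, {'imp'}, {'imp'}]
```

With the stateless view, the imp vanishes from the model's input the moment it leaves the screen, even though in the game it is still alive and can keep attacking.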
The researcher himself found the experiment alarming, writing: “Ethically, it is quite alarming how easy it was for (a) me to create code to get the model to shoot someone, and (b) for the model to accurately shoot someone without thinking about instructions.”