The new method is called ReBRAC
The Tinkoff team has shared a new result from scientists at Tinkoff Research. According to the press service, the Tinkoff Research artificial intelligence laboratory has created what it describes as the most effective algorithm among comparable methods worldwide for training and adapting artificial intelligence.
Tinkoff Research said:
The new method, called ReBRAC (Revisited Behavior Regularized Actor Critic, a revised actor-critic with controlled behavior), trains AI four times faster and performs 40% better than comparable methods worldwide in the field of reinforcement learning (RL), adapting it to new conditions on the fly.
The essence of the discovery is that scientists from Tinkoff Research identified four components that have appeared in algorithms of recent years but were considered minor and were never analyzed in detail:
- Depth of neural networks. Increasing the network's depth helps it capture complex patterns in the data.
- Regularization of the actor and critic. AI agents have two components: an “actor,” which takes actions, and a “critic,” which evaluates those actions. The scientists regularized both components jointly, so that the actor avoids unwanted actions and the critic evaluates them more accurately. Previously it was not clear how to combine the two approaches most efficiently.
- Increasing the effective planning horizon. This lets the model balance the short-term and long-term aspects of a problem and improves its decision-making.
- Using layer normalization (LayerNorm). This stabilizes the training of neural networks.
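As a rough illustration of three of the components listed above (depth, layer normalization, and planning horizon), here is a minimal Python sketch. It is not Tinkoff Research's code: the function names and layer layout are hypothetical. The article does not say how the planning horizon was increased, so tying it to the discount factor through the standard rule of thumb 1 / (1 - gamma) is an assumption.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    """Normalize each row to zero mean and unit variance.

    Simplified: the learned scale and shift of a full LayerNorm are omitted.
    """
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def critic_forward(obs_action, weights):
    """Hypothetical critic: linear -> LayerNorm -> ReLU blocks, then a scalar head.

    Adding more matrices to `weights` makes the network deeper.
    """
    h = obs_action
    for w in weights[:-1]:
        h = np.maximum(layer_norm(h @ w), 0.0)
    return h @ weights[-1]

def effective_horizon(gamma):
    """Rule-of-thumb number of future steps that matter for discount factor gamma."""
    return 1.0 / (1.0 - gamma)
```

Under this rule of thumb, a discount factor of 0.99 corresponds to a horizon of about 100 steps and 0.999 to about 1000 steps, which is one way to picture what a longer planning horizon means.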
These components were integrated into the predecessor algorithm, BRAC (Behavior Regularized Actor Critic, an actor-critic with controlled behavior) from 2019, and a study was conducted in which each of them was varied in turn. It turned out that the right combination of these components gives even this older approach the highest performance among today's best comparable methods. The modified algorithm is called ReBRAC.
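To make joint regularization of the actor and critic more concrete, here is a hedged Python sketch of one way a behavior penalty could enter both losses. The article does not give the formulas, so the squared-distance penalty, the separate coefficients `beta_actor` and `beta_critic`, and all function names are assumptions made for illustration, not the published method.

```python
import numpy as np

def behavior_penalty(policy_actions, dataset_actions):
    """Squared distance between the policy's actions and the actions in the dataset."""
    return np.sum((policy_actions - dataset_actions) ** 2, axis=-1)

def actor_loss(q_values, policy_actions, dataset_actions, beta_actor):
    """Actor objective: prefer high Q-values, but stay close to dataset actions."""
    penalty = behavior_penalty(policy_actions, dataset_actions)
    return np.mean(-q_values + beta_actor * penalty)

def critic_target(rewards, dones, next_q, next_policy_actions,
                  next_dataset_actions, gamma, beta_critic):
    """Critic target: the bootstrapped value is lowered for next actions far from the data."""
    penalty = behavior_penalty(next_policy_actions, next_dataset_actions)
    return rewards + gamma * (1.0 - dones) * (next_q - beta_critic * penalty)
```

Keeping the two coefficients separate makes it possible to vary the strength of the actor penalty and the critic penalty independently, in the spirit of the one-component-at-a-time study described above.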
Testing on robotic simulators showed that the algorithm trains AI four times faster and performs 40% better than all existing algorithms on offline benchmarks. Previously, the lead belonged to the SAC-RND algorithm, also created by scientists from Tinkoff Research.