A new player, apparently one with great potential, has appeared in the AI chip market: Groq has introduced a processor that seems to be significantly superior to its competitors.
It is worth starting with the fact that the Groq chip (it shares the company's name) is neither a CPU nor a GPU; it is a so-called language processing unit (LPU). The second important feature is that this LPU is intended not for training neural networks but for inference, and there it performs superbly.
The Groq LPU is a single-core chip based on the Tensor Streaming Processor (TSP) architecture. It delivers 750 TOPS at INT8 and 188 TFLOPS at FP16, with a 320×320 matrix dot-product unit in addition to 5,120 vector ALUs. Judging by publicly available data, Groq is far ahead of other market players whose systems rely on GPUs.
When running the Mixtral 8x7B model, the Groq LPU delivers 480 tokens per second, one of the best output rates in the industry. On Llama 2 70B with a 4096-token context, Groq serves 300 tokens per second, and on Llama 2 7B with a 2048-token context, 750 tokens per second.
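To put these rates in perspective, the per-token latency they imply is easy to compute. The sketch below uses only the token rates quoted above; the 500-token response length is a hypothetical example, and it assumes a steady decode rate while ignoring prompt-processing time:

```python
# Throughput figures quoted in the article (tokens per second).
rates = {
    "Mixtral 8x7B": 480,
    "Llama 2 70B (4096-token context)": 300,
    "Llama 2 7B (2048-token context)": 750,
}

response_tokens = 500  # hypothetical response length for illustration

for model, tps in rates.items():
    ms_per_token = 1000 / tps              # milliseconds per generated token
    total_seconds = response_tokens / tps  # time to emit the whole response
    print(f"{model}: {ms_per_token:.2f} ms/token, "
          f"{total_seconds:.2f} s for a {response_tokens}-token reply")
```

Even the slowest quoted configuration (300 tokens/s) implies only a few milliseconds per token, which is why these numbers stand out for interactive inference workloads.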
Of course, as with any other new chip, raw performance is only half the battle: market players still have to want to use the new product. Only time will reveal Groq's prospects.