So that the model can run completely autonomously
Apple is in no hurry to join the broader race around chatbots and next-generation artificial intelligence, but it is working in this direction. In particular, the company is exploring the possibility of running large language models directly on users' mobile devices.
Apple believes this approach would serve users better than online access. However, large language models are called large for a reason: they demand enormous computing resources and large amounts of RAM.
Cupertino's idea is to store the language model in flash memory, whose capacity is an order of magnitude larger than the amount of RAM. A method called windowing has the model reuse some of the data it has already processed, which reduces the need to continuously fetch data from memory and speeds up the whole process. Row-column bundling, in turn, groups data more efficiently, letting the model read larger contiguous chunks from flash memory and further accelerating inference.
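To make the two ideas concrete, here is a minimal, self-contained Python sketch of how they could work together in a toy feed-forward layer. A memory-mapped file stands in for flash; each neuron's up-projection row and down-projection column are stored back to back (row-column bundling); and a sliding window keeps the neurons used by the last few tokens cached in RAM (windowing). All names here (FlashWeightStore, WindowedCache, WINDOW_SIZE, and so on) are hypothetical illustrations, not Apple's actual implementation.

```python
import numpy as np

D_MODEL = 64        # hidden size of the toy feed-forward layer
N_NEURONS = 256     # number of FFN neurons (rows of the up-projection)
WINDOW_SIZE = 3     # how many recent tokens keep their neurons cached


class FlashWeightStore:
    """Stands in for flash memory: weights live in a memory-mapped file.

    Row-column bundling: the i-th row of the up-projection and the i-th
    column of the down-projection are stored back to back, so fetching
    neuron i is a single contiguous read instead of two scattered ones.
    """

    def __init__(self, path="toy_weights.bin"):
        rng = np.random.default_rng(0)
        bundled = rng.standard_normal((N_NEURONS, 2 * D_MODEL)).astype(np.float32)
        bundled.tofile(path)
        # memmap reads lazily from disk on access, much like flash would
        self.bundle = np.memmap(path, dtype=np.float32, mode="r",
                                shape=(N_NEURONS, 2 * D_MODEL))
        self.reads = 0  # count flash reads to make the savings visible

    def load_neuron(self, i):
        self.reads += 1
        chunk = np.array(self.bundle[i])  # one contiguous read from "flash"
        return chunk[:D_MODEL], chunk[D_MODEL:]  # (up row, down column)


class WindowedCache:
    """Windowing: RAM holds only the neurons activated for the last
    WINDOW_SIZE tokens; only neurons not already cached hit flash."""

    def __init__(self, store):
        self.store = store
        self.cache = {}      # neuron id -> (up_row, down_col) kept in RAM
        self.history = []    # active-neuron sets of the most recent tokens

    def fetch(self, active_ids):
        for i in active_ids:
            if i not in self.cache:        # only the delta touches flash
                self.cache[i] = self.store.load_neuron(i)
        self.history.append(set(active_ids))
        if len(self.history) > WINDOW_SIZE:
            expired = self.history.pop(0)
            keep = set().union(*self.history)
            for i in expired - keep:       # evict neurons that fell out
                self.cache.pop(i, None)
        return [self.cache[i] for i in active_ids]


if __name__ == "__main__":
    store = FlashWeightStore()
    cache = WindowedCache(store)
    rng = np.random.default_rng(1)
    # Pretend ~10% of neurons fire per token and consecutive tokens mostly
    # reuse the same ones; that overlap is what windowing exploits.
    active = set(rng.choice(N_NEURONS, size=25, replace=False).tolist())
    total = 0
    for _ in range(10):
        cache.fetch(sorted(active))
        total += len(active)
        # drift: swap a handful of neurons to mimic token-to-token change
        drop = set(rng.choice(sorted(active), size=5, replace=False).tolist())
        new = set(rng.choice(N_NEURONS, size=5, replace=False).tolist())
        active = (active - drop) | new
    print(f"flash reads with windowing: {store.reads} vs naive: {total}")
```

In the paper, the set of active neurons per token would come from a sparsity predictor; here it is random with artificial overlap, simply to show that only the neurons that changed between tokens trigger new flash reads.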
Together, these methods should speed up the model by up to five times when running on the CPU and by up to 25 times when running on the GPU.