LLMs hunger for data and memory
LLM-based chatbots like ChatGPT and Claude are highly data- and memory-intensive. These models typically need large amounts of RAM to run, which is a challenge for devices like iPhones with limited memory capacity.
To tackle this issue, Apple researchers have developed a new technique that uses flash memory to store the AI model’s data. This is the same memory where apps and photos are also stored.
How Apple is planning to run LLMs on iPhones
In a new research paper titled “LLM in a flash: Efficient Large Language Model Inference with Limited Memory” (first spotted by MacRumors), the authors note that flash storage is far more abundant in mobile devices than the RAM traditionally used to run LLMs. Their method works around this limitation with two key techniques that minimise data transfer and maximise flash memory throughput. These methods, sketched in code after the list, are:
Windowing: This works like recycling. Instead of loading new data for every token, the AI model reuses some of the data it has already processed. This cuts down on constant memory fetches, making the process faster and smoother.
Row-Column Bundling: This technique is similar to reading a book in larger chunks instead of one word at a time. It groups data so that larger, contiguous blocks can be read from flash memory in a single pass, which speeds up the AI’s ability to understand and generate language.
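The paper itself does not ship code, but the two ideas can be illustrated with a minimal Python sketch. Everything here is hypothetical: the class names, the read_bundle API, and the NumPy array standing in for flash storage are ours, not Apple’s, and the sketch only captures the caching logic, not real I/O.

```python
import numpy as np
from collections import deque


class BundledFlashStore:
    """Stand-in for flash storage illustrating row-column bundling:
    row i of the FFN up-projection and column i of the down-projection
    are stored contiguously, so one sequential read fetches both."""

    def __init__(self, up_proj, down_proj):
        # up_proj: (n_neurons, d_model); down_proj: (d_model, n_neurons).
        # Concatenating row i with column i doubles the size of each
        # read, which is what raises effective flash throughput.
        self.bundles = np.concatenate([up_proj, down_proj.T], axis=1)

    def read_bundle(self, neuron):
        # One contiguous read per neuron instead of two scattered ones.
        return self.bundles[neuron]


class SlidingWindowCache:
    """Illustrates windowing: only neurons activated during the last
    `window` tokens stay in DRAM; each new token loads just the
    incremental difference from flash and evicts stale neurons."""

    def __init__(self, store, window=5):
        self.store = store
        self.window = window
        self.history = deque()   # active-neuron sets for recent tokens
        self.resident = {}       # neuron id -> weights currently in DRAM

    def step(self, active_neurons):
        # Load only the neurons not already resident (the "recycling").
        for n in active_neurons - set(self.resident):
            self.resident[n] = self.store.read_bundle(n)
        self.history.append(set(active_neurons))
        # Evict neurons that no token in the window still references.
        if len(self.history) > self.window:
            expired = self.history.popleft()
            live = set().union(*self.history)
            for n in expired - live:
                del self.resident[n]
        return {n: self.resident[n] for n in active_neurons}
```

In use, repeated neurons are served from DRAM and only new ones touch “flash”:

```python
rng = np.random.default_rng(0)
store = BundledFlashStore(rng.standard_normal((1024, 64)),
                          rng.standard_normal((64, 1024)))
cache = SlidingWindowCache(store, window=3)
cache.step({1, 7, 42})   # three flash reads
cache.step({7, 99})      # only neuron 99 is read; 7 is reused from DRAM
```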
The paper suggests that, combined, these methods allow AI models up to twice the size of the iPhone’s available memory to run. The approach is expected to speed up inference by 4-5 times on standard processors (CPUs) and by 20-25 times on graphics processors (GPUs).
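To put the 2x figure in rough perspective (an illustrative back-of-the-envelope, not a number from the paper): a phone with about 8GB of DRAM could in principle host roughly 16GB of model weights, and at 16-bit precision (2 bytes per parameter) that corresponds to a model of about 8 billion parameters, comfortably in the range of capable on-device LLMs.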
The authors note: “This breakthrough is particularly crucial for deploying advanced LLMs in resource-limited environments, thereby expanding their applicability and accessibility.”
How this method will improve AI features on iPhones
The latest breakthrough in AI efficiency opens up new possibilities for future iPhones, including more advanced Siri capabilities, real-time language translation and other AI-driven features in photography and augmented reality. The technology could also help iPhones run complex AI assistants and chatbots on-device, something Apple is already said to be working on.
In February, Apple held an AI summit and briefed employees on its large language model. This generative AI work may eventually make its way into the Siri voice assistant.
Apple is developing a smarter version of Siri that’s deeply integrated with AI, reports Bloomberg. The company is planning to update the way Siri interacts with the Messages app, which would let users field complex questions and auto-complete sentences more effectively. Apple is also reportedly planning to add AI to as many of its apps as possible.
The iPhone maker is also reportedly developing its own generative AI model, called “Ajax”. Ajax is said to run on 200 billion parameters, which suggests a high level of complexity and capability in language understanding and generation.
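For rough scale (our arithmetic, not the report’s): 200 billion parameters at 16-bit precision would occupy around 400GB (200 billion x 2 bytes), far beyond any phone’s memory, which points to Ajax being a server-side model rather than something that runs on-device.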
Internally known as “Apple GPT,” Ajax is aimed at unifying machine learning development across the company. This suggests a broader strategy to integrate AI more deeply into Apple’s ecosystem.
Rumours also suggest that Apple may include some kind of generative AI feature in iOS 18, which will arrive on the iPhone and iPad around late 2024. In October, analyst Jeff Pu said that Apple was building a few hundred AI servers in 2023, with more expected in 2024. Apple is likely to offer a combination of cloud-based AI and on-device processing.