With Apple's AI focus set to continue right through iOS 18 and beyond, the company continues to work on the tools and features required to make that happen. One example is a new research paper detailing Ferret-UI, a multimodal large language model that can make sense of what's on a user's display.
Large language models are what power chatbots like ChatGPT, while a multimodal LLM, or MLLM, can also make sense of inputs like images and video. Apple's researchers have now unveiled Ferret-UI, an MLLM that could potentially understand what's on an iPhone's display.
The potential uses are many, though Apple doesn't go into much detail. Accessibility seems like an obvious use case, as does app testing for developers.
The AI is coming
The research paper hints at exactly that, although Ferret-UI could be used in other ways as well.
Detailing potential use cases, the paper says that Ferret-UI could be a "valuable building block for accessibility, multi-step UI navigation, app testing, usability studies, and many others."
Normally, making sense of an iPhone's display would be problematic because of the screen's elongated shape and the small size of its interface elements. Buttons are smaller than what these AI systems typically work with, and tapping one of them can change the entire screen. Ferret-UI is designed to handle those complications, but it remains to be seen how Apple implements it.
One potential use case is iOS 18, the AI-laden update expected to be announced at WWDC in June. Ferret-UI could help Siri make sense of an app's interface, for example, and guide users through certain tasks, something that could prove extremely useful as interfaces become ever more sparse.