Google API brings LLMs to Android and iOS devices

Experimental MediaPipe LLM Inference API allows developers to run large language models ‘on-device’ across Android, iOS, and web platforms.

Google has released an experimental API that allows large language models to run fully on-device across Android, iOS, and web platforms.

Introduced March 7, the MediaPipe LLM Inference API is designed to streamline on-device LLM integration for developers and supports web, Android, and iOS platforms. The API provides initial support for four LLMs: Gemma, Phi-2, Falcon, and StableLM.

Google warns that the API is experimental and still under active development, but says it gives researchers and developers a way to prototype and test openly available models on-device. For Android, Google noted that production applications using LLMs can call the Gemini API, or run Gemini Nano on-device through Android AICore, a system-level capability introduced in Android 14 that provides Gemini-powered solutions for high-end devices, including integrations with accelerators, safety filters, and LoRA adapters.

Developers can try the MediaPipe LLM Inference API via a web demo or by building sample demo apps; an official sample is available on GitHub. The API lets developers bring LLMs on-device in a few steps using platform-specific SDKs, as illustrated in the sketch below. Through significant optimizations targeting the CPU and GPU across platforms, Google said, the API can deliver state-of-the-art on-device latency. The company plans to expand the API to more platforms and models in the coming year.
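
As a rough illustration of those "few steps," the Kotlin sketch below shows what Android integration looks like with the MediaPipe GenAI Tasks SDK (the com.google.mediapipe:tasks-genai dependency). The model path and sampling parameters are placeholders, and because the API is experimental, specifics may change.

```kotlin
import android.content.Context
import com.google.mediapipe.tasks.genai.llminference.LlmInference

// Minimal sketch of on-device text generation with the MediaPipe LLM
// Inference API. Assumes a converted model (e.g. Gemma) has already been
// placed on the device; the path and sampling values below are placeholders.
fun runOnDeviceLlm(context: Context, prompt: String): String {
    val options = LlmInference.LlmInferenceOptions.builder()
        .setModelPath("/data/local/tmp/llm/gemma-2b-it.bin") // placeholder path
        .setMaxTokens(512)    // cap on combined prompt + response tokens
        .setTopK(40)          // sampling breadth
        .setTemperature(0.8f) // sampling randomness
        .build()

    // Loads the model and runs inference entirely on-device (CPU or GPU).
    val llm = LlmInference.createFromOptions(context, options)
    return llm.generateResponse(prompt)
}
```

The SDK also exposes an asynchronous variant, generateResponseAsync, which streams partial results to a listener, a better fit for chat-style interfaces.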

Copyright © 2024 IDG Communications, Inc.