Liquid AI Releases LFM2.5: A Compact AI Model Family for Real on Device Agents


Liquid AI has introduced LFM2.5, a new generation of small foundation models built on the LFM2 architecture and aimed at on-device and edge deployments. The family includes LFM2.5-1.2B-Base and LFM2.5-1.2B-Instruct and spans Japanese, vision language, and audio language variants. The checkpoints are released as open weights on Hugging Face and are available through the LEAP platform.
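For readers who want to try the released checkpoints, the sketch below shows one plausible way to load the instruct model with Hugging Face transformers. The repository id, chat-template support, and generation settings are assumptions based on how earlier LFM2 checkpoints are published, not details confirmed in the announcement.

```python
# Minimal sketch: loading the open-weight instruct checkpoint with transformers.
# The repo id "LiquidAI/LFM2.5-1.2B-Instruct" is assumed from earlier LFM2 naming;
# check the actual Hugging Face model card before running.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "LiquidAI/LFM2.5-1.2B-Instruct"  # assumed repository id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # small enough for a single consumer GPU or CPU
    device_map="auto",
)

messages = [{"role": "user", "content": "Name three tasks a 1B on-device model is good for."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```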

Architecture and training recipe

LFM2.5 keeps the hybrid LFM2 architecture, which was designed for fast, memory efficient inference on CPUs and NPUs, and scales up the data and post-training pipeline. Pretraining for the 1.2 billion parameter backbone has been scaled from 10T to 28T tokens. The Instruct variant then goes through supervised fine tuning, preference alignment, and large-scale multi-stage reinforcement learning focused on instruction adherence, tool use, mathematics, and knowledge reasoning.

Text model performance at the billion scale

LFM2.5-1.2B-Instruct is the main general purpose text model. The Liquid AI team reports benchmark results on GPQA, MMLU Pro, IFEval, IFBench, and several function calling and coding suites. The model reaches 38.89 on GPQA and 44.35 on MMLU Pro, while competing 1B class open models such as Llama 3.2 1B Instruct and Gemma 3 1B IT score significantly lower on these metrics.

https://www.liquid.ai/blog/introduction-lfm2-5-the-next-generation-of-on-device-ai

On IFEval and IFBench, which target multi-step instruction following, LFM2.5-1.2B-Instruct reports 86.23 and 47.33 respectively, ahead of the other 1B class baselines in Liquid AI's comparison table.

Japanese adapted version

LFM2.5-1.2B-JP is a Japanese adapted text model derived from the same backbone. It targets benchmarks such as JMMLU, Japanese M-IFEval, and Japanese GSM8K. This checkpoint improves on generic instruction models on Japanese tasks and matches or surpasses other small multilingual models such as Qwen3-1.7B, Llama 3.2 1B Instruct, and Gemma 3 1B IT on these localized benchmarks.

Vision Language Model for Multimodal Edge Workloads

LFM2.5-VL-1.6B is the updated vision language model in the series. It uses LFM2.5-1.2B-Base as the language backbone and adds a vision tower for image understanding. The model is evaluated on a range of visual reasoning and OCR benchmarks, including MMStar, MM-IFEval, BLINK, InfoVQA, OCRBench v2, RealWorldQA, MMMU, and multilingual MMBench. LFM2.5-VL-1.6B improves on the previous LFM2-VL-1.6B on most of these metrics and suits real-world tasks such as document understanding, user interface reading, and multi-image reasoning under edge constraints.
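For a sense of how such a vision language checkpoint is typically driven, the sketch below uses the generic image-text-to-text interface in transformers. The repository id, processor behaviour, and chat-template image format are assumptions rather than documented usage from the release.

```python
# Rough sketch of multimodal inference with the vision language variant.
# The repo id "LiquidAI/LFM2.5-VL-1.6B" and the chat-template image format are
# assumptions; consult the model card for the supported usage pattern.
import torch
from PIL import Image
from transformers import AutoModelForImageTextToText, AutoProcessor

model_id = "LiquidAI/LFM2.5-VL-1.6B"  # assumed repository id
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

image = Image.open("invoice.png")  # e.g. a document-understanding workload
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": "What is the total amount on this invoice?"},
        ],
    }
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

output = model.generate(**inputs, max_new_tokens=64)
print(processor.decode(output[0], skip_special_tokens=True))
```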

Audio language model with native speech generation

LFM2.5-Audio-1.5B is a native audio language model that supports both text and audio as input and output. It is presented as an audio-to-audio model and uses an audio detokenizer that is reported to be eight times faster than the previous detokenizer at the same precision on constrained hardware.

The model supports two main generation modes. Interleaved generation is designed for real-time speech-to-speech conversational agents where latency dominates. Sequential generation targets tasks such as automatic speech recognition and text-to-speech and allows switching the generation modality without restarting the model. The audio stack is trained with quantization aware training at low precision, which keeps metrics such as STOI and UTMOS close to the full precision baseline while enabling deployment on devices with limited compute.
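The difference between the two modes can be pictured with a toy token stream: interleaved decoding alternates text and audio tokens so playback can begin while the reply is still being written, whereas sequential decoding finishes one modality before the other, which is the natural fit for ASR or TTS. The snippet below is purely illustrative and does not use the real LFM2.5-Audio API.

```python
# Toy illustration of interleaved vs sequential decoding (not the LFM2.5-Audio API).
# Text tokens are words; "audio tokens" stand in for codec frames.

text_tokens = ["Sure,", "here", "is", "the", "answer."]
audio_tokens = [f"<a{i}>" for i in range(10)]  # placeholder codec frames

def interleaved(text, audio, frames_per_word=2):
    """Emit audio frames alongside each word so playback can start immediately."""
    stream, frames = [], iter(audio)
    for word in text:
        stream.append(word)
        for _ in range(frames_per_word):
            frame = next(frames, None)
            if frame is not None:
                stream.append(frame)
    stream.extend(frames)  # flush any remaining frames
    return stream

def sequential(text, audio):
    """Finish one modality before starting the other, as in ASR or TTS."""
    return list(text) + list(audio)

print(interleaved(text_tokens, audio_tokens))   # low first-audio latency
print(sequential(text_tokens, audio_tokens))    # simpler, but audio waits for text
```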

https://www.liquid.ai/blog/introduction-lfm2-5-the-next-generation-of-on-device-ai

Key takeaways

  • LFM2.5 is a 1.2B scale hybrid model family built on the device-optimized LFM2 architecture, with Base, Instruct, Japanese, vision language, and audio language variants, all released as open weights on Hugging Face and LEAP.
  • Pretraining for LFM2.5 was scaled from 10T to 28T tokens, and the Instruct model combines supervised fine tuning, preference alignment, and large-scale multi-stage reinforcement learning, lifting instruction adherence and tool use above other 1B class baselines.
  • LFM2.5-1.2B-Instruct delivers strong text benchmark performance at the 1B scale, reaching 38.89 on GPQA and 44.35 on MMLU Pro, and leads peer models such as Llama 3.2 1B Instruct, Gemma 3 1B IT, and Granite 4.0 1B on IFEval and IFBench.
  • The family includes dedicated regional and multimodal variants, with LFM2.5-1.2B-JP reporting state-of-the-art results on Japanese benchmarks at its scale, and LFM2.5-VL-1.6B and LFM2.5-Audio-1.5B covering vision language and native audio language workloads for edge agents.
