Google launches TensorFlow 2.21 and LiteRT: faster GPU performance, new NPU acceleration, and seamless PyTorch Edge deployment upgrades


Google has officially released TensorFlow 2.21. The most significant update in this release is the graduation of LiteRT from its preview phase to a fully production-ready stack. Moving forward, LiteRT serves as the universal on-device inference framework, officially replacing TensorFlow Lite (TFLite).

This update streamlines the deployment of machine learning models on mobile and edge devices while expanding hardware and framework compatibility.

LiteRT: performance and hardware acceleration

When deploying models on edge devices (such as smartphones or IoT hardware), inference speed and battery efficiency are the primary constraints. LiteRT addresses this with updated hardware acceleration:

  • GPU improvements: LiteRT delivers 1.4x faster GPU performance compared to the previous TFLite framework.
  • NPU integration: This release introduces NPU acceleration with a unified, streamlined workflow for both GPUs and NPUs across edge platforms.

This infrastructure is specifically designed to support cross-platform GenAI deployment for open models like Gemma.

Low precision operations (quantization)

To run complex models on devices with limited memory, developers use a technique called quantization. This involves reducing the precision – the number of bits – used to store the weights and activations of the neural network.

TensorFlow 2.21 significantly expands tf.lite operator support for low-precision data types to improve efficiency:

  • The SQRT operator now supports int8 and int16x8.
  • Comparison operators now support int16x8.
  • tfl.cast now includes conversions for INT2 and INT4.
  • tfl.slice adds support for INT4.
  • tfl.fully_connected now includes support for INT2.
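To make the quantization idea above concrete, here is a minimal, self-contained sketch of affine (asymmetric) int8 quantization in plain Python: real-valued weights are mapped to 8-bit integers via a scale and zero point, trading a small amount of precision for a 4x memory reduction versus float32. This is an illustration of the technique, not LiteRT's internal implementation, which applies it during model conversion.

```python
def quantize_int8(values):
    """Map a list of floats to int8 [-128, 127] with an affine transform."""
    lo, hi = min(values), max(values)
    scale = (hi - lo) / 255.0 or 1.0  # guard against a constant tensor
    zero_point = round(-128 - lo / scale)
    q = [max(-128, min(127, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point

def dequantize_int8(q, scale, zero_point):
    """Recover approximate floats from the quantized representation."""
    return [(qi - zero_point) * scale for qi in q]

# Example: a tiny weight tensor round-trips with error bounded by the scale.
weights = [-1.0, -0.25, 0.0, 0.5, 1.0]
q, scale, zp = quantize_int8(weights)
recovered = dequantize_int8(q, scale, zp)
```

The even lower INT4/INT2 variants listed above work the same way, only with 16 or 4 representable levels instead of 256, which is why they are reserved for operators (like fully_connected) whose weights tolerate coarse rounding.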

Extended framework support

Historically, converting models from different training frameworks into a mobile-friendly format has been difficult. LiteRT simplifies this by offering first-class PyTorch and JAX support through seamless model conversion.

Developers can now train their models in PyTorch or JAX and convert them directly for on-device deployment without the need to rewrite the architecture in TensorFlow first.
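As a rough sketch of what this workflow looks like, the example below assumes Google's ai-edge-torch companion package (which is not named in the release itself); the exact package and call signatures may differ in your environment, so treat this as illustrative rather than authoritative.

```python
# Hypothetical sketch: converting a PyTorch model for on-device deployment
# without rewriting it in TensorFlow. Assumes the ai-edge-torch package.
import torch
import ai_edge_torch

class TinyModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = torch.nn.Linear(4, 2)

    def forward(self, x):
        return torch.relu(self.fc(x))

model = TinyModel().eval()
sample_input = (torch.randn(1, 4),)

# Convert directly from PyTorch; the sample input defines the traced shapes.
edge_model = ai_edge_torch.convert(model, sample_input)
edge_model.export("tiny_model.tflite")  # flatbuffer ready for LiteRT runtimes
```

The key point is the absence of any TensorFlow code in the training or model definition path: the conversion step is the only bridge to the on-device format.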

Maintenance, safety and ecosystem focus

Google is shifting its TensorFlow core resources to focus more on long-term sustainability. The development team will now focus specifically on:

  1. Security and bug fixes: Quickly addressing security vulnerabilities and critical bugs by releasing smaller patch versions as needed.
  2. Dependency updates: Releasing minor versions to support updates to underlying dependencies, including new Python releases.
  3. Community contributions: Continuing to review and accept important bug fixes from the open-source community.

These commitments apply to the broader enterprise ecosystem, including tf.data, TensorFlow Serving, TFX, TensorFlow Data Validation, TensorFlow Transform, TensorFlow Model Analysis, TensorFlow Recommenders, TensorFlow Text, and TensorFlow Quantum.

Key takeaways

  • LiteRT officially replaces TFLite: LiteRT has moved from preview to full production and is now Google's primary on-device inference framework for deploying machine learning models in mobile and edge environments.
  • Major GPU and NPU acceleration: The updated runtime delivers 1.4x faster GPU performance than TFLite and introduces an integrated workflow for NPU (Neural Processing Unit) acceleration, making it easier to run heavy GenAI workloads (like Gemma) on specialized edge hardware.
  • Aggressive model quantization (INT4/INT2): To maximize memory efficiency on edge devices, tf.lite operators have expanded support for extremely low-precision data types. This includes int8/int16x8 for SQRT and comparison operations, along with INT4 and INT2 support for the cast, slice, and fully_connected operators.
  • Seamless PyTorch and JAX interoperability: Developers are no longer tied to training in TensorFlow for edge deployment. LiteRT now provides first-class, native model conversion for both PyTorch and JAX, streamlining the pipeline from research to production.



Michael Sutter is a data science professional and holds a Master of Science in Data Science from the University of Padova. With a solid foundation in statistical analysis, machine learning, and data engineering, Michael excels in transforming complex datasets into actionable insights.
