Black Forest Labs has released FLUX.2 (Klein), a compact image model family targeting interactive visual intelligence on consumer hardware. FLUX.2 (Klein) extends the FLUX.2 line with a compact second generation for image generation and editing, a unified architecture for text-to-image and image-to-image, and deployment options ranging from local GPUs to a cloud API, while maintaining state-of-the-art image quality.
From FLUX.2 (dev) to Interactive Visual Intelligence
FLUX.2 (dev) is a 32-billion-parameter rectified flow transformer for text-conditioned image generation and editing. It supports conditioning on multiple reference images and runs primarily on data-center-class accelerators. It is designed for maximum quality and flexibility, with long sampling schedules and high VRAM requirements.
FLUX.2 (Klein) takes the same design direction and compresses it into smaller rectified flow transformers with 4 billion and 9 billion parameters. These models are distilled down to very short sampling schedules, support the same text-to-image and multi-context editing tasks, and are optimized for sub-second response times on modern GPUs.
Model Family and Capabilities
The FLUX.2 (Klein) family consists of four main open-weight variants sharing the same architecture:
- FLUX.2 (Klein) 4B
- FLUX.2 (Klein) 9B
- FLUX.2 (Klein) 4B Base
- FLUX.2 (Klein) 9B Base
FLUX.2 (Klein) 4B and 9B are step-distilled and guidance-distilled models. They use 4 sampling steps and are positioned as the fastest option for production and interactive workloads. FLUX.2 (Klein) 9B combines a 9B flow model with an 8B Qwen3 text embedder and is described as the leading small model on the Pareto frontier for quality versus latency in text-to-image, single-context editing, and multi-context generation.
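The 4-step distilled sampling can be pictured as a short Euler integration of a rectified-flow ODE. The sketch below uses a synthetic straight-line velocity field, not the actual FLUX.2 denoiser; it only illustrates why very few steps suffice when the learned flow is nearly straight.

```python
import numpy as np

def euler_sample(velocity, x0, num_steps=4):
    """Integrate the ODE x' = v(x, t) from t=0 to t=1 with a uniform
    Euler schedule, the basic shape of few-step rectified-flow sampling."""
    x = x0
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = i * dt
        x = x + dt * velocity(x, t)
    return x

# Toy velocity field that transports x toward a fixed target along a
# straight line -- the idealized behavior rectified flow training aims for.
target = np.array([1.0, -2.0])
v = lambda x, t: (target - x) / max(1.0 - t, 1e-6)

x0 = np.zeros(2)
out = euler_sample(v, x0, num_steps=4)
```

Because the trajectory is exactly straight here, even 4 Euler steps land on the target; real distilled models approximate this property rather than satisfy it exactly.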
The Base variants are undistilled versions with longer sampling schedules. The documentation lists them as base models that preserve the full training signal and provide high output diversity. They are intended for fine-tuning, LoRA training, research pipelines, and custom post-training workflows where control matters more than minimum latency.
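LoRA training, mentioned above as a primary use for the Base checkpoints, adds a trainable low-rank update to each frozen weight matrix. A minimal NumPy sketch of the idea (a generic illustration, not FLUX.2-specific code):

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen base weight of a hypothetical projection layer.
d_in, d_out, rank, alpha = 64, 64, 8, 16
W = rng.standard_normal((d_out, d_in))

# LoRA parameterizes an update W_eff = W + (alpha / rank) * B @ A,
# where only the small matrices A and B are trained.
A = rng.standard_normal((rank, d_in)) * 0.01  # small random init
B = np.zeros((d_out, rank))                   # standard zero init for B

def lora_forward(x):
    # Frozen base path plus scaled low-rank adapter path.
    return x @ W.T + (alpha / rank) * (x @ A.T @ B.T)

x = rng.standard_normal((4, d_in))
y = lora_forward(x)
```

With B initialized to zero, the adapter starts as an exact no-op, so fine-tuning begins from the base model's behavior and only the rank-8 update is learned.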
All FLUX.2 (Klein) models support three main capabilities in a single architecture: they can generate images from text, they can edit a single input image, and they can perform multi-context generation and editing, where multiple input images and a prompt jointly define the target output.
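The three capabilities map cleanly onto the shape of a request: a prompt alone, a prompt plus one image, or a prompt plus several images. A small routing sketch (illustrative logic only, not the actual Black Forest Labs API):

```python
def infer_task(prompt, reference_images=None):
    """Classify a request into the three FLUX.2 (Klein) modes described
    in the article, based on how many reference images accompany the prompt."""
    refs = reference_images or []
    if not prompt:
        raise ValueError("a text prompt is required in all three modes")
    if len(refs) == 0:
        return "text-to-image"
    if len(refs) == 1:
        return "single-image-editing"
    return "multi-context-generation"
```

Example: `infer_task("a red fox", ["photo.png", "style.png"])` would route to multi-context generation, since two reference images plus the prompt jointly define the output.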
Latency, VRAM, and Quantized Variants
The FLUX.2 (Klein) model page provides estimated end-to-end inference times on the GB200 and RTX 5090. FLUX.2 (Klein) 4B is the fastest variant, listed at about 0.3 to 1.2 seconds per image depending on the hardware. FLUX.2 (Klein) 9B targets about 0.5 to 2 seconds at higher quality. The Base models require several seconds because they run a 50-step sampling schedule, but they expose more flexibility for custom pipelines.
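The gap between distilled and Base latencies follows from the step counts. A back-of-envelope estimate, assuming latency is dominated by the denoiser and scales linearly with sampling steps (this ignores fixed costs like text encoding and VAE decoding, so it overstates the difference somewhat):

```python
def scale_latency(distilled_latency_s, distilled_steps=4, target_steps=50):
    """Extrapolate per-image latency from a distilled schedule to a longer
    one, under the simplifying assumption of constant per-step cost."""
    per_step = distilled_latency_s / distilled_steps
    return per_step * target_steps

# FLUX.2 (Klein) 4B at its best-case ~0.3 s for 4 steps implies roughly
# 3.75 s for a 50-step Base-style schedule on the same hardware.
base_estimate = scale_latency(0.3)
```

This rough 12.5x multiplier is consistent with the article's contrast between sub-second distilled inference and "several seconds" for the Base models.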
The FLUX.2 (Klein) 4B model card states that the 4B model fits in approximately 13 GB of VRAM and is suitable for GPUs like the RTX 3090 and RTX 4070. The FLUX.2 (Klein) 9B card reports a requirement of around 29 GB of VRAM and targets hardware like the RTX 4090. This means a single high-end consumer card can host the distilled variants with full-resolution sampling.
To expand access to more devices, Black Forest Labs also releases FP8 and NVFP4 versions of all FLUX.2 (Klein) variants, developed in collaboration with NVIDIA. FP8 quantization is said to be 1.6 times faster with 40 percent less VRAM usage, and NVFP4 is said to be 2.7 times faster with 55 percent less VRAM usage on RTX GPUs, while keeping core capabilities the same.
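Applying the reported reductions to the baseline figures shows what the quantized variants imply in practice. The snippet below just does that arithmetic; the 2.0 s baseline latency is an assumed placeholder, and all percentages are vendor-reported figures, so treat the outputs as rough estimates:

```python
def quantized_footprint(base_vram_gb, vram_reduction, base_latency_s, speedup):
    """Apply a reported VRAM reduction and speedup factor to a baseline
    memory footprint and latency."""
    return base_vram_gb * (1 - vram_reduction), base_latency_s / speedup

# FLUX.2 (Klein) 9B baseline: ~29 GB VRAM; assumed 2.0 s baseline latency.
# FP8:   40% less VRAM, 1.6x faster.  NVFP4: 55% less VRAM, 2.7x faster.
fp8_vram, fp8_latency = quantized_footprint(29, 0.40, 2.0, 1.6)
nvfp4_vram, nvfp4_latency = quantized_footprint(29, 0.55, 2.0, 2.7)
```

Notably, the NVFP4 reduction brings the 9B model to roughly 13 GB, about the same footprint the article quotes for the unquantized 4B model, which is what pushes it onto mainstream consumer cards.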
Benchmarks Against Other Image Models
Black Forest Labs evaluates FLUX.2 (Klein) via Elo-style comparisons on text-to-image, single-context editing, and multi-context tasks. Performance charts place FLUX.2 (Klein) on Pareto frontiers of Elo score versus latency and Elo score versus VRAM. The commentary states that FLUX.2 (Klein) matches or exceeds the quality of Qwen-Image at a fraction of the latency and VRAM, and that it outperforms Z-Image while supporting integrated text-to-image and multi-context editing in a single architecture.

The Base variants trade some speed for full customization and fine-tuning, which aligns with their role as foundation checkpoints for new research and domain-specific pipelines.
Key Takeaways
- FLUX.2 (Klein) is a compact rectified flow transformer family with 4B and 9B variants that supports text-to-image, single-image editing, and multi-reference generation in a unified architecture.
- The distilled FLUX.2 (Klein) 4B and 9B models use 4 sampling steps and are optimized for sub-second inference on modern GPUs, while the undistilled Base models use longer schedules and are intended for fine-tuning and research.
- Quantized FP8 and NVFP4 variants, built with NVIDIA, deliver up to 1.6x speedup with approximately 40 percent VRAM reduction for FP8 and up to 2.7x speedup with approximately 55 percent VRAM reduction for NVFP4 on RTX GPUs.

