Predictive optimization at scale: a year of innovation and what’s ahead

February 19, 2026

A high-performing, cost-effective lakehouse is one that adapts to growing data volumes, shifting query patterns and changing organizational usage without constant manual tuning. Databricks’ Predictive Optimization (PO) in Unity Catalog is designed to provide that behavior: it continuously analyzes how data is written and queried, then automatically applies maintenance actions so that platform teams do not have to schedule and tune them by hand. Over the course of 2025, according to Databricks, Predictive Optimization shifted from an optional automation feature to default platform behavior for Unity Catalog managed tables.

How Predictive Optimization Works

Predictive Optimization decides when to run the core table-maintenance operations on managed tables: OPTIMIZE (compaction), which rewrites small files into larger ones to improve data locality and read efficiency; VACUUM, which removes unreferenced files to control storage cost; automatic Liquid Clustering (CLUSTER BY AUTO), which selects and updates clustering columns; and ANALYZE, which keeps the statistics used for query planning and data skipping current. Databricks describes these decisions as workload-driven and adaptive, so schedules and parameters do not need to be retuned by hand when query patterns change.
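For orientation, the sketch below builds the manual SQL statement behind each of the four operations named above, which is roughly the work a platform team would otherwise schedule by hand. It is an illustration only: the helper name `maintenance_statements` and the table name `main.sales.orders` are hypothetical, and the statement forms should be checked against the current Databricks SQL reference before use.

```python
from typing import Dict, Optional


def maintenance_statements(table: str,
                           retain_hours: Optional[int] = None) -> Dict[str, str]:
    """Return the manual SQL equivalent of each operation PO schedules.

    `table` is a fully qualified name (catalog.schema.table). The helper
    only builds strings; it does not connect to a workspace or run anything.
    """
    vacuum = f"VACUUM {table}"
    if retain_hours is not None:
        # Optional retention window for unreferenced files.
        vacuum += f" RETAIN {retain_hours} HOURS"
    return {
        "compaction": f"OPTIMIZE {table}",
        "vacuum": vacuum,
        "clustering": f"ALTER TABLE {table} CLUSTER BY AUTO",
        "statistics": f"ANALYZE TABLE {table} COMPUTE STATISTICS FOR ALL COLUMNS",
    }


if __name__ == "__main__":
    # Hypothetical table name, used only for illustration.
    for name, sql in maintenance_statements("main.sales.orders", 168).items():
        print(f"{name}: {sql}")
```

With Predictive Optimization enabled, none of these statements need to be issued manually for managed tables; the sketch is useful mainly for seeing what is being automated.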

Reported Advances in 2025

Databricks highlighted several improvements over the year. Automated Statistics, now generally available, uses observed query behavior to decide which columns matter and keeps their statistics fresh; the company reports that this contributed to queries running roughly 22% faster in its measurements. VACUUM was re-engineered for efficiency, and Databricks describes the new implementation as several times faster and meaningfully cheaper to run. Automatic Liquid Clustering added a workload-modeling step that tests candidate clustering keys (for example, clustering on a date column, a customer ID, or both) and runs a cost-benefit analysis to pick the layout expected to prune the most data, including recognizing when a table’s existing insertion order is already good enough. Coverage also broadened beyond standard tables to additional Databricks surfaces, such as Lakeflow Spark Declarative Pipelines, bringing autonomous background maintenance to materialized views and streaming tables.
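The idea of testing candidate clustering keys against a workload can be illustrated with a toy model. The sketch below is not Databricks' algorithm; it assumes fixed-size files, equality filters only, and min/max file skipping, and every name in it (`files_scanned`, `pick_layout`, the `day` and `customer` columns) is hypothetical. It lays rows out under each candidate key, counts how many files each query would still have to read, and keeps the existing insertion order unless a candidate is strictly better.

```python
from typing import Dict, Optional, Sequence, Tuple

Row = Dict[str, int]
Key = Optional[Tuple[str, ...]]   # None means "keep insertion order"
Predicate = Tuple[str, int]       # (column, value) equality filter


def files_scanned(rows: Sequence[Row], layout: Key,
                  predicates: Sequence[Predicate], rows_per_file: int) -> int:
    """Count files a min/max-based skipper would still have to read."""
    if layout is None:
        ordered = list(rows)
    else:
        ordered = sorted(rows, key=lambda r: tuple(r[c] for c in layout))
    # Split the ordered rows into fixed-size "files".
    files = [ordered[i:i + rows_per_file]
             for i in range(0, len(ordered), rows_per_file)]
    scanned = 0
    for column, value in predicates:
        for f in files:
            lo = min(r[column] for r in f)
            hi = max(r[column] for r in f)
            if lo <= value <= hi:   # file cannot be skipped for this filter
                scanned += 1
    return scanned


def pick_layout(rows: Sequence[Row], candidates: Sequence[Tuple[str, ...]],
                predicates: Sequence[Predicate],
                rows_per_file: int = 4) -> Tuple[Key, Dict[Key, int]]:
    """Return the cheapest layout and the cost of every layout tried."""
    # Insertion order goes first so that ties keep the table as it is.
    costs: Dict[Key, int] = {
        None: files_scanned(rows, None, predicates, rows_per_file)
    }
    for layout in candidates:
        costs[layout] = files_scanned(rows, layout, predicates, rows_per_file)
    best = min(costs, key=lambda k: costs[k])
    return best, costs


if __name__ == "__main__":
    # 16 rows inserted customer by customer; queries filter on day.
    rows = [{"day": i % 4, "customer": i // 4} for i in range(16)]
    candidates = [("day",), ("customer",), ("day", "customer")]
    best, costs = pick_layout(rows, candidates, [("day", 1), ("day", 2)])
    print(best, costs)
```

In this toy data the rows arrive grouped by customer, so a workload that filters on `day` benefits from reclustering, while a workload that filters on `customer` leaves insertion order as the winner. A production system would also weigh the cost of rewriting the data, which this sketch ignores.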

What Databricks Says Is Next in 2026

The published roadmap points to two notable additions. Auto-TTL (automatic row deletion) would let Predictive Optimization age out rows automatically based on a defined retention policy. Enhanced observability, delivered through a Data Governance Hub dashboard, is intended to make the system’s activity and return on investment visible out of the box. It would surface metrics such as compacted, clustered, vacuumed and analyzed bytes, estimated storage savings, and the reasons PO skipped an optimization (for instance, a table already well clustered or too small to benefit from compaction).
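To make the kind of roll-up such a dashboard might show concrete, here is a minimal sketch that totals bytes processed per operation type and counts skip reasons. The record shape is an assumption made for illustration: the field names `operation_type`, `bytes_processed` and `skip_reason` are hypothetical and are not taken from the dashboard or from any Databricks schema.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Mapping, Tuple


def summarize(operations: Iterable[Mapping[str, object]]
              ) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Aggregate maintenance records into bytes per type and skip counts.

    Each record is a mapping with hypothetical keys:
      operation_type  - e.g. "COMPACTION", "VACUUM"
      bytes_processed - bytes handled by a completed operation
      skip_reason     - present only when the operation was skipped
    """
    bytes_by_type: Dict[str, int] = defaultdict(int)
    skip_reasons: Counter = Counter()
    for op in operations:
        reason = op.get("skip_reason")
        if reason:
            # Skipped operations contribute a reason, not bytes.
            skip_reasons[str(reason)] += 1
        else:
            bytes_by_type[str(op["operation_type"])] += int(op.get("bytes_processed", 0))
    return dict(bytes_by_type), dict(skip_reasons)


if __name__ == "__main__":
    sample = [
        {"operation_type": "COMPACTION", "bytes_processed": 100},
        {"operation_type": "VACUUM", "bytes_processed": 30},
        {"operation_type": "CLUSTERING", "skip_reason": "already well clustered"},
    ]
    print(summarize(sample))
```

Keeping skipped operations separate from completed ones mirrors the distinction drawn above between work done and the reasons work was not needed.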

Limitations and What to Watch

The performance and cost figures here come from Databricks’ own analysis rather than independent benchmarks, and results will vary with data size, query mix and workload. The feature applies to Unity Catalog managed Delta tables; as of 2025, Automatic Liquid Clustering did not extend to every scenario, and coverage for surfaces such as streaming ingestion and materialized views has been rolling out in stages rather than being universal. Automating maintenance also shifts some control away from data teams, so organizations with strict cost governance or unusual layout requirements should review PO’s decisions through the observability tools rather than assume every default is optimal. Finally, roadmap items such as Auto-TTL are forward-looking and subject to change before general availability. The original announcement is on the Databricks blog, and full behavior is documented in the Predictive Optimization documentation. Teams modernizing their ingestion layer alongside these optimizations may also find Zerobus Ingest in Lakeflow Connect relevant.
