Securing specific compute capacity can be challenging, especially during high-traffic (and high-pressure) periods. Data engineers and platform administrators are all too familiar with the frustration of insufficient capacity, or “stockout,” errors that occur when a cluster launch fails because the cloud provider cannot satisfy a request for a specific instance type.
Whether it is AWS_INSUFFICIENT_INSTANCE_CAPACITY_FAILURE on AWS, CLOUD_PROVIDER_RESOURCE_STOCKOUT on Azure, or GCP_INSUFFICIENT_CAPACITY on GCP, these errors disrupt important workloads, often at exactly the business-critical moments when uptime matters most.
What are flexible node types?
Traditionally, Databricks clusters required every node to use the exact instance type specified in the cluster configuration. If that type was unavailable, the cluster launch failed.
Flexible node types overcome this limitation. When a preferred instance type is not available, Databricks automatically falls back to a compatible alternative that shares the same compute size. In other words, the cluster launches successfully using a mix of similar instance types rather than failing completely.
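To make "compatible alternative that shares the same compute size" concrete, the sketch below treats compute size as a (vCPU, memory) pair and filters a small catalog for matches. The catalog and the `InstanceType` / `compatible_alternatives` names are hypothetical and for illustration only; the exact compatibility rules Databricks applies may differ, so consult the documentation for the authoritative fallback sets.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceType:
    name: str
    vcpus: int
    memory_gib: int


# Hypothetical, illustrative catalog; not an authoritative compatibility list.
CATALOG = [
    InstanceType("m5.xlarge", 4, 16),
    InstanceType("m5a.xlarge", 4, 16),
    InstanceType("m6i.xlarge", 4, 16),
    InstanceType("r5.xlarge", 4, 32),
    InstanceType("c5.xlarge", 4, 8),
]


def compatible_alternatives(preferred: str, catalog: list[InstanceType]) -> list[str]:
    """Return the other instance types whose (vCPUs, memory) match the preferred type."""
    by_name = {t.name: t for t in catalog}
    want = by_name[preferred]
    return [
        t.name
        for t in catalog
        if t.name != preferred and (t.vcpus, t.memory_gib) == (want.vcpus, want.memory_gib)
    ]


print(compatible_alternatives("m5.xlarge", CATALOG))  # ['m5a.xlarge', 'm6i.xlarge']
```

Matching on both vCPUs and memory (rather than vCPUs alone) is what keeps r5.xlarge and c5.xlarge out of the result, so a mixed cluster still presents a uniform per-node size to the workload.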
Teams that need tighter control can also define a custom fallback list via the API, specifying which instance types to try and in what order.
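The effect of an ordered fallback list can be modeled as a simple walk: take as many nodes as possible of the first type, then move down the list until the requested worker count is filled, which naturally yields a mix of instance types. This is a minimal simulation, not the Databricks implementation; `allocate_workers`, the `available` capacity map, and the choice to raise an error when the whole list is exhausted are assumptions made for illustration.

```python
def allocate_workers(
    num_workers: int, fallback_order: list[str], available: dict[str, int]
) -> dict[str, int]:
    """Fill num_workers by walking fallback_order in priority order.

    `available` simulates how many instances of each type the cloud can supply
    right now (a hypothetical input). Raises RuntimeError if the list is exhausted.
    """
    allocation: dict[str, int] = {}
    remaining = num_workers
    for node_type in fallback_order:
        if remaining == 0:
            break
        take = min(remaining, available.get(node_type, 0))
        if take:
            allocation[node_type] = take
            remaining -= take
    if remaining:
        raise RuntimeError(f"insufficient capacity: {remaining} worker(s) unfilled")
    return allocation


# Hypothetical custom order: preferred type first, then alternatives by priority.
order = ["m5.xlarge", "m6i.xlarge", "m5a.xlarge"]
capacity = {"m5.xlarge": 3, "m6i.xlarge": 0, "m5a.xlarge": 10}
print(allocate_workers(8, order, capacity))  # {'m5.xlarge': 3, 'm5a.xlarge': 5}
```

With a fixed single instance type, the same request would have failed after three nodes; the ordered list lets the launch complete with five m5a.xlarge workers filling the gap.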
Main benefits
Fewer failed cluster launches during peak demand
Flexible node types reduce both the frequency and the severity of capacity-related failures. When a cloud provider cannot supply the preferred instance type, Databricks automatically falls back to compatible options, so the cluster launches instead of failing.
Optimized spot instance usage
For clusters configured with spot-with-fallback, flexible node types try to acquire spot capacity across the full fallback list before falling back to on-demand instances. This increases the share of the cluster running on spot, helping to reduce compute costs while still prioritizing successful launches.
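The ordering described above (spot across every type in the list first, on-demand only for what remains) can be sketched as a two-pass allocation. This is an illustrative model under stated assumptions, not Databricks' actual scheduler: the function name and capacity maps are hypothetical, and details such as how the driver node is provisioned are ignored.

```python
def allocate_spot_with_fallback(
    num_workers: int,
    fallback_order: list[str],
    spot_available: dict[str, int],
    on_demand_available: dict[str, int],
) -> dict[tuple[str, str], int]:
    """Return {(market, node_type): count}, trying spot on the whole list before on-demand."""
    allocation: dict[tuple[str, str], int] = {}
    remaining = num_workers
    # Pass 1 tries spot on every type in the list; pass 2 falls back to on-demand.
    for market, available in (("spot", spot_available), ("on_demand", on_demand_available)):
        for node_type in fallback_order:
            if remaining == 0:
                return allocation
            take = min(remaining, available.get(node_type, 0))
            if take:
                allocation[(market, node_type)] = take
                remaining -= take
    if remaining:
        raise RuntimeError(f"insufficient capacity: {remaining} worker(s) unfilled")
    return allocation


order = ["m5.xlarge", "m6i.xlarge", "m5a.xlarge"]
plan = allocate_spot_with_fallback(
    8, order, spot_available={"m5.xlarge": 2, "m5a.xlarge": 3}, on_demand_available={"m5.xlarge": 10}
)
spot_share = sum(n for (market, _), n in plan.items() if market == "spot") / 8
print(plan, spot_share)  # spot_share is 0.625
```

If spot were attempted only on the preferred type, just 2 of 8 workers would run on spot; searching the whole list first raises that to 5 of 8 in this hypothetical scenario.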
Clear visibility and precise control
Teams can see exactly which node types were actually acquired by querying the node_timeline system table. A custom fallback order can also be defined via the API, allowing precise control over cost and performance behavior.
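Because node_timeline records resource usage per node over time, a single node can appear in many rows, so summarizing the instance-type mix means de-duplicating by instance. The sketch below assumes each row is a dict with `cluster_id`, `instance_id`, `node_type`, and `driver` keys, mirroring columns of `system.compute.node_timeline`; verify the exact schema in the system tables reference before relying on it. The helper name `node_type_mix` is hypothetical.

```python
from collections import defaultdict


def node_type_mix(rows: list[dict], cluster_id: str, include_driver: bool = False) -> dict[str, int]:
    """Count distinct instances per node type for one cluster.

    Assumes rows shaped like node_timeline records (assumed keys: cluster_id,
    instance_id, node_type, driver). Repeated rows for the same instance are
    counted once.
    """
    instances: dict[str, set] = defaultdict(set)
    for row in rows:
        if row["cluster_id"] != cluster_id:
            continue
        if row["driver"] and not include_driver:
            continue
        instances[row["node_type"]].add(row["instance_id"])
    return {node_type: len(ids) for node_type, ids in sorted(instances.items())}


rows = [
    {"cluster_id": "c1", "instance_id": "i-1", "node_type": "m5.xlarge", "driver": True},
    {"cluster_id": "c1", "instance_id": "i-2", "node_type": "m5.xlarge", "driver": False},
    {"cluster_id": "c1", "instance_id": "i-2", "node_type": "m5.xlarge", "driver": False},  # later window, same node
    {"cluster_id": "c1", "instance_id": "i-3", "node_type": "m5a.xlarge", "driver": False},
    {"cluster_id": "c2", "instance_id": "i-9", "node_type": "r5.xlarge", "driver": False},
]
print(node_type_mix(rows, "c1"))  # {'m5.xlarge': 1, 'm5a.xlarge': 1}
```

In a Databricks notebook, rows of this shape could plausibly be produced with `[r.asDict() for r in spark.table("system.compute.node_timeline").collect()]`, ideally after filtering to the cluster and time range of interest to keep the result small.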
Quick start
Workspace administrators can enable the feature in the admin settings (docs: AWS, Azure, GCP). Once enabled, it applies immediately to all new cluster launches. Long-running clusters adopt the feature on their next restart, and new job clusters created for existing jobs use it automatically.
Custom fallback lists can be configured via the API, independent of workspace settings.
Additional details
For more information on using flexible node types with instance pools, billing, node type quotas, and selectively enabling or disabling the feature, see the documentation (docs: AWS, Azure, GCP).
Flexible node types are designed to make your data platform more resilient and cost-effective. Administrators can enable the feature today with one click in the workspace admin settings by following the instructions in the documentation.
