5 Time Series Foundation Models You’re Missing

Image by author | diagram from Chronos-2: From Univariate to Universal Forecasting

# Introduction

Foundation models did not start with ChatGPT. Long before large language models became popular, pre-trained models were already making advances in computer vision and natural language processing, including image segmentation, classification, and text understanding.

The same approach is now reshaping time series forecasting. Instead of building and tuning a separate model for each dataset, a time series foundation model is pre-trained on large, diverse collections of temporal data. These models deliver robust zero-shot forecasting across domains, frequencies, and horizons, often matching deep learning models that require hours of training, using only historical data as input.

If you are still relying primarily on classical statistical methods or single-dataset deep learning models, you may be missing out on a major shift in building predictive systems.

In this tutorial, we review five time series foundation models, chosen for their performance, popularity as measured by Hugging Face downloads, and real-world utility.

# 1. Chronos-2

Chronos-2 is a 120M-parameter, encoder-only time series foundation model built for zero-shot forecasting. It supports univariate, multivariate, and covariate-informed forecasting in a single architecture and provides accurate multi-step probabilistic forecasting without task-specific training.

Key features:

  • Encoder-only architecture inspired by T5
  • Zero-shot forecasting with quantile output
  • Basic support for past and known future covariates
  • Context length up to 8,192 and forecast horizon up to 1,024
  • Efficient CPU and GPU inference with high throughput

Use cases:

  • Large-scale forecasting across multiple related time series
  • Covariate-driven forecasting such as demand, energy, and pricing
  • Rapid prototyping and production deployment without model training

Best use cases:

  • Production forecasting systems
  • Research and benchmarking
  • Complex multivariate forecasting with covariates
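Quantile outputs like the ones Chronos-2 produces are usually scored with the pinball (quantile) loss. The sketch below is a minimal numpy illustration of that metric on toy arrays; it does not call the Chronos-2 API itself.

```python
import numpy as np

def pinball_loss(y_true, y_pred, q):
    """Pinball (quantile) loss: the standard score for the per-quantile
    forecasts that models like Chronos-2 emit at each horizon step."""
    diff = y_true - y_pred
    return np.mean(np.maximum(q * diff, (q - 1) * diff))

# Toy example: a flat forecast of 10.0 scored at three quantile levels.
y_true = np.array([8.0, 10.0, 12.0])
y_pred = np.full(3, 10.0)
for q in (0.1, 0.5, 0.9):
    print(f"q={q}: {pinball_loss(y_true, y_pred, q):.3f}")
```

Averaging this loss over a grid of quantile levels gives the weighted quantile loss commonly reported in zero-shot forecasting benchmarks.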

# 2. TiRex

TiRex is a 35M-parameter pre-trained time series forecasting model built on the xLSTM architecture, designed for zero-shot forecasting over both long and short horizons. It generates accurate forecasts without any training on task-specific data and provides both point and probabilistic predictions out of the box.

Key features:

  • Pre-trained xLSTM-based architecture
  • Zero-shot prediction without dataset-specific training
  • Point forecasting and quantile-based uncertainty estimation
  • Strong performance on both long- and short-horizon benchmarks
  • Optional CUDA acceleration for high-performance GPU inference

Use cases:

  • Zero-shot forecasting for new or unseen time series datasets
  • Long-term and short-term forecasting in finance, energy and operations
  • Fast benchmarking and deployment without model training
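When adopting any zero-shot forecaster, a useful habit is to sanity-check its accuracy against a seasonal-naive baseline before trusting it in production. Below is a minimal numpy sketch of that baseline (illustrative only; it does not call the TiRex package):

```python
import numpy as np

def seasonal_naive(history, horizon, season=24):
    """Seasonal-naive baseline: repeat the last full season.
    A sanity check to benchmark zero-shot models against."""
    last_season = history[-season:]
    reps = int(np.ceil(horizon / season))
    return np.tile(last_season, reps)[:horizon]

# Hourly series with a perfect daily cycle: the baseline nails it,
# so a foundation model should at least match it here.
t = np.arange(96)
history = np.sin(2 * np.pi * t / 24)
forecast = seasonal_naive(history, horizon=12, season=24)
actual = np.sin(2 * np.pi * np.arange(96, 108) / 24)
mae = np.mean(np.abs(forecast - actual))
print(f"baseline MAE: {mae:.6f}")
```

If a zero-shot model cannot beat this one-liner on your data, the series is likely dominated by simple seasonality and the extra machinery adds little.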

# 3. TimesFM

TimesFM is a pre-trained time series foundation model developed by Google Research for zero-shot forecasting. The open checkpoint TimesFM-2.0-500M is a decoder-only model designed for univariate forecasting, supporting long historical contexts and flexible forecast horizons without task-specific training.

Key features:

  • Decoder-only foundation model with a 500M-parameter checkpoint
  • Zero-shot univariate time series forecasting
  • Context length up to 2,048 time points, with support beyond the training range
  • Flexible forecast horizon with optional frequency indicators
  • Optimized for fast point forecasting at scale

Use cases:

  • Large-scale univariate forecasting across diverse datasets
  • Long-term forecasting for operational and infrastructure data
  • Fast experimentation and benchmarking without model training
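Because a decoder-only model like TimesFM consumes a fixed-length context window, series shorter than the window are typically left-padded and longer ones truncated to their most recent points. A minimal sketch of that preprocessing step (`prepare_context` is a hypothetical helper written for illustration, not part of the timesfm package):

```python
import numpy as np

def prepare_context(series, context_len=2048, pad_value=0.0):
    """Left-pad or truncate a 1-D series to a fixed context length,
    the kind of input window a decoder-only forecaster consumes.
    Illustrative preprocessing only, not the timesfm package API."""
    series = np.asarray(series, dtype=np.float32)
    if len(series) >= context_len:
        return series[-context_len:]  # keep the most recent points
    pad = np.full(context_len - len(series), pad_value, dtype=np.float32)
    return np.concatenate([pad, series])

short = prepare_context(np.arange(10.0), context_len=16)   # padded on the left
long = prepare_context(np.arange(3000.0), context_len=2048)  # truncated
print(short.shape, long.shape)
```

Keeping the most recent observations when truncating matters: the forecast is conditioned on the tail of the window, so dropping old history is far cheaper than dropping recent points.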

# 4. IBM Granite TTM R2

TinyTimeMixers (TTM), released as granite-timeseries-ttm-r2, are a family of compact, pre-trained time series foundation models developed by IBM Research. Designed for multivariate forecasting, these models achieve robust zero-shot and few-shot performance despite model sizes as small as 1M parameters, making them suitable for both research and resource-constrained environments.

Key features:

  • Compact pre-trained models starting at 1M parameters
  • Strong zero-shot and few-shot multivariate forecasting performance
  • Specialized variants tailored to specific context and forecast lengths
  • Fast inference and fine-tuning on a single GPU or CPU
  • Support for exogenous variables and static categorical features

Use cases:

  • Multivariate forecasting in low-resource or edge environments
  • Zero-shot baseline with optional mild fine-tuning
  • Fast deployment for operational forecasting with limited data
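Few-shot fine-tuning of a compact model like TTM starts by slicing a multivariate series into supervised (context, target) windows. A minimal numpy sketch of that data preparation (`make_windows` is a hypothetical helper for illustration, not IBM's tsfm API):

```python
import numpy as np

def make_windows(data, context_len=512, forecast_len=96):
    """Slice a multivariate series of shape (time, channels) into
    (context, target) pairs: the supervised windows used when lightly
    fine-tuning a compact forecaster on a handful of examples."""
    X, y = [], []
    for start in range(len(data) - context_len - forecast_len + 1):
        X.append(data[start : start + context_len])
        y.append(data[start + context_len : start + context_len + forecast_len])
    return np.stack(X), np.stack(y)

# 700 time steps across 3 channels yields 93 overlapping training windows.
data = np.random.default_rng(0).normal(size=(700, 3))
X, y = make_windows(data, context_len=512, forecast_len=96)
print(X.shape, y.shape)
```

Note how a TTM variant's fixed context and forecast lengths (here assumed to be 512 and 96) directly determine how many training windows a short series can supply.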

# 5. Toto Open Base 1

Toto-Open-Base-1.0 is a decoder-only time series foundation model designed for multivariate forecasting in observability and monitoring settings. It is optimized for high-dimensional, sparse, and non-stationary data and delivers strong zero-shot performance on large-scale benchmarks such as GIFT-Eval and BOOM.

Key features:

  • Decoder-only transformer with flexible context and forecast lengths
  • Zero-shot forecasting without fine-tuning
  • Efficient handling of high-dimensional multivariate data
  • Probabilistic forecasting using a Student-t mixture model
  • Pre-trained on over two trillion time series data points

Use cases:

  • Forecasting observability and monitoring metrics
  • High-dimensional systems and infrastructure telemetry
  • Zero-shot forecasting for large-scale, non-stationary time series
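A Student-t mixture head outputs a full predictive distribution rather than a single number; prediction intervals then come from the quantiles of samples drawn from that mixture. A minimal numpy sketch of the sampling step (illustrative only, not the Toto API; the mixture parameters below are made up):

```python
import numpy as np

def sample_t_mixture(weights, locs, scales, dfs, n, seed=0):
    """Draw n samples from a Student-t mixture, the kind of predictive
    distribution used for probabilistic forecasts. Empirical quantiles
    of the samples give prediction intervals."""
    rng = np.random.default_rng(seed)
    comp = rng.choice(len(weights), size=n, p=weights)  # pick a component
    t = rng.standard_t(np.take(dfs, comp))              # standard-t draw
    return np.take(locs, comp) + np.take(scales, comp) * t

# Two made-up components: a sharp mode near 0 and a heavy-tailed mode near 5.
samples = sample_t_mixture([0.7, 0.3], [0.0, 5.0], [1.0, 2.0], [10, 3], n=20000)
lo, med, hi = np.quantile(samples, [0.1, 0.5, 0.9])
print(f"80% interval: [{lo:.2f}, {hi:.2f}], median {med:.2f}")
```

The heavy tails of low-degree-of-freedom components are what let such a head cover the spikes and outliers common in monitoring data without blowing up the point forecast.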

# Summary

The table below compares the main features of the time series foundation models discussed, focusing on model size, architecture and forecasting capabilities.

| Model | Parameters | Architecture | Forecast type | Key strengths |
| --- | --- | --- | --- | --- |
| Chronos-2 | 120M | Encoder-only | Univariate, multivariate, probabilistic | Strong zero-shot accuracy, long context and horizon, high inference throughput |
| TiRex | 35M | xLSTM-based | Univariate, probabilistic | Lightweight model with strong short- and long-horizon performance |
| TimesFM | 500M | Decoder-only | Univariate, point forecasts | Handles long contexts and flexible horizons at scale |
| Granite TimeSeries TTM-R2 | 1M and up | Compact specialized pre-trained models | Multivariate, point forecasts | Extremely compact, fast inference, robust zero- and few-shot results |
| Toto-Open-Base-1.0 | 151M | Decoder-only | Multivariate, probabilistic | Optimized for high-dimensional, non-stationary observational data |

Abid Ali Awan (@1Abidaliyawan) is a certified data scientist who loves building machine learning models. Currently, he is focused on content creation and writing technical blogs on machine learning and data science. Abid holds a master's degree in technology management and a bachelor's degree in telecommunication engineering. His vision is to create AI products using graph neural networks for students struggling with mental illness.
