How Databricks helps baseball teams get an edge with data and AI

Baseball moves fast and is defined by small moments: a pitch, a matchup, a decision. This story follows how a modern clubhouse uses Databricks to turn high-fidelity pitch data into decisions that help win games.

Game Day, 2:00 PM

Hitters’ meeting with Genie and Unity Catalog

Hitters file into the video room. The coach doesn’t want a 30-page printout, just a clear plan for tonight’s starter.

Earlier that day, the analyst sat down at his laptop and opened Genie on top of Unity Catalog, where Statcast and team-derived tables reside with consistent schemas, permissions, and lineage. He asked:

“For tonight’s starter, show the first-pitch mix and location against right-handed and left-handed hitters over the past two seasons. Highlight tendencies when runners are on base.”

Genie compiled answers from governed Delta tables in Unity Catalog. As part of that work, the analyst also registered a set of Unity Catalog SQL functions that cover key questions, such as tendencies by count, handedness, and base-runner state, so that they can be reused in future planning and by automated agents.
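To make the idea of a reusable tendency function concrete, here is a minimal sketch in plain Python of the kind of logic such a function might encapsulate: first-pitch mix split by batter handedness and base state. The field names (`pitch_number`, `stand`, `runners_on`, `pitch_type`) and the sample rows are assumptions made up for illustration; they are not the team's schema or its registered Unity Catalog functions.

```python
from collections import Counter, defaultdict

# Hypothetical pitch-level rows; field names and values are illustrative only.
PITCHES = [
    {"pitch_number": 1, "stand": "R", "runners_on": False, "pitch_type": "FC"},
    {"pitch_number": 1, "stand": "R", "runners_on": False, "pitch_type": "FF"},
    {"pitch_number": 1, "stand": "R", "runners_on": False, "pitch_type": "FC"},
    {"pitch_number": 2, "stand": "R", "runners_on": False, "pitch_type": "SL"},
    {"pitch_number": 1, "stand": "L", "runners_on": True, "pitch_type": "CH"},
    {"pitch_number": 1, "stand": "L", "runners_on": True, "pitch_type": "SI"},
    {"pitch_number": 1, "stand": "L", "runners_on": True, "pitch_type": "CH"},
    {"pitch_number": 1, "stand": "L", "runners_on": True, "pitch_type": "CH"},
]


def first_pitch_mix(pitches):
    """Return {(stand, runners_on): {pitch_type: share}} for first pitches only."""
    counts = defaultdict(Counter)
    for p in pitches:
        if p["pitch_number"] == 1:  # keep only the first pitch of each plate appearance
            counts[(p["stand"], p["runners_on"])][p["pitch_type"]] += 1

    mix = {}
    for key, counter in counts.items():
        total = sum(counter.values())
        mix[key] = {pitch_type: n / total for pitch_type, n in counter.items()}
    return mix


if __name__ == "__main__":
    for (stand, runners_on), shares in sorted(first_pitch_mix(PITCHES).items()):
        state = "runners on" if runners_on else "bases empty"
        print(stand, state, {pt: round(s, 2) for pt, s in shares.items()})
```

In practice the same grouping would more likely be expressed in SQL over governed tables; the point of the sketch is only that one named, parameterized calculation can serve both the analyst and an agent.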

The analyst exported the results to a simple one-pager that staff could print or include in hitters’ binders. The main points were:

Righties: High cutters and four-seamers early, especially with the bases empty.
Lefties: More changeups and sinkers when there is a runner on second.
Two strikes: The slider down and away appears in most of the key punch-outs.

The hitting coach comes into the meeting with three clear points. By the time players move on to batting practice, the first two trips through the order are not guesswork; they’re based on a shared view of how tonight’s starter actually pitches.

Pre-Series Bullpen Preparation

Scripting pitching changes with agent frameworks and model serving

The staff knows there will be a point in most games when the starter is approaching 100 pitches and the heart of the order is coming up. Choosing between a sinkerballer and a slider-first righty will feel like a tough call in the moment, but the work is done in advance.

In the clubhouse before the series, the analyst uses a multi-agent supervisor built with Agent Bricks and deployed on Model Serving to simulate the pockets the staff cares about: the heart of the order in the sixth, the bottom third in the seventh, the lefty-heavy cluster in the late innings.

For each decision, the agent:

Resolves the names of relevant hitters into IDs using a lookup function in Unity Catalog.
Calls UC SQL functions that calculate pitch-type and location outcomes by count, handedness, and base-runner state.
Compares each reliever’s arsenal to the pocket of hitters and explains in plain baseball language which profile plays best and why.
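The three steps above can be sketched in plain Python. This is an illustration under stated assumptions, not the team's agent or any Databricks API: the lookup table, the run values by pitch type, the reliever usage rates, and the usage-weighted scoring rule are all hypothetical stand-ins for what the governed functions would return.

```python
# Hypothetical lookup table standing in for a name-to-ID lookup function.
HITTER_IDS = {"hitter a": 101, "hitter b": 102, "hitter c": 103}

# Hypothetical run value per 100 pitches for each hitter by pitch type.
# Lower numbers favor the pitcher.
HITTER_VS_PITCH = {
    101: {"SL": -1.2, "SI": 0.8},
    102: {"SL": -0.5, "SI": 0.3},
    103: {"SL": 0.4, "SI": -0.9},
}

# Hypothetical reliever arsenals expressed as pitch usage rates.
RELIEVERS = {
    "slider-first righty": {"SL": 0.6, "SI": 0.4},
    "sinkerballer": {"SL": 0.2, "SI": 0.8},
}


def resolve_ids(names):
    """Step 1: resolve hitter names to IDs (case-insensitive)."""
    return [HITTER_IDS[name.strip().lower()] for name in names]


def pocket_score(arsenal, hitter_ids):
    """Steps 2-3: usage-weighted run value, averaged over the pocket of hitters."""
    total = 0.0
    for hitter_id in hitter_ids:
        for pitch_type, usage in arsenal.items():
            total += usage * HITTER_VS_PITCH[hitter_id].get(pitch_type, 0.0)
    return total / len(hitter_ids)


def recommend(names):
    """Compare every reliever against the pocket and explain the pick."""
    ids = resolve_ids(names)
    scores = {name: pocket_score(arsenal, ids) for name, arsenal in RELIEVERS.items()}
    best = min(scores, key=scores.get)  # lowest run value favors the pitcher
    explanation = (
        f"{best} projects best against this pocket "
        f"({scores[best]:+.2f} runs per 100 pitches, usage-weighted)."
    )
    return best, scores, explanation


if __name__ == "__main__":
    print(recommend(["Hitter A", "Hitter B"])[2])
    print(recommend(["Hitter C"])[2])
```

A real agent would generate the explanation with a language model and pull outcomes from governed tables; the fixed scoring rule here only shows how the pieces connect.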

The analyst turns it into a short bullpen card. For example:

“If the heart of the order is due up and the starter is tiring, the slider-first righty is favored; here’s how his mix has played out in similar pockets.”
“If the bottom third is due up, the sinkerballer’s ground-ball profile wins more often; here’s the evidence.”

Staff print the cards and review them together. When the actual sixth-inning situation plays out during the game, no one is logging into Databricks. The pitching coach is following a decision tree that the staff pressure-tested with the agent hours in advance.

Late-Inning Offense

Pinch-hit decision planning with the same agents and tools

Pinch-hit options in the eighth inning are rehearsed the same way.

As part of pre-game preparation, the analyst asks the Databricks agent:

“For the potential late-inning relievers we’ll see in this series, rank our bench hitters by expected results, and explain when each is the better option.”

The agent calls the same UC functions and Delta tables in Unity Catalog to:

Combine each reliever’s usage patterns with each bench hitter’s results based on pitch type, location, and count.
Simulate likely late-game scenarios, such as runners on first and second with one out against a right-handed reliever who leans on a cutter.
Produce straightforward guidance, such as: “Against reliever

The analyst puts these recommendations on a manager’s game card or a small one-page “pinch-hit grid” that can be reviewed in advance. Once the game begins, the card becomes the reference point. The manager is choosing between options they’ve already run through, with the data distilled into a format that respects league rules about equipment in the dugout.

Travel Day

Advanced Scouting with Vector Search and Unity Catalog

On days off between series, the analyst turns from single-game strategy to what’s coming next. The two upcoming starters have limited head-to-head history against the lineup.

Back in Genie, he asks:

“Find pitchers whose arsenal and movement profiles are similar to our upcoming starters, then show how our lineups have performed against those comparable arms.”

Here, Genie hands part of the work to Databricks Vector Search. A preprocessing step indexes the pitcher and hitter embeddings stored in Unity Catalog so that the system can find “similar pitchers” without relying on the eye test.

The workflow is:

Genie pulls the pitch mix and movement of the new starters from Unity Catalog tables.
Vector Search finds pitchers with similar pitch profiles.
UC SQL functions calculate the lineup’s results against those comparable pitchers.
Genie summarizes the patterns into a scouting report that a hitting coach can use.
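The “similar pitchers” step in this workflow comes down to a nearest-neighbor lookup over embeddings. The sketch below shows that idea with a brute-force cosine similarity in plain Python. It does not use the Databricks Vector Search API, which handles indexing and retrieval at scale; the pitcher names and the five-dimensional vectors are hypothetical.

```python
import math

# Hypothetical embeddings: [fastball share, breaking share, offspeed share,
# horizontal break, vertical break], all scaled to 0-1 for illustration.
PITCHER_EMBEDDINGS = {
    "upcoming_starter": [0.55, 0.30, 0.15, 0.40, 0.70],
    "pitcher_a": [0.50, 0.35, 0.15, 0.45, 0.65],
    "pitcher_b": [0.20, 0.60, 0.20, 0.90, 0.10],
    "pitcher_c": [0.60, 0.25, 0.15, 0.35, 0.75],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(target, embeddings, k=2):
    """Return the k pitchers closest to `target`, most similar first."""
    query = embeddings[target]
    scored = [
        (name, cosine_similarity(query, vector))
        for name, vector in embeddings.items()
        if name != target  # never return the query pitcher itself
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    for name, score in most_similar("upcoming_starter", PITCHER_EMBEDDINGS):
        print(f"{name}: {score:.3f}")
```

The names returned by a lookup like this would then feed the outcome functions, so the lineup's results are computed only against the comparable arms.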

When head-to-head Statcast history is thin, this combination of Vector Search and Genie gives the staff a way to say, “Here’s how we have hit pitchers who look like this,” and incorporate it into series planning. Those insights are then exported into advance reports, ready for the next road meeting.

Front Office Day

GM and Analyst with Genie, Lakehouse and Lakebase

Winning seasons are built on more than one game. GMs and analysts use the same platform to make decisions about value, fit, and risk.

In Genie, they explore questions such as:

“By count and handedness, show how our number three starter’s profile plays against the top lineups in our division. Where does his value come from, and where are we exposed?”

“For left-handers in the league, identify players whose strengths match where our division stands in the late innings.”

These questions are answered directly from the lakehouse in Unity Catalog. Pitch-level data, embeddings, and derived features are all governed in one place. Genie turns them into natural-language answers, but the logic under the hood is still reusable UC SQL functions.

Meanwhile, the Baseball Operations app used by coaches, scouts, and the front office is backed by Lakebase Postgres. That app is where:

Scouts file reports on potential trade targets.
After the game, coaches tag high-level decisions, such as “went slider-first righty in the sixth vs. heart of the order.”
The GM records the final call on trades, extensions and roster moves.

Because Lakebase Postgres is part of the Databricks platform, app state is kept close to the source data:

App writes (reports, tags, decisions) go into Lakebase Postgres and are immediately available to analysts and agents who have access.
Scheduled jobs or pipelines publish curated slices of Unity Catalog tables to Lakebase Postgres, so the app UI always has the latest statistics and features without manual CSV exports.

The result is shared memory. What happened, why it happened and how it was justified is all stored in one place along with timestamps and user identification.
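As a small illustration of what recording a decision with a timestamp and user identity can look like, the sketch below uses Python's built-in `sqlite3` module as a local stand-in for a Postgres database. The table name `decision_tags`, its columns, and the helper functions are hypothetical, and nothing here reflects the actual app schema. The `INSERT ... ON CONFLICT ... DO UPDATE` upsert form also exists in PostgreSQL, though a Postgres driver would use different parameter placeholders.

```python
import sqlite3
from datetime import datetime, timezone


def open_store():
    """Create an in-memory database with a hypothetical decision-tag table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE decision_tags (
            game_id   TEXT    NOT NULL,
            inning    INTEGER NOT NULL,
            tag       TEXT    NOT NULL,
            tagged_by TEXT    NOT NULL,
            tagged_at TEXT    NOT NULL,
            PRIMARY KEY (game_id, inning)
        )
        """
    )
    return conn


def record_tag(conn, game_id, inning, tag, user):
    """Insert a tag, or overwrite the existing one for the same game and inning."""
    now = datetime.now(timezone.utc).isoformat()
    conn.execute(
        """
        INSERT INTO decision_tags (game_id, inning, tag, tagged_by, tagged_at)
        VALUES (?, ?, ?, ?, ?)
        ON CONFLICT (game_id, inning) DO UPDATE SET
            tag = excluded.tag,
            tagged_by = excluded.tagged_by,
            tagged_at = excluded.tagged_at
        """,
        (game_id, inning, tag, user, now),
    )
    conn.commit()


if __name__ == "__main__":
    conn = open_store()
    record_tag(conn, "game-001", 6, "slider-first righty vs. heart of the order", "coach_a")
    for row in conn.execute("SELECT * FROM decision_tags"):
        print(row)
```

Keying each tag by game and inning is a design choice made for the sketch; an append-only log would preserve every revision instead of only the latest one.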

Why This Wins Games

Better roster moves: Roster moves line up more closely with how the league is pitching, especially in the division and in October.
Higher-quality plate appearances: Hitters look for what a pitcher actually throws, not what he is assumed to throw.
Cleaner bullpen matchups: Each reliever’s best situations become clear in seconds, reducing guesswork under clock pressure.
Fewer wasted pitches at leverage: Knowing put-away pitches by hitter and count reduces deep counts and free passes.
Better first-pitch results: Attack plans that go against expected patterns create quick contact on the team’s terms.

All this only matters if the numbers are right. By running these agents and apps on top of a single governed lakehouse rather than scattered one-off tools, clubs can see that the logic matches what they already do and apply it more broadly. When the data points toward a specific matchup or move, it feels like an extension of the game plan, not a black box.

Learn more about Databricks Sports, or request a demo to see how your organization can gain competitive insights.