From Tribal Knowledge to Quick Answers: Building Refi on Databricks


Finding the right customer story at the right time is surprisingly hard. To improve employee productivity, we created Refi – an app that lets users search and analyze over 2,400 Databricks customer references, providing personalized responses, cross-story analysis, quotes, and more. In its first two months, more than 1,800 people across Databricks sales and marketing have asked more than 7,500 questions on Refi. This translates into more relevant and consistent storytelling, faster campaign execution, and confidence that customer testimonials are used at scale. By making these stories searchable and digestible, we solved the tribal-knowledge problem around customer references and highlighted the valuable work of the many people who have collected them over the years.

In this article, we'll discuss the inspiration for Refi, the end-to-end Databricks solution behind it, its impact on our organization, and how we plan to take it even further internally.

The challenge of democratizing tribal knowledge

“Who else has done this?” This is a question every salesperson hears. A prospect is attracted to your pitch, but before they move forward, they want proof – a customer like them who has already walked this path. This should be easy to answer.

For our marketing team, customer stories are a core input to almost every activity – campaigns, product launches, advertising, PR, analyst briefings and executive communications. When those stories aren’t easy to find or evaluate, real problems arise: high-value references get overused, new use cases or industries are missed, and marketing effectiveness is limited by tribal knowledge.

Databricks has thousands of customer stories scattered across YouTube talks, case studies, internal slides, LinkedIn articles, Medium posts, and Databricks.com. Somewhere out there is exactly the right one – a financial services company in Canada doing real-time fraud detection, a retailer that replaced an old data warehouse, a manufacturer scaling GenAI. But finding it? This is where things fall apart. Stories live on a dozen platforms with no integrated search, and when you do find something, you can’t immediately tell if it’s strong – does it have credible business results, or just vague claims?

So people do what people do: They message the marketing team on Slack, dig through folders they only half-remember, or keep asking around until someone turns up something useful. Sometimes they find gold. Often, they settle for “good enough” or give up altogether – never knowing if the right story was out there.

Clearly, we needed a better way for sales and marketing to discover the most relevant customer stories.

Refi: A Full-Stack Solution on Databricks

To solve this problem, we consolidate all the stories into a single table, categorize them, and use a RAG-based agent to power search – all exposed through a vibe-coded Databricks App. The architecture spans the full Databricks platform: Lakeflow Jobs orchestrates our ETL pipelines, Unity Catalog governs our data, Vector Search powers retrieval, Model Serving hosts our agent, Lakebase handles real-time reads and writes, and Databricks Apps delivers the frontend. Let’s dig into the details.

Data Sources and ETL

Our pipeline is defined as a series of Databricks notebooks orchestrated by Lakeflow Jobs. The pipeline starts by collecting the text of stories from all of our data sources: we use standard Python web-scraping libraries to collect YouTube transcripts, LinkedIn/Medium articles, and all public customer stories on databricks.com. Using Google Apps Script, we consolidate text from hundreds of internal Google Slides and Docs into a single Google Sheet. All these sources are processed with basic metadata and saved to a ‘Bronze’ Delta Lake table in Unity Catalog (UC).
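As a minimal sketch of that consolidation step, each scraper's output can be normalized into a common record shape before landing in Bronze. The field names here are illustrative assumptions, not the actual schema:

```python
from datetime import date

def to_bronze_record(text: str, source: str, url: str) -> dict:
    """Normalize one scraped story into a common Bronze-table shape."""
    cleaned = text.strip()
    return {
        "text": cleaned,
        "source": source,            # e.g. "youtube", "medium", "gslides"
        "url": url,
        "ingested_at": date.today().isoformat(),
        "char_count": len(cleaned),  # basic metadata for quick filtering
    }

record = to_bronze_record(
    "  A retailer replaced its legacy warehouse with the lakehouse.  ",
    "medium", "https://example.com/story",
)
```

In the real pipeline, a list of such records would be written to the Bronze Delta table from the notebook.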

Now all our stories are in one place, but we still have no idea about their quality. To address this, we grade each story with AI Functions, applying a rigorous 31-point scoring rubric developed by our Values team. We prompt Gemini 2.5 to assess the quality of the overall story by identifying the business challenge, the solution, the credibility of the outcome, and how Databricks is uniquely positioned to deliver value. Scoring stories this way also lets us filter the lowest-quality stories out of Refi. The prompt also extracts key metadata – country, industry, products used, competitors, and quotes – and tags whether each story is publicly shareable or internal-only. This enriched dataset is saved in a ‘Silver’ table in UC.
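To illustrate the grading step, here is a hedged sketch that assumes the model is prompted to return structured JSON; the field names and the cutoff of 20 (out of 31) are hypothetical, not the production rubric:

```python
import json

def parse_grade(raw: str) -> dict:
    """Parse the model's JSON grading response for one story."""
    grade = json.loads(raw)
    return {
        "score": int(grade["score"]),          # 0-31 on the scoring rubric
        "industry": grade.get("industry"),
        "country": grade.get("country"),
        "shareable": bool(grade.get("publicly_shareable", False)),
    }

def keep_story(grade: dict, min_score: int = 20) -> bool:
    """Filter the lowest-quality stories out before the Gold table."""
    return grade["score"] >= min_score

raw = '{"score": 27, "industry": "Retail", "country": "Canada", "publicly_shareable": true}'
grade = parse_grade(raw)
```

The same parse-and-threshold logic runs over every Silver row in the actual job.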

The final steps of the ETL involve filtering out low-scoring stories and creating a new ‘Summary’ column that ties together the essential story components. The idea is simple: we sync this ‘Gold’ table to a Databricks Vector Search index, where the summary column contains all the essential information the LLM needs to match customer stories to queries.
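The Summary column idea can be sketched as simple string assembly: concatenate the essential components into one embeddable text field. The fields and template below are assumptions, not the production format:

```python
def build_summary(story: dict) -> str:
    """Tie the essential story components into one retrievable text field."""
    parts = [
        f"Customer: {story['customer']} ({story['industry']}, {story['country']})",
        f"Challenge: {story['challenge']}",
        f"Solution: {story['solution']}",
        f"Outcome: {story['outcome']}",
        f"Products: {', '.join(story['products'])}",
    ]
    return " | ".join(parts)

# Purely illustrative example record, not a real customer story.
summary = build_summary({
    "customer": "Acme Retail", "industry": "Retail", "country": "Canada",
    "challenge": "legacy warehouse costs", "solution": "lakehouse migration",
    "outcome": "lower TCO and faster reporting",
    "products": ["Delta Lake", "Unity Catalog"],
})
```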

The AI Agent

Using the DSPy framework, we define a tool-calling agent that can look up the most relevant customer stories with hybrid keyword and semantic search. We love DSPy! Agents built with it are easy to iteratively test in Databricks notebooks without redeploying to a Model Serving endpoint each time, resulting in faster development cycles. The syntax is highly intuitive compared to other popular frameworks, and it includes excellent prompt-optimization components. If you haven’t yet, definitely check out DSPy.

We structure our customer stories agent to support both lightning-fast pure keyword searches and long-form LLM responses, with routing logic based on the user input: if you ask a question, you get a carefully reasoned answer with sources, but if you enter just a few keywords, Refi returns the top results in under two seconds. We also use the Databricks reranker for Vector Search to improve the RAG results.
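The routing logic can be sketched as a small heuristic; the word-count threshold and question cues below are our illustrative guesses, not the production rules:

```python
def route_query(user_input: str) -> str:
    """Return 'keyword' for a fast pure search or 'llm' for a long-form answer."""
    text = user_input.strip()
    words = text.lower().split()
    question_cues = {"who", "what", "which", "how", "why", "where", "when",
                     "find", "show"}
    looks_like_question = text.endswith("?") or (bool(words) and words[0] in question_cues)
    # Few words and no question cues: treat it as a pure keyword search.
    if len(words) <= 3 and not looks_like_question:
        return "keyword"
    return "llm"

route = route_query("fraud detection canada")
```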

To ensure balanced and professional responses, we use a carefully tuned system prompt.

The agent is logged to MLflow and deployed to Databricks Model Serving using the Agent Framework. Since most of the heavy lifting happens on the model provider’s side, we can get away with deploying on a small CPU instance, saving on infrastructure costs compared to GPUs.

Databricks Apps

Now that we have the data cleaned and indexed and the agent is working well, it’s time to tie it all together and create an app to make it accessible to non-technical users. We chose a React frontend with a FastAPI Python backend. React is beautiful and fast in the browser and supports streaming output from our model serving endpoints. FastAPI lets us take advantage of all the benefits of the Databricks Python SDK in our app, namely:

  • Integrated authentication – no code changes between authenticating locally during development and deploying to Databricks Apps. The deployed app exposes the same environment variables the SDK uses locally, so the code works seamlessly.
  • Extensive API coverage – we can call Model Serving, execute SQL queries, or do whatever we need in the Databricks workspace, all through a single SDK.

Refi is primarily a chat app, so we use Lakebase to retain all conversation history, logs, and user identities for fast reads and writes, quality assurance, and thoughtful follow-up as users return or start new conversations.
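Since Lakebase is managed Postgres, the conversation store is plain relational reads and writes. The sketch below uses the stdlib sqlite3 module as a stand-in to show the shape of that logic; the table and column names are assumptions:

```python
import sqlite3

# In-memory sqlite stands in for the Lakebase (Postgres) connection.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE messages (
        conversation_id TEXT,
        user_id TEXT,
        role TEXT,              -- 'user' or 'assistant'
        content TEXT,
        created_at TEXT DEFAULT CURRENT_TIMESTAMP
    )
""")

def log_turn(conversation_id: str, user_id: str, role: str, content: str) -> None:
    """Persist one chat turn for logs and quality assurance."""
    conn.execute(
        "INSERT INTO messages (conversation_id, user_id, role, content) VALUES (?, ?, ?, ?)",
        (conversation_id, user_id, role, content),
    )

def load_history(conversation_id: str) -> list:
    """Replay a conversation when a user returns."""
    rows = conn.execute(
        "SELECT role, content FROM messages WHERE conversation_id = ? ORDER BY rowid",
        (conversation_id,),
    )
    return list(rows)

log_turn("c1", "u42", "user", "Who has done fraud detection in retail?")
log_turn("c1", "u42", "assistant", "Here are three matching stories...")
history = load_history("c1")
```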

Ongoing monitoring and metrics

Logs from Lakebase are processed in a separate Lakeflow job to surface key metrics like daily active users and average response time in an AI/BI dashboard. This dashboard also shows us recent questions and responses, and we go a step further, applying another AI Function to summarize them into trending topics and a gap analysis. We want to understand which customer stories are popular and where coverage falls short, and the logs collected from Refi help us do that. For example, we found that users were especially eager to find stories on Agent Bricks and Lakebase, the two newest Databricks products.
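As an illustration of that rollup, the two headline metrics can be computed from request logs like this (field names assumed; the real job is a Lakeflow pipeline, not ad-hoc Python):

```python
from collections import defaultdict

def daily_metrics(logs: list) -> dict:
    """Aggregate per-day active users and mean response latency."""
    users, latencies = defaultdict(set), defaultdict(list)
    for row in logs:
        day = row["timestamp"][:10]          # 'YYYY-MM-DD' prefix
        users[day].add(row["user_id"])
        latencies[day].append(row["response_ms"])
    return {
        day: {
            "daily_active_users": len(users[day]),
            "avg_response_ms": sum(latencies[day]) / len(latencies[day]),
        }
        for day in users
    }

metrics = daily_metrics([
    {"timestamp": "2025-01-06T09:00:00", "user_id": "a", "response_ms": 1800},
    {"timestamp": "2025-01-06T09:05:00", "user_id": "b", "response_ms": 2200},
    {"timestamp": "2025-01-07T10:00:00", "user_id": "a", "response_ms": 1500},
])
```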

At the bottom of the dashboard, we include a static analysis of story quality across different industries and content types.

A note on development setup

Most of our project development happens in Cursor, and as mentioned earlier, integrated authentication between the Databricks CLI and SDK keeps things simple. We sign in once via the CLI, and all our local builds of Refi that use the SDK are authenticated. When we want to test in Databricks Apps, we use the CLI to sync the latest code to our workspace and then deploy the app. Databricks Apps exposes the same environment variables we use for local authentication, so our SDK-based calls to Model Serving and SQL Warehouse just work! Our iterative development loop becomes:

  1. Sign in to the workspace via the CLI
  2. Author code in Cursor
  3. Test locally
  4. Sync code to the workspace and deploy the app
  5. Test in Databricks Apps

Finally, to ensure proper CI/CD and portability, we use Databricks Asset Bundles to package all the code and resources used by Refi into one deployable unit. This bundle is then deployed to our target production workspace via GitHub Actions.

What we learned

Several teams at Databricks had already solved parts of this problem independently, naturally gravitating toward the most exciting work – the AI layer. However, data engineering remains the core: getting the ETL right, scoring stories for quality, and structuring data for effective retrieval proved just as important as the agent.

Collaboration was equally important. Customer stories touch almost every corner of the organization: sales, marketing, field engineering, and PR all play a role. Building strong partnerships with these groups shaped both the product and the data that powers it.

What’s next

While the application frontend provides immediate value, the real power will emerge from connecting Refi with other solutions at Databricks. We plan to provide that connectivity through an API and an MCP server, enabling teams to access customer intelligence directly within their existing workflows and tools.

With Databricks and Lakebase, we can also understand how thousands of users interact with Refi over time. These insights will allow us to continually refine the tool and thoughtfully shape the stories we add to this growing ecosystem.

For Databricks teams struggling with customer-context discovery today, Refi offers a solid example of what’s possible when these capabilities come together. To get started building your own agentic app on Databricks, learn more about Databricks Apps, our RAG guide, Lakebase, and Agent Bricks.
