Backstage with Lakebase, Part 2

by ai-intensify

In Part 1 of this series, we explored how moving Backstage’s underlying database to Databricks Lakebase turned risky schema migrations into 1-second branch-and-test operations. But a fast developer cycle only gets you so far if security and governance teams are still treating your operational databases like black boxes.

In a traditional stack, your application database and your data lake live in two completely different security paradigms. The ownership graph for your infrastructure lives in Backstage, backed by a separate RDS instance and governed by complex IAM roles and Postgres-native grants. Meanwhile, your warehouse data is governed by the data team using Unity Catalog. Unity Catalog is an open source framework created by Databricks that provides a unified governance layer for data, AI, and now operational databases: a single place to manage access controls, audit trails, lineage, and compliance across everything on the platform.

To audit a single table drop on RDS, you need to cross-reference CloudTrail for the IAM principal, pg_stat_activity or pgaudit logs for the SQL statement, and CloudWatch for timestamps: three services, three query languages, three access policies. The operational database becomes a compliance side-channel.

Unity Catalog absorbs the operational DB

When we pointed Backstage at Lakebase, we didn’t just change where the data lived; we changed where the access policy lived.

Because Lakebase is natively embedded inside Databricks, Unity Catalog extends directly onto the operational Postgres database. In this POC, we used Lakehouse Federation to expose the Backstage catalog as a foreign catalog (lakebase_bs) in Unity Catalog. Once it’s there, standard UC grants control who can see what, with no Postgres-level role management required.
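As a minimal sketch of what those grants could look like, the helper below renders standard Unity Catalog GRANT statements for read-only access to the foreign catalog. Only the catalog name lakebase_bs comes from the POC; the schema name (public), the table name (final_entities), and the group name (platform-readers) are assumptions for illustration, and the statements are only rendered here, not executed.

```python
def quote_ident(name: str) -> str:
    """Backtick-quote an identifier, escaping embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def render_grants(catalog: str, schema: str, tables: list[str], principal: str) -> list[str]:
    """Render Unity Catalog GRANT statements giving `principal` read-only access."""
    cat, sch, who = quote_ident(catalog), quote_ident(schema), quote_ident(principal)
    statements = [
        # The principal must be able to see the catalog and schema first.
        f"GRANT USE CATALOG ON CATALOG {cat} TO {who}",
        f"GRANT USE SCHEMA ON SCHEMA {cat}.{sch} TO {who}",
    ]
    for table in tables:
        statements.append(
            f"GRANT SELECT ON TABLE {cat}.{sch}.{quote_ident(table)} TO {who}"
        )
    return statements


if __name__ == "__main__":
    # Schema, table, and group names below are hypothetical.
    for stmt in render_grants("lakebase_bs", "public", ["final_entities"], "platform-readers"):
        print(stmt + ";")
```

Granting at the table level keeps the example explicit; in practice a single SELECT grant on the schema or catalog may be enough, since Unity Catalog privileges are inherited downward.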

Although we have not created end-to-end row-level security policies for Backstage in this POC, architecturally, the exact same RLS rules that protect sensitive billing tables can be applied directly to these operational tables. The wall between “operational” and “analytical” ceases to be a physical boundary, and simply becomes an access pattern.

An integrated audit trail out of the box

Remember the 1-second copy-on-write branching we implemented in Part 1? In a traditional setup, proving to a security engineer that a developer merely branched the database for an hour and then destroyed it is a manual exercise.

With Lakebase, every control-plane action against the operational database is automatically recorded in system.access.audit. To prove this, we queried the audit logs for the exact branch operations from our Part 1 disaster-recovery experiment.
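A query of that kind can be sketched as follows. The column names (event_time, user_identity.email, source_ip_address, service_name, action_name, request_params, event_date) are those of the system.access.audit table; the service name and the action-name pattern used to pick out branch operations are assumptions, so check the values that actually appear in your workspace. The function only builds the SQL text and its named parameters; running it requires a Databricks SQL client, which is left out.

```python
from datetime import date


def branch_audit_query(
    service_name: str,
    since: date,
    action_pattern: str = "%Branch%",  # assumed pattern, not a documented value
) -> tuple[str, dict[str, str]]:
    """Return SQL with named parameter markers plus the values to bind."""
    sql = """
SELECT
  event_time,
  user_identity.email AS actor,
  source_ip_address,
  action_name,
  request_params
FROM system.access.audit
WHERE service_name = :service_name
  AND action_name LIKE :action_pattern
  AND event_date >= CAST(:since AS DATE)
ORDER BY event_time
""".strip()
    params = {
        "service_name": service_name,
        "action_pattern": action_pattern,
        "since": since.isoformat(),
    }
    return sql, params


if __name__ == "__main__":
    # "example-lakebase-service" is a placeholder, not a real service name.
    query, bind = branch_audit_query("example-lakebase-service", date(2025, 1, 1))
    print(query)
    print(bind)
```

Binding values through parameter markers rather than string formatting keeps the audit query itself safe to expose to less trusted callers.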

The result: the creation and deletion of each branch from our Part 1 experiments is logged. Each event is tied to a unique OAuth user identity and source IP, captured automatically, and governed by the same row-level security controls as every other audit table in Unity Catalog. No CloudTrail cross-referencing. No RDS log parsing. One SQL query.

Automatic cost attribution by branch

A governance team doesn’t just want to know who created a branch; they want to know what it cost.

In traditional AWS environments, tracking the cost of a short-lived RDS instance requires custom CloudWatch tagging strategies that often miss short-lived workloads. Because Lakebase integrates natively with Unity Catalog’s system billing tables, compute costs are automatically broken down by project_id, branch_id, and endpoint_id.

In this POC, the production branch was billed 31.6130 DBU, while the dropped test branch was independently attributed 0.0107 DBU. The audit trail and cost trail are governed in exactly one place.
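A per-branch breakdown like this could be sketched as below. The table system.billing.usage and its usage_quantity, usage_unit, and usage_date columns are standard; the exact location of project_id, branch_id, and endpoint_id is an assumption (shown here as fields of usage_metadata), so verify the schema in your workspace before relying on it. The second helper simply turns the two DBU figures from the POC into shares of the total.

```python
from datetime import date


def branch_cost_query(since: date) -> tuple[str, dict[str, str]]:
    """Return SQL (named parameter markers) summing DBUs per project/branch/endpoint."""
    sql = """
SELECT
  usage_metadata.project_id  AS project_id,   -- assumed field path
  usage_metadata.branch_id   AS branch_id,    -- assumed field path
  usage_metadata.endpoint_id AS endpoint_id,  -- assumed field path
  SUM(usage_quantity)        AS dbus
FROM system.billing.usage
WHERE usage_unit = 'DBU'
  AND usage_date >= CAST(:since AS DATE)
GROUP BY 1, 2, 3
ORDER BY dbus DESC
""".strip()
    return sql, {"since": since.isoformat()}


def cost_share(dbus_by_branch: dict[str, float]) -> dict[str, float]:
    """Return each branch's fraction of the total DBUs (0.0 if the total is zero)."""
    total = sum(dbus_by_branch.values())
    return {
        branch: (dbus / total if total else 0.0)
        for branch, dbus in dbus_by_branch.items()
    }


if __name__ == "__main__":
    # DBU figures are the ones reported for the POC; the branch labels are illustrative.
    shares = cost_share({"production": 31.6130, "test": 0.0107})
    for branch, share in shares.items():
        print(f"{branch}: {share:.4%}")
```

Even at this scale the point holds: the dropped test branch is a small fraction of a percent of the total, yet it is still attributed separately rather than vanishing into the production bill.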

What this means for daily branching teams

Our governance story answers the compliance question: Can we prove who did what, when and what it cost? The answer is yes – one SQL query instead of three services. But there’s a second governance question that’s just as important for development teams adopting the branching workflow from Part 1: What happens to governance when your team creates dozens of branches per sprint?

In Part 1, we described a workflow where each feature branch and each pull request gets its own separate database copy. A team of six developers may create and destroy 30-40 branches in a single two-week sprint. That’s 30-40 copies of production data, each potentially containing sensitive fields: customer PII, financial records, health data.

This is where Unity Catalog’s branch-level governance becomes not just convenient, but essential. When a Lakebase branch is created, Unity Catalog’s attribute-level masking policies are automatically propagated to the new branch. A developer working on their feature branch never sees unmasked production data, not because someone remembered to configure it, but because the governance layer enforces it at creation time. The CI branch running your PR tests is governed in the same way as production. The QA branch where a tester runs destructive scenarios is governed in the same way as production. There is no “non-production exception” where sensitive data leaks because someone forgot to apply a policy.

This matters more than it seems. According to Perforce’s 2025 State of Data Compliance Report, 60% of organizations have experienced breaches or theft in non-production environments where sensitive data was inadequately anonymized. The traditional approach, manually masking data when provisioning dev/test environments, does not scale when environments are created and destroyed in seconds. Governance must be automated, or it does not happen.

A new opportunity for the DBA

Audit trails and cost attribution data also indicate a quiet shift: the role of the DBA is evolving from reactive ticket work to strategic platform architecture.

Today, most of a DBA’s time goes to operational requests: environment provisioning, schema reviews, data refreshes, access grants. A six-developer team can generate 30+ tickets per sprint, and the DBA’s calendar becomes a queue. The expertise that makes DBAs valuable (a deep understanding of data integrity, performance, and governance) gets buried under repetitive provisioning work.

When branching is self-service and governance is automated, that repetitive work is eliminated. Developers provision their own environments in a second. Schema changes in pull requests are reviewed asynchronously: the DBA sees a formatted schema diff posted by CI, reviews it on their own schedule, and approves or requests changes through the normal PR workflow. With the time now available, those reviews go deeper: the DBA helps team members understand existing data and structures in production, works with them to arrive at better solutions, and conducts reviews that maintain data integrity and governance standards. Data masking is enforced by policy, not by manual intervention. Cost attribution is automated, not a monthly reconciliation exercise.

The work that opens up genuinely draws on the DBA’s expertise: defining branch policies, designing governance rules, crafting promotion workflows, tuning performance, and setting up guardrails that make self-service secure. The DBA moves from doing the work to designing how the work gets done, from 30+ operational tickets per sprint to fewer than 5 high-value policy reviews. The audit trail shown above is not just a compliance artifact; it is the DBA’s new strategic dashboard, a real-time view of how the platform is being used and where to make the next investment.

From role shift to tooling

The DBA’s pivot from operational tickets to platform design only works when the tooling changes with the role. The platform itself has to perform the routine tasks, and the DBA needs space to design how that work gets done.

Two open-source tools, both deployed as Databricks Apps and both governed by the same Unity Catalog grants and audit trail described above, close that loop.

lakebaseops is what the platform itself does. Three agents (Provisioning, Performance, and Health) take over 51 tasks that used to be handled through DBA tickets. Seven of them run as scheduled Databricks jobs, and pg_cron replaces the crontab a DBA would otherwise maintain by hand. A monitoring UI displays live pg_stat metrics, slow-query regressions, branch TTL enforcement, and a 9-KPI adoption dashboard. Ten source engines (Aurora, RDS, Cloud SQL, AlloyDB, Cosmos DB, and more) are scored against Lakebase, alongside a migration wizard and live pricing from AWS and Azure APIs.

Lakebase MCP is what the DBA does on top of the platform: a Model Context Protocol server that exposes 46 tools to any MCP-enabled AI agent (Claude, Copilot, GPT). The DBA stops opening pgAdmin and starts describing intent.

Two design choices keep it safe. First, dual-layer governance: a SQL-statement guard and a per-tool access guard, with four pre-built profiles (read_only, analyst, developer, admin) that map to the same UC access patterns shown above. A coding assistant running as read_only physically cannot drop a table.

Second, every query is attributable: the server tags each statement with its originating tool.

Combined with the branch-level cost attribution shown earlier, you can answer “Which agent on which branch generated the CPU spike at 4 am?” in a single SQL query.

lakebaseops runs for the team. Lakebase MCP runs with the team. Both inherit the governance posture you just saw.

In Part 3 of this series, we’ll look at the ultimate payoff: taking the infrastructure ownership data inside Backstage and joining it directly to cloud billing data in a single SQL query.
