Data quality is AI strategy

In healthcare, data quality is increasingly treated as the real AI strategy. Few industries generate as much data as health systems, and few have as much to gain from turning that data into better care, faster research and more efficient operations. Yet in most health systems a large gap separates the data being generated from data that is actually usable. The organisations closing that gap tend to start with the data itself rather than with models.

NYU Langone Health, an academic health system serving the greater New York area through patient care, research and medical education, is one example. The institution runs an integrated data and AI platform on Databricks, having retired its on-premises data lake and begun moving to an enterprise data warehouse. A broad internal community of physicians, analysts, scientists and operational staff uses the platform across care delivery, operations and research.

Much of this direction is credited to Nader Mherabi, executive vice president, vice dean and chief digital and information officer at NYU Langone Health, who has guided the institution’s data strategy since well before the current wave of generative AI.

The clean-water principle: fix data at the source

A guiding idea Mherabi has described, established roughly a decade before the recent AI surge, is that high-quality data in the analytical layer depends on getting data right in the transactional systems first. He compares it to water in pipes: clean water at the source removes the need for expensive filtering at the end, whereas filtering dirty water downstream is costly and never fully effective. Some cleanup along the way is unavoidable, but the priority is correctness at the point of entry.
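A minimal sketch of what "correctness at the point of entry" can look like in practice: a check that runs in the transactional layer and rejects a malformed record before it reaches any analytical store. The record shape, the 8-digit identifier format and the allowed codes are assumptions made for illustration. They are not taken from NYU Langone's systems.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class PatientEntry:
    """Hypothetical record as typed into a transactional system."""
    mrn: str                       # record number; 8 ASCII digits is an assumed format
    birth_date: Optional[date]
    sex: str                       # assumed code set: F, M, X, U


def validate_entry(entry: PatientEntry, today: date) -> List[str]:
    """Return the list of problems found; an empty list means the entry is accepted."""
    problems = []
    if not (entry.mrn.isascii() and entry.mrn.isdigit() and len(entry.mrn) == 8):
        problems.append("mrn must be exactly 8 digits")
    if entry.birth_date is None:
        problems.append("birth_date is required")
    elif entry.birth_date > today:
        problems.append("birth_date is in the future")
    if entry.sex not in {"F", "M", "X", "U"}:
        problems.append("sex must be one of F, M, X, U")
    return problems
```

Returning every problem at once, rather than stopping at the first, lets the person entering the data correct the record in a single pass.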

This discipline addresses a problem familiar to many large organisations. NYU Langone previously held patient data scattered across multiple systems without unified identifiers, which constrained what could be done with it. By this account, integrating that data and aligning identifiers is what makes metrics consistent: once every team draws on the same integrated records, different parts of the business stop arriving with numbers that do not reconcile.
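The effect of aligned identifiers can be sketched with a small crosswalk that maps each system's local ID to one enterprise patient ID before anything is counted. The system names, IDs and the `count_visits` helper are hypothetical, and real patient matching is considerably more involved than a dictionary lookup.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

SourceKey = Tuple[str, str]  # (source system, local identifier)

# Hypothetical crosswalk from local identifiers to one enterprise patient ID.
CROSSWALK: Dict[SourceKey, str] = {
    ("ehr", "A-100"): "P1",
    ("billing", "77"): "P1",   # same person, different local ID
    ("lab", "L-9"): "P2",
}


def count_visits(events: Iterable[SourceKey],
                 crosswalk: Dict[SourceKey, str]) -> Tuple[Dict[str, int], List[SourceKey]]:
    """Count events per enterprise ID and collect keys that could not be matched."""
    visits: Dict[str, int] = defaultdict(int)
    unmatched: List[SourceKey] = []
    for key in events:
        if key in crosswalk:
            visits[crosswalk[key]] += 1
        else:
            # Surface gaps instead of silently dropping or double counting them.
            unmatched.append(key)
    return dict(visits), unmatched
```

Without the crosswalk, the first two events would be counted as two different patients, which is the kind of mismatch that leaves departments with numbers that do not reconcile.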

Why data quality is the AI strategy

The stakes rise with AI. Model performance depends directly on the quality of the underlying data; weak data produces weak AI regardless of the sophistication of the model. A further dimension is timing: delivering the right information to the right people at the right moment places real demands on the data platform.

Integrated governance as a strategic imperative

Once data is integrated, the next challenge is making it searchable and reliable at scale. Mherabi frames governance as fundamental rather than optional, describing the use of a catalog — in NYU Langone’s case Databricks Unity Catalog — to manage data and AI models. He stresses that the investment is less about the tool than the strategy around it: defining master data sources, deciding who owns each part of the catalog, and considering carefully how data is exposed so that the wider community can find what it needs without duplicating work.

Real-time clinical decision support

In care delivery, the impact described is concrete. NYU Langone runs models in the emergency department that look for certain serious conditions and offer decision support to physicians, for example flagging whether a particular diagnosis has been considered before a patient is discharged. The stated intent is to augment clinical judgment rather than replace it, prompting a second look rather than issuing a verdict. Such models depend on real-time data feeds, which in turn require a platform capable of supporting continuous, low-latency operation.

Practical advice for other organisations

Two themes recur in Mherabi’s guidance. The first is to accept constant change: equipment and technology will keep evolving, so the more productive response is to choose partners whose platforms can grow alongside the organisation and to focus on value creation — safer, higher-quality care, better operational efficiency and a better patient experience. The second is education: much hesitation around AI stems from not understanding it, and staying informed supports better decisions as the market moves quickly.

Limitations and what to watch

This is a single-institution account drawn from a vendor-published discussion, so the experiences described should be read as illustrative rather than as independently audited results. The specifics reflect NYU Langone’s resources, scale and decade-long head start, and may not transfer directly to smaller organisations with different constraints.

Clinical AI carries particular caveats. Decision-support models that operate on real-time patient data require rigorous validation, ongoing monitoring for drift, and clear regulatory and safety oversight; a model that flags conditions accurately in one setting can behave differently elsewhere. Real-time data infrastructure is also hard to build and maintain, and data governance is a continuing programme rather than a one-off project. The broader, well-supported takeaway is that AI outcomes are bounded by data quality, a principle that holds across sectors and is also visible in other large data-platform migrations such as Deutsche Börse’s notebook migration to Databricks.
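One common way to make "ongoing monitoring for drift" concrete is the population stability index (PSI), which compares the binned distribution of a model input or score in a recent window against a baseline. The sketch below is a generic illustration with hypothetical function names. The source does not say which drift measures NYU Langone uses.

```python
import math
from typing import List, Sequence


def bin_proportions(values: Sequence[float], edges: Sequence[float],
                    eps: float = 1e-6) -> List[float]:
    """Share of values in each bin defined by ascending edges.

    Empty bins are floored at eps so the logarithm in the PSI stays defined.
    """
    if not values:
        raise ValueError("values must not be empty")
    counts = [0] * (len(edges) + 1)
    for v in values:
        # Bin index = number of edges at or below the value.
        counts[sum(1 for e in edges if v >= e)] += 1
    return [max(c / len(values), eps) for c in counts]


def population_stability_index(baseline: Sequence[float], recent: Sequence[float],
                               edges: Sequence[float]) -> float:
    """PSI = sum over bins of (q - p) * ln(q / p), with p = baseline and q = recent shares."""
    p = bin_proportions(baseline, edges)
    q = bin_proportions(recent, edges)
    return sum((b - a) * math.log(b / a) for a, b in zip(p, q))
```

A PSI of zero means the two windows fall into the bins in identical proportions, and larger values indicate a larger shift. The threshold that triggers a review is a local decision and would need to be validated for each model and setting.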

The original discussion is published on the Databricks blog, and background on the executive quoted is available via NYU Langone Health.