Healthcare could be one of the biggest beneficiaries of AI. Few industries generate as much data, and few have as much to gain from drawing insights from it. But in most health systems, the gap between generating data and actually using it to improve care, accelerate research, and run operations more efficiently is huge. The organizations closing that gap are starting with data, not models.
NYU Langone Health, a leading academic health system, serves the greater New York area through patient care, medical research, and medical education. NYU Langone uses Databricks as its integrated data and AI platform, having recently retired its on-premises data lake and now transitioning its enterprise data warehouse. The organization has built a broad community of physicians, analysts, scientists, and corporate staff who use the platform in care delivery, operations, and research.
Nader Mherabi, chief digital and information officer at NYU Langone Health, has led the institution’s data strategy since before the current wave of AI, building the foundation for a data-driven health system. In 2017, he recognized the quality of NYU Langone’s data collection and the opportunity to move forward with emerging AI capabilities.
Nader kept returning to one metaphor: if you want clean water, fix the pipes. Don’t try to filter it at the end.
Fix the quality of your data at the source
Ellie McGue: NYU Langone is a metrics-driven organization with a mature data stack. When you already have a functional warehouse and data lake, what is the ‘missing piece’ that makes the move to a modern data platform necessary?
Nader Mherabi: Our path was a little different from some institutions. We have always been a highly data-driven, metrics-driven organization. We already had data integrated into data lakes and enterprise data warehouses, even in the traditional stack. So it was easier for us to move onto modern platforms than it was for others.
But the imperative was clear. In 2017, we recognized that the potential of AI, even at that early stage, meant we needed to modernize our data stack. Building models is one thing. Running them 24/7 in a secure, reliable manner is another. We needed a platform that could help us realize our ambitions around patient quality, safety, efficiency, and medical research, and that could grow with us as the technology evolved.
A guiding principle we established a decade ago is that if you really want high-quality data in your intelligence layer, you have to get it right in the transactional systems first. It is like water coming through pipes. If you have clean water at the source, you don’t have to filter it at the end. Filtering dirty water is expensive, so the goal should always be clean water first. There are some things you’ll still have to filter out along the way, but the principle should be to get it right at the source.
Ellie: How has the discipline of fixing data at the transaction level changed the actual usefulness of your data layer?
Nader: Years ago, we had multiple systems, with patient data scattered across multiple locations without unified identifiers. That posed a major challenge to data quality and limited what we could do with the data. Part of our approach was to invest in common transaction platforms: an electronic health record and an ERP system. As we brought in new practices or hospitals, we invested in bringing everyone onto common platforms and then created guiding principles for the data.
For example, we would never map data at the data warehouse layer. We always try to fix it at the source. We have mastered the systems and the data, so we know which system is the authoritative source for patient data, which is the source for financial data, and which is the source for operational data. Once you do this, your data platform becomes more meaningful. People can crosswalk data, which is important in healthcare. Put the patient at the center: you need to connect their care data to clinical trials, to the financial side, and to samples collected during surgery and where those samples physically sit. If you don’t have that mapping, you are missing out on a huge amount of potential. The guiding principle that makes this possible is always the same: get it right upstream.
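To illustrate the crosswalk idea, here is a minimal Python sketch of what a shared patient identifier makes possible. It does not show NYU Langone’s actual implementation: the source names, the field names, and the `build_crosswalk` and `missing_domains` helpers are all hypothetical, and the sample records are invented for the example.

```python
from collections import defaultdict

# Hypothetical extracts from four source systems. Every record carries the
# same patient_id because identifiers were unified upstream, in the
# transactional systems, rather than reconciled in the warehouse.
SOURCES = {
    "care": [{"patient_id": "P001", "encounter": "ED visit"}],
    "trials": [{"patient_id": "P001", "trial": "TRIAL-A"}],
    "billing": [
        {"patient_id": "P001", "charge_usd": 1250.0},
        {"patient_id": "P002", "charge_usd": 300.0},
    ],
    "specimens": [{"patient_id": "P001", "freezer": "F3-shelf2"}],
}


def build_crosswalk(sources):
    """Group records from every source under their shared patient_id."""
    crosswalk = defaultdict(lambda: {name: [] for name in sources})
    for name, records in sources.items():
        for record in records:
            # Keep everything except the join key itself.
            payload = {k: v for k, v in record.items() if k != "patient_id"}
            crosswalk[record["patient_id"]][name].append(payload)
    return dict(crosswalk)


def missing_domains(crosswalk, patient_id):
    """List the source domains that hold no records for this patient."""
    domains = crosswalk[patient_id]
    return sorted(name for name, rows in domains.items() if not rows)


crosswalk = build_crosswalk(SOURCES)
print(crosswalk["P001"]["specimens"])
print(missing_domains(crosswalk, "P002"))
```

Because the join key is assigned once at the source, the grouping step needs no fuzzy matching or downstream cleanup. Without a shared identifier, the same function would have to guess which records belong to the same person.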
What integrated data really unlocks
Ellie: In healthcare, the stakes for data accuracy are very high. How does a unified data foundation prevent ‘conflicting metrics’ debates between different departments, and why is this trust so important when moving towards agentic AI systems?
Nader: It is hugely important. Even before AI, the benefits of integrated data were immense. When your data is integrated, you can create better metrics, and different sides of the business don’t come in saying, “That number doesn’t make sense.” If your data is not integrated, your metrics will never line up.
With AI, of course, the stakes are raised. If you don’t have great data, you won’t have great AI. Performance depends on data quality. And then there’s the real-time dimension. What matters is getting information to people at the right time and right place.
Integrated governance is a strategic AI imperative
Ellie: Once you have integrated data, the next challenge is to make it searchable and reliable at scale. How does data governance fit into this?
Nader: It is fundamental. You need a catalog to work with data and AI models. We use Unity Catalog, and we’re continuing to enhance it.
But the investment isn’t just in the tooling, it’s in the strategy around it. You need to define your master data sources, decide who owns each part of the catalog, and then carefully consider how you expose it to the wider community so people can find what they need without duplicating work. It’s one thing to have a huge data program. It’s another for people to actually find the right data within it. If you are adopting a platform like this, I would always suggest getting the catalog right from the start. It underpins everything else.
Building a data-literate community
Ellie: An integrated platform only provides value when people across the organization actually use it. How have you thought about building that community beyond the data engineering team?
Nader: When you invest in platforms like this, you have to make the most of the investment. For us, that means promoting what the platform can do across the institution. The goal is to become a learning health system that learns from each patient interaction and brings that insight back into practice. This only works if the community using the platform extends far beyond IT. We have built a broad user base of physicians, analysts, and scientists, all working within appropriate access controls, and we have invested in literacy programs and training to ensure that those involved in care delivery, operations, and research can take advantage of the benefits. It’s one thing for IT to stand up a platform. The real measure of success is whether the rest of the organization can use it too.
Real-time insight where it matters most
Ellie: In a high-intensity environment such as an emergency room, ‘information from the day before’ is effectively useless. What does a platform need architecturally to move from retrospective reporting to real-time clinical decision support that can help prevent misdiagnosis?
Nader: In care delivery, the impact is straightforward. We have models running in the emergency room that look for some of the more serious conditions and provide decision support to physicians. The goal is to ensure that if a patient is being discharged, the system can flag: Have you identified this diagnosis? Have you noticed this? Because we don’t want any patient to leave the emergency room with a condition that could have serious consequences if it is missed.
We all hear about cases in other institutions where misdiagnosis leads to a bad outcome. We want real-time models that run continuously and provide the best advice to physicians. Not substituting their judgment, but saying, “Hey, you may have overlooked this. Please check again.” For that to work, the models need real-time data. And this requires data platforms to support real-time feeds so that models can operate on current information and deliver their output at the right time.
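The discharge-time check described above can be sketched as a small event handler in Python. This is an illustration under stated assumptions, not the system NYU Langone runs: the event types (`model_flag`, `acknowledged`, `discharge_started`), the field names, and the `DischargeChecker` class are hypothetical, and the models that would emit the flags are out of scope.

```python
from dataclasses import dataclass, field


@dataclass
class EncounterState:
    # Maps a flagged condition to whether a physician has acknowledged it.
    flags: dict = field(default_factory=dict)


class DischargeChecker:
    """Tracks model flags per encounter from a live event feed (hypothetical)."""

    def __init__(self):
        self._encounters = {}

    def on_event(self, event):
        """Process one event; return conditions to review, if any."""
        state = self._encounters.setdefault(event["encounter_id"], EncounterState())
        kind = event["type"]
        if kind == "model_flag":
            # A model raised a possible condition; keep any earlier acknowledgement.
            state.flags.setdefault(event["condition"], False)
        elif kind == "acknowledged":
            if event["condition"] in state.flags:
                state.flags[event["condition"]] = True
        elif kind == "discharge_started":
            # Prompt the physician about anything still unreviewed.
            return sorted(c for c, seen in state.flags.items() if not seen)
        return []


checker = DischargeChecker()
checker.on_event({"encounter_id": "E1", "type": "model_flag", "condition": "condition_a"})
checker.on_event({"encounter_id": "E1", "type": "model_flag", "condition": "condition_b"})
checker.on_event({"encounter_id": "E1", "type": "acknowledged", "condition": "condition_b"})
print(checker.on_event({"encounter_id": "E1", "type": "discharge_started"}))
```

The handler only prompts and never blocks the discharge, in keeping with the point that the models advise physicians rather than substitute for their judgment. It is only useful if events arrive as they happen, which is why the real-time feed matters more than the check itself.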
Three layers of data analysis
Ellie: How has AI changed your organization’s approach to analytics and BI strategy?
Nader: I believe there are three layers of analytics. First, you need to provide some basic visualization. You can’t just say, “What do you want to see?” People need some structured starting points. Second, you add a conversational layer, tools like Genie, where people can be curious and ask deeper questions. And third, you need to be able to give the answer in different forms depending on the user: sometimes it’s a direct fact, sometimes it’s a visual, and sometimes it’s some numbers on the screen.
The powerful thing about where we are now is that, for the first time in the history of humans and machines, we can actually talk to machines in human terms, the way you might ask a co-worker. It clearly has a place. But I would advise everyone to think about where and to what extent it makes sense. Don’t replace your visualizations completely. Add a conversational layer so people can get curious, ask more questions, and serve themselves in a simple way.
Ellie: The pace of AI development can be paralyzing for many leaders. How do you balance the need for a stable long-term strategy with the reality that technology may look completely different six months from now?
Nader: First, accept the unpredictability of AI. You will wake up tomorrow, and something new will have arrived. Tools and technology will continue to change. Don’t dwell on it. Find good partners who will keep evolving their platforms as the technology changes, and focus on value creation.
Whether you’re providing safe, high-quality care, improving operational efficiency, or improving the patient experience, that’s what value is all about. Start with the capabilities you have today and then continue to grow. The second part is to educate yourself. One thing that makes people hesitant is that they don’t feel like they understand what’s going on. Stay as informed as possible, because that helps you make better decisions as the market evolves, especially at the pace it is moving at the moment.
Closing thoughts
NYU Langone’s early and deliberate approach is the main takeaway from this discussion. The clean-water metaphor captures something important. Organizations that invest in filtering dirty data downstream are always playing catch-up. Those that get it right at the transaction level, even if it takes longer and has higher upfront costs, create a foundation on which every subsequent investment, from analytics to AI to real-time clinical decision support, can be credibly built. In a setting where patient safety is at stake, that discipline is not optional.