How Subsalt Turned Healthcare Data Access from a Compliance Drag into a Research Accelerator

In healthcare, everyone talks about “unlocking the value of data.” In practice, a lot of institutions spend more time trying not to break the rules than actually using the information they already have.

For many academic medical centres and hospitals, the pattern is the same:

  • Researchers wait months for access to clinical data.

  • Compliance teams are swamped with HIPAA, IRB, and internal policy checks.

  • Students are trained on toy datasets that bear little resemblance to the messy, real‑world data they’ll encounter later.

The intention is good—protect patient privacy at all costs. The side effect is a quiet tax on research, innovation, and teaching.

Subsalt, a startup built entirely on Microsoft’s cloud stack, decided to tackle this head‑on with a HIPAA‑compliant synthetic data sandbox that runs natively on Azure and integrates with Microsoft Fabric. One of the clearest examples of what this makes possible is at UT Southwestern Medical Center (UTSW), where data access times dropped from months to days and research cycles sped up dramatically.

The reality: “We’ll get you the data… eventually”

Before Subsalt, UTSW’s researchers faced long delays—sometimes months—just to get access to data for a new project.

Every request had to work its way through:

  • HIPAA compliance reviews

  • IRB negotiations

  • Limited, tightly controlled environments for working with real clinical data

Meanwhile, the institution also wanted to give PhD and Health Informatics students hands‑on experience with real clinical data. Traditional de‑identification methods could only go so far: they were slow, manual, and removed a lot of the nuance that makes real‑world data useful.

The result was a frustrating trade‑off:

  • Move slowly and safely with real data.

  • Or move quickly on low-fidelity synthetic or de-identified data that didn’t reflect reality well enough.

Neither option was good enough in an environment where research funding, publication timelines, and patient impact all depend on speed and accuracy.

Synthetic data as a safe, fast sandbox—not a toy

Subsalt’s answer was to build a HIPAA‑compliant synthetic data sandbox on Azure. The platform uses Microsoft’s trusted cloud to generate production‑grade synthetic datasets that preserve the statistical properties of real clinical data—without exposing actual patient information.
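To make "preserving statistical properties" concrete, here is a deliberately simple Python sketch. It is not Subsalt's method, which this article does not describe. It assumes a toy two-column table (age and systolic blood pressure, with invented values), estimates the means, spreads, and correlation, and then samples brand-new rows from a bivariate normal distribution with those same parameters. The function names `fit_bivariate` and `sample_synthetic` are hypothetical.

```python
import math
import random


def fit_bivariate(rows):
    """Estimate means, sample standard deviations, and Pearson correlation of two columns."""
    n = len(rows)
    xs = [row[0] for row in rows]
    ys = [row[1] for row in rows]
    mx, my = sum(xs) / n, sum(ys) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / (n - 1))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys) / (n - 1))
    cov = sum((x - mx) * (y - my) for x, y in rows) / (n - 1)
    return {"mx": mx, "my": my, "sx": sx, "sy": sy, "r": cov / (sx * sy)}


def sample_synthetic(params, n, seed=0):
    """Draw n new rows from a bivariate normal with the fitted parameters."""
    rng = random.Random(seed)
    r = params["r"]
    # Clamp to guard against tiny negative values from floating-point rounding.
    resid = math.sqrt(max(0.0, 1.0 - r * r))
    rows = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        x = params["mx"] + params["sx"] * z1
        y = params["my"] + params["sy"] * (r * z1 + resid * z2)
        rows.append((x, y))
    return rows


# Invented "real" records for illustration only: (age, systolic blood pressure).
real = [(34, 118), (45, 125), (52, 131), (61, 140),
        (29, 112), (70, 148), (48, 127), (57, 135)]

real_params = fit_bivariate(real)
synthetic = sample_synthetic(real_params, n=5000, seed=42)
synth_params = fit_bivariate(synthetic)

# The synthetic rows are new values, yet their summary statistics track the source.
for key in ("mx", "my", "sx", "sy", "r"):
    print(f"{key}: real={real_params[key]:.3f} synthetic={synth_params[key]:.3f}")
```

A model this simple offers no formal privacy guarantee and would miss most of the structure in real clinical data. It only illustrates what "same statistics, different rows" means in practice.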

Because it’s built natively on Azure and integrates with services like Azure Kubernetes Service (AKS), Azure Storage, Azure Data Factory, Microsoft Entra ID, and Microsoft Fabric, Subsalt can live inside the environments healthcare organisations already trust.

In practice, that means:

  • Researchers and students can work with high‑value health data that behaves like real clinical data but doesn’t expose PHI.

  • Teams can test hypotheses, prototype models, and run feasibility studies in hours instead of months.

  • Compliance and data teams can minimise data movement by sharing synthetic datasets instead of duplicating and shipping copies of source data around.

Fabric‑ready integration takes it a step further: synthetic datasets can be generated directly from data already managed in Microsoft Fabric, plugging into an existing data estate rather than sitting on an island.

What changed at UT Southwestern

When UTSW deployed Subsalt on Azure, the day‑to‑day experience for researchers and students started to look very different.

Instead of:

  • Waiting months for approvals and environment setup,

  • Negotiating each data request from scratch,

  • And working in a few limited sandboxes,

they can now:

  • Get access to appropriate synthetic datasets in days.

  • Explore ideas, refine study designs, and test models long before they touch live clinical data.

  • Broaden access to data across more researchers and students without increasing privacy risk.

According to the institution’s Chief Data Officer, the breakthrough is being able to “minimize data movement while making access permissive and fast,” so they can accelerate the research that advances their mission.

In concrete terms, UTSW has:

  • Cut data access time from months to days.

  • Accelerated some clinical research outcomes by up to 50x.

  • Guaranteed HIPAA and internal compliance through the way the platform is architected.

That combination—speed, safety, and scale—changes how an academic medical centre can think about data‑driven research and training.

Why platform choice matters for healthcare data products

Subsalt’s story isn’t just about synthetic data. It’s also about platform design in a regulated world.

For solution providers, choosing a cloud platform that is HIPAA‑compliant, trusted by customers, and easy to integrate into existing environments is critical. By building on Azure and aligning closely with Microsoft’s stack, Subsalt can:

  • Deploy inside customer environments via AKS, reducing compliance risk and simplifying onboarding.

  • Plug into existing identity, storage, and data pipelines instead of forcing institutions to stand up new infrastructure.

  • Offer Fabric‑ready integration that lets customers generate synthetic datasets from the same data they already manage and govern centrally.

That tight integration is part of why Subsalt has been able to scale its impact quickly. It has also been supported by programmes like Microsoft for Startups Pegasus, which open doors to strategic enterprise accounts and co-sell opportunities.

But for healthcare organisations, the key point is simpler: they get a safe, fast way to use more of their data for research and teaching, without compromising patient privacy.

From “can we use this data?” to “what can we learn from this data?”

The deeper shift here is cultural.

When data access is slow and fraught, the default posture in many institutions becomes defensive. People ask, “Can we even use this data?” and projects stall before they really start.

With a synthetic data sandbox approach, the question becomes, “What can we learn from this data?” because teams have a safer environment to explore ideas before they hit regulatory and operational constraints.

That doesn’t eliminate the need for strict governance when working with real clinical data. It simply means a lot more of the early‑stage thinking, modelling, and iteration can happen without putting PHI at risk.

For organisations under pressure to do more with AI, faster, without breaking privacy commitments, that’s a meaningful shift.

If your research teams are waiting months for data, what would you change?

If parts of UTSW’s story resonate—long waits for data, slow de‑identification processes, limited environments for students, constant tension between speed and compliance—you’re not alone. Many healthcare and life sciences organisations are in the same position.

Onyx Data works with data and research leaders in regulated industries to tackle exactly this kind of challenge: designing architectures and workflows—often on Fabric and Azure—that make it possible to move faster without loosening your grip on privacy and compliance.

The work often starts with a straightforward question: “If we could safely create a synthetic or sandboxed version of our most valuable data, where would it unlock the most progress for us—in research, teaching, or AI?”

If you’d like to explore what that might look like in your organisation, complete the short form below. Share your role, your current data platform landscape, and the one bottleneck that slows research or analytics the most. From there, Onyx will suggest a tailored, no‑obligation starting point you can take back to your team.


