Careers - Summit Health Data

Clinical AI Quality & Validation Lead

In this role you'll define the clinical standard our AI is measured against — building gold-standard datasets, validating model output before it reaches a transplant center, and owning center go-live end to end. You're the clinical conscience of our automation: the person who turns registry reality into labeled truth, catches failure modes before they ship, and earns enough trust that centers stake accreditation on AI-abstracted CIBMTR output.

You'll also own clinical QA for our upcoming hybrid quantum ML (HQML) paths — the same registry-grade bar, structured classical-vs-HQML evaluation, and explicit go/no-go criteria before any change enters production workflows.

This is a hands-on, high-trust founding role at the intersection of clinical AI validation, quality assurance, registry operations, and customer success — one of our first hires and the clinical backbone of the team.

The Team

We're a small, deeply technical founding team turning the high-stakes manual workload of stem cell transplant clinical data abstraction into seamless AI automation — and building the quantum-enhanced analytics and prediction layer on top of structured clinical data. Our co-founders bring years running stem cell transplant clinical data operations and deep expertise in ML and quantum computing. We work with the largest cancer centers that define this field. The decisions our data informs sit behind cutting-edge medical procedures and determine whether patients live.

In This Role You Will

Own clinical gold standards: Build and maintain human-abstracted ground truth across CIBMTR forms — the evaluation set every model release is scored against.
Validate before you ship: Review extraction output against gold labels; adjudicate disagreements; document failure modes and turn every miss into a labeled example that improves the system.
Benchmark hybrid automation: Run structured classical-vs-HQML QA on agreed metrics — per-field accuracy, rare-field recall, auto-fill rate, and review burden — before promoting any path to centers.
Govern center go-live: Lead transplant-center onboarding end to end — kickoff, data mapping, workflow fit, training, and hands-on support that turns skeptical registry teams into advocates.
Translate abstractor reality into product: Convert how registry teams actually work into requirements our engineers and model teams can build against.
Partner on clinical AI governance: Work with founders on intended-use boundaries, escalation paths, rollback triggers, and the quality evidence centers need to adopt automation.
Defend registry-grade quality: Own the edge cases — rare diseases, messy regimens, ambiguous fields — that separate audit-ready CIBMTR data from "close enough."

What We Hope You'll Bring

Hands-on CIBMTR abstraction experience — you've personally completed Forms 2400, 2402, and/or 2450 and know where they get hard.
Deep familiarity with transplant/cellular-therapy clinical data and registry reporting workflows.
A clinical AI / clinical data QA mindset — obsessive about correctness, edge cases, and evidence, not just throughput.
Credibility with center data teams and clinicians; clarity with engineering and product.
Comfort in an early-stage environment: high ownership, little process, things change weekly.

Bonus Points If You Have

Previous NMDP, CIBMTR, or BMT CTN experience.
Experience leading or training a registry data team.
Familiarity with FACT accreditation requirements.
Experience building or maintaining clinical evaluation datasets, annotation workflows, or gold-set regression for NLP/LLM systems.
Exposure to SQL, structured data QA, or annotation platforms — or a real appetite to learn them.
Experience with human-in-the-loop review, champion/challenger testing, or pre-deployment clinical AI validation (a plus in this role, not a hard filter).

To express interest in this role, email careers@summithealthdata.com with your background and the role title in the subject line.

Machine Learning Engineer

In this role you'll own the system at the heart of Summit: the pipeline that reads messy clinical reality — notes, EMR exports, PDFs — and turns it into structured, registry-grade truth. You'll build the extraction models, the data infrastructure, and the validation loops that make automated abstraction trustworthy enough to stake a clinical decision on. The data you free becomes a proprietary outcomes dataset that doesn't exist anywhere else — and the foundation for everything we build next.

You will also integrate Summit's upcoming hybrid quantum ML (HQML) stacks into these same production workflows — not as a research side project, but as measurable uplift on extraction quality, rare-field detection, and outcomes prediction, with the same clinical-grade evaluation discipline as our classical pipelines.

This is a hands-on, high-leverage role at the intersection of applied ML, clinical data, production systems, and hybrid quantum-classical methods, and you'll join as one of our first engineers.

The Team

We're a small, deeply technical founding team turning the high-stakes manual workload of stem cell transplant clinical data abstraction into seamless AI automation, and building the quantum-enhanced analytics and prediction layer on top of structured clinical data. Our co-founders bring years running stem cell transplant clinical data operations and deep expertise in ML and quantum computing. We work with the largest cancer centers that define this field. The decisions our data informs sit behind cutting-edge medical procedures and determine whether patients live.

In This Role You Will

Own the extraction engine: Design, build, and maintain LLM- and NLP-based systems that pull structured, clinically significant fields from unstructured clinical text.
Make it trustworthy: Build the validation, evaluation, and retraining loops that hold extraction to clinical-grade accuracy — because in this domain, the edge cases are the job.
Integrate hybrid quantum ML: Connect upcoming HQML modules (e.g., hybrid quantum-classical feature selection, rare-pattern detection, and prediction layers) into existing ingestion, training, and review workflows — behind the same APIs, monitoring, and human-in-the-loop gates as classical models.
Benchmark honestly: Run structured A/B evaluations (classical baseline vs HQML-augmented paths) on agreed clinical metrics — per-field accuracy, auto-fill rate, review burden, and rare-field recall — before promoting any quantum-assisted component to production.
Integrate at scale: Stand up FHIR/Epic pipelines to ingest clinical data from hospital systems cleanly and reliably.
Handle PHI like it matters: Design data infrastructure that meets HIPAA and SOC 2 requirements from the ground up.
Ship to production: Own your systems end to end — ingestion, deployment, monitoring, iteration — without a big team around you.
Partner with the founders: Translate hard clinical logic and hybrid ML design into robust systems, working directly with clinical and ML leadership.

What We Hope You'll Bring

Strong software engineering fundamentals and 5+ years building ML/data systems that run in production.
Hands-on experience with NLP / LLM-based information extraction from high-stakes unstructured text (clinical, legal, financial, or similar).
Comfort owning a system end to end — ingestion through deployment, monitoring, and rollback.
Real rigor about data quality, evaluation, and edge cases — you treat model changes like clinical software releases.
Cloud data infrastructure experience (AWS and/or GCP).
Interest in integrating hybrid or quantum-classical ML into real pipelines — pragmatic about where it helps, skeptical where it does not.

Bonus Points If You Have

Direct experience with healthcare/clinical data — FHIR, HL7, Epic, Cerner, or clinical NLP.
Experience handling PHI in a HIPAA-compliant environment.
Familiarity with biomedical foundation models (e.g., MedGemma) or clinical-domain LLMs.
Experience building data pipelines or internal platforms that others depend on.
Experience with human-in-the-loop MLOps — evaluation harnesses, reviewer workflows, gold-set regression, or feedback-driven retraining alongside production models (a plus in this role, not a hard filter).
Exposure to hybrid quantum ML frameworks (e.g., PennyLane, Qiskit, TensorFlow Quantum) or quantum-inspired optimization in production ML systems.
Experience shipping model-comparison or champion/challenger workflows in regulated or high-stakes domains.

To express interest in this role, email careers@summithealthdata.com with your background and the role title in the subject line.

A Place to Thrive

Transform Healthcare at Summit Health Data

Open Positions

Clinical AI Quality & Validation Lead

Machine Learning Engineer
