About


Mission

Our tools connect everyday people to clinical trial findings.

No personal health data stored here

We don’t store anything you paste. Submissions run through the evidence layer in the moment and aren’t persisted, logged against an identity, or sent anywhere a third party can re-identify you. No account is required to try it.

No medical advice

We don’t tell you what to do. We surface what clinical trials actually said about people in situations like yours — and where the evidence runs out — so you and your clinician can decide. This is not a substitute for clinical care.

Think Carfax for used cars — but for the medical decisions you’re about to make.

Why not use ChatGPT directly?

OpenAI grades every flagship release against its own HealthBench medical benchmark — criteria written by 262 physicians across roughly 48,000 rubric points. As of 2026, the picture across their published numbers:

Model | HealthBench variant | Score
GPT-5 | HealthBench Hard | 46.2%
GPT-5.5 | HealthBench Hard | 31.5%
GPT-5.5 | HealthBench Consensus | 95.6%
GPT-5.5 | HealthBench Professional | 51.8%
ChatGPT for Clinicians (GPT-5.4) | HealthBench Professional | 59.0%
Physicians (baseline) | HealthBench Professional | 43.7%

The original May 2025 paper reported a single rubric-satisfaction rate per model: o3 = 60%, GPT-4.1 = 48%, o1 = 42%, GPT-4o = 32%, GPT-3.5 = 16%. HealthBench has since been split into the Hard / Consensus / Professional variants above.

OpenAI’s headline medical claim in 2026 is that ChatGPT for Clinicians beats physicians on HealthBench Professional (59.0 vs 43.7). That number is graded against the benchmark we audit — so the marquee medical-AI claim of the year rides on a scoring rubric whose reliability is itself measurable.

In our audit of HealthBench’s physician-written gold answers, we found decision-changing errors in 3 of the first 110 claims audited (roughly 3%). A fourth finding, a fabrication verified against three sources, was added on 2026-05-29 but is not yet reflected in that count. The models can be wrong, and so can the physicians who write the benchmark; those errors propagate into every model graded against it.
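For readers who want to check the arithmetic, here is a minimal sketch of how 3 findings in 110 audited claims becomes "roughly 3%". The Wilson score interval is our own illustrative addition, not a figure from the audit; it shows how much uncertainty a sample of this size carries. The function names are hypothetical.

```python
import math


def error_rate(findings: int, audited: int) -> float:
    """Share of audited claims that contained a decision-changing error."""
    if audited <= 0:
        raise ValueError("audited must be positive")
    return findings / audited


def wilson_interval(findings: int, audited: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 is about 95%)."""
    p = error_rate(findings, audited)
    denom = 1 + z**2 / audited
    centre = (p + z**2 / (2 * audited)) / denom
    half = z * math.sqrt(p * (1 - p) / audited + z**2 / (4 * audited**2)) / denom
    return centre - half, centre + half


# Counts taken from the paragraph above: 3 findings in the first 110 claims.
rate = error_rate(3, 110)
low, high = wilson_interval(3, 110)
print(f"point estimate: {rate:.1%}")   # prints "point estimate: 2.7%"
print(f"approx. 95% interval: {low:.1%} to {high:.1%}")
```

With only 110 claims audited, the interval is wide, which is why we describe the rate as "roughly 3%" and not as a precise figure.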

Sources: HealthBench paper (May 2025) · HealthBench Professional paper · OpenAI: Introducing GPT-5 · OpenAI: Introducing GPT-5.5 · GPT-5.5 System Card · TechRepublic: GPT-5 medical benchmarks · Vellum: GPT-5 benchmarks · BenchLM: GPT-5.5 benchmarks 2026

How is our approach different?

  1. We don’t give advice. We focus on filling in the gaps of what doctors and medical AI seem to miss — the fine print around clinical findings.
  2. Structured queries, not paraphrasing. ChatGPT probabilistically summarizes what it remembers. We extract the trial fine print — eligibility, subgroups, dose, comparator, outcomes — into typed data the system queries directly per question.
  3. We save you the multi-hour ChatGPT rabbit hole. Producing a defensible pre-visit summary by hand means dozens of follow-up prompts, cross-checking citations, and reading abstracts. Our system runs that audit for you in minutes, not hours.
  4. We extract from full-text trials, not just abstracts. ChatGPT can only paraphrase what its training corpus contained — mostly abstracts and freely-crawlable summaries. Most of the fine print that determines whether a study applies to you lives in the methods, eligibility tables, and supplementary material of full-text articles, which we extract directly under the PMC Open Access carve-out.
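To make "typed data the system queries directly" concrete, here is a minimal sketch under stated assumptions. The `TrialRecord` and `PersonContext` types, their field names, and the example values are all hypothetical and invented for illustration; they are not our production schema and do not describe a real trial.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrialRecord:
    """Hypothetical typed extraction of one trial's fine print."""
    trial_id: str
    min_age: int
    max_age: int
    excluded_conditions: frozenset[str]
    dose_mg: float
    comparator: str
    primary_outcome: str


@dataclass(frozen=True)
class PersonContext:
    """Hypothetical subset of what a person might supply."""
    age: int
    conditions: frozenset[str]


def applicability_gaps(trial: TrialRecord, person: PersonContext) -> list[str]:
    """List the ways the enrolled population differs from this person.

    Plain comparisons on typed fields: the same inputs always give
    the same output, with no model-generated summary in between.
    """
    gaps = []
    if not (trial.min_age <= person.age <= trial.max_age):
        gaps.append(
            f"age {person.age} outside enrolled range {trial.min_age}-{trial.max_age}"
        )
    for condition in sorted(person.conditions & trial.excluded_conditions):
        gaps.append(f"{condition} was an exclusion criterion")
    return gaps


# Illustrative values only, not taken from any real study.
trial = TrialRecord(
    trial_id="EXAMPLE-001",
    min_age=18,
    max_age=75,
    excluded_conditions=frozenset({"chronic kidney disease", "pregnancy"}),
    dose_mg=10.0,
    comparator="placebo",
    primary_outcome="major adverse cardiac events",
)
person = PersonContext(age=79, conditions=frozenset({"chronic kidney disease"}))

for gap in applicability_gaps(trial, person):
    print(gap)
```

The output is a list of evidence gaps to discuss with a clinician, not a recommendation: the check reports where the trial population and the person differ and stops there.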

What’s this all about?

What

Doctors and patients make decisions using medical AI claims that rest on broad summaries of clinical trials.

See the Medical AI landscape your doctor uses →

Why

Broad medical AI summaries can overgeneralize and overlook the details of clinical-trial findings that matter for your personal context.

Read about the Evidence-to-Person Fit problem →

How

We run deterministic queries to cross-check clinical-trial claims against your personal context, composed via AI elicitation.

See the after-visit summary audit landscape →

Industry signals

Why this matters now — two recent peer-reviewed studies on how clinicians and patients are already using medical AI:

Clinician-side: OpenEvidence accounted for 98.7% of searches across leading AI-enabled clinical reference tools, with traffic rising to ~1.59 million visits/month by June 2025. [1]

Patient-side: In a study of 617,827 Microsoft Copilot conversations, roughly 1 in 5 involved personal symptom assessment or condition discussion. Microsoft explicitly notes that benchmark performance does not predict real-world reliability for high-stakes health questions. [2]

References

[1] Patel VR, Liu M, Jena AB. Public Interest in an AI-Enabled Clinical Decision Support Tool. JAMA Network Open, Nov 20, 2025.

[2] Costa-Gomes B, Tolmachev P, et al. (Microsoft AI). Public use of a generalist LLM chatbot for health queries. Nature Health, April 16, 2026.

Does this service exist anywhere else?

Not directly. Adjacent services handle pieces of the workflow — clinician-side evidence Q&A, patient-side AI triage, trial matching for enrollment — but none combine patient-supplied health data, a structured clinical-trial corpus, and a personalized applicability audit.

Service | What they do | What they’re missing
OpenEvidence, UpToDate AI, AMBOSS AI, ReachRx | Clinician-side evidence Q&A | No patient-data ingestion, no applicability layer
Glass Health | Clinician CDS with FHIR context | Clinician-only, not patient-side audit
Hippocratic AI, Ada, K Health | Patient-facing AI agents (triage, intake, care management) | No structured trial backend
Deep 6 AI, TrialFit, TrialMatchAI | Trial matching for enrollment | Opposite direction: get into trials, not audit existing care
ChatGPT, Claude, Perplexity, Consensus, Elicit | General AI medical Q&A / paper search | No structured trial extraction, no patient-data ingestion
Cleveland Clinic Express Care, Mayo Clinic 2nd Opinion | Human clinician second opinions | Human-mediated, expensive, not data-driven

Closest in shape: Glass Health (clinician-side).
Closest in audience: Hippocratic AI (B2B health-system patient agents).
What’s empty: the patient-side evidence-audit lane.

Read more on the Blog

Long-form essays on the problem we’re working on, the medical AI landscape, and the open-source tooling we ship for builders:

Open-source: evidence-to-person-eval on GitHub — eval-driven design of FHIR Evidence representations, aligned with EBMonFHIR. v0.0.3, design phase.

Team

Lisa DeMeyere
MSN, ACNP-BC

Transplant nurse practitioner. Keeps us simple and clear.

LinkedIn

Jessica Johnson
Microbiology (MS), Science Journalism (MS)

Science writer and breast cancer survivor. Keeps us creative and grounded.

LinkedIn

Boris Dev

Building the system. Available for consulting.

LinkedIn

No B.S. Med retrieves and structures clinical-study evidence. It does not diagnose, prescribe, or replace professional medical judgment. Users should consult a qualified healthcare professional before making medical decisions.