Research · Calyn

If you can’t measure it, you can’t improve it. And if you don’t publish it, no one can hold you to it.

Every quarter, our research lead publishes a report on how Calyn is performing across three axes: outcomes (are people actually feeling better?), safety (are responses adhering to clinical practice?), and equity (does the experience work across cultures and contexts?).

Latest publications

2026 · Q1

User-reported outcomes after 8 weeks of conversation with Calyn

Pre-registered cohort study (n=1,204) measuring PHQ-9 and GAD-7 movement against a waitlist control.

Outcomes · Pre-registered

2025 · Q4

Safety benchmarking for therapeutic AI: a clinician-authored evaluation suite

We open-sourced the 412-scenario safety eval used to score model updates.

Safety · Open

2025 · Q3

Cross-cultural calibration of therapeutic conversations

Comparing model responses across eight cultural contexts.

Equity · Methodology

2025 · Q2

Crisis detection and warm hand-off: protocol design and live performance

How Calyn detects moments needing escalation, and what happens after.

Crisis · Protocols

What we measure

PHQ-9 and GAD-7 movement over 4- and 8-week windows, against a matched waitlist control.
Adherence to clinical protocols across CBT, ACT, IFS, and motivational interviewing.
Safety incident rate, including missed crisis detections (we aim for zero, audited weekly).
Hand-off success, when Calyn refers someone to a human therapist or crisis line.

Open science

We pre-register our outcome studies and publish methodology before we publish results. Our safety evaluation suite is open-sourced for the field. Your conversations are not anonymized and are not shared with outside researchers — they stay linked to your account and are handled under our privacy policy. Get in touch if you want to collaborate.

Evidence over vibes.
