Measuring Wellness Through Voice with Canary Speech
In this episode of the Radian Podcast, we explore how AI-powered voice analysis like Canary Speech helps clinicians detect health issues, enhancing patient care and uncovering what might otherwise go unnoticed.
Key Takeaways
Canary Speech uses AI to detect health conditions—like anxiety, Alzheimer’s, and Parkinson’s—through voice patterns alone.
Their technology analyzes how we speak, not what we say—unlocking over 12 million data points in just 40 seconds of voice.
Co-founder Jeff Adams previously led development on Dragon NaturallySpeaking and Amazon Echo.
The platform is already demonstrating 94–98% accuracy in identifying neurological conditions through clinical validation with partners like Harvard and the NIH.
Canary empowers clinicians with real-time, objective data, acting as a non-invasive decision-support tool.
It's designed to help underserved populations, rural communities, and overburdened healthcare systems by offering insights remotely and at scale.
Voice is the second-most complex signal in the body (after the genome), making it a powerful diagnostic source.
The company was founded with a mission-first mindset—emphasizing empathy, access, and ethics in the use of AI.
Maxwell Murray:
Hi Henry—so good to see you again. Welcome to the podcast, everyone. I'm really excited to introduce you to Henry O’Connell, CEO and co-founder of Canary Speech. It's a fascinating platform. I can’t wait to talk about it.
Henry, I want to give you the opportunity to introduce yourself and say hello to the people.
Henry O’Connell:
Thank you so much, Max. It's really enjoyable to be here with you. I’m excited to talk—as I always am—about Canary Speech. Hopefully, we can discuss some of the things people don’t know about Canary yet.
Maxwell Murray:
Oh man, I can’t wait to share this with the audience. I’m really excited for this episode. Let’s start with the big picture. Give us the elevator pitch—what is Canary Speech, and what role do you see it playing in the future of healthcare?
Henry O’Connell:
Back at HLTH 2024, just a few months ago, we introduced something we call Canary Ambient. That was made possible by some significant enhancements to our APIs.
Our API is basically the connection handshake to other platforms. It allows us to capture audio in a streaming mode. So, if you're using a platform like an avatar that talks to a patient, or a DAX copilot doing transcription and notes, that audio can also be analyzed for health conditions.
Our API is now smart enough to listen to both voices, identify who’s speaking—whether it’s the patient or the doctor—separate them, and process the entire conversation in real time. It breaks the audio into 10-millisecond slices and validates what it hears throughout the conversation. Not just snapshots, but continuous analysis.
And when it has enough information, it generates a score—instantly—and returns that score within milliseconds to the device the clinical team is using.
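To make that streaming flow concrete, here is a minimal sketch of how an integration like this could work, assuming the 10-millisecond frames described above. The function names, speaker labels, and the 40-second threshold are illustrative assumptions, not Canary's actual API.

```python
# Minimal sketch of a streaming analysis loop, assuming 10 ms frames as described
# above. Names, labels, and thresholds are illustrative, not Canary's actual API.

from dataclasses import dataclass
from typing import Callable, Iterator, Optional

SAMPLE_RATE = 16_000                 # assumed sample rate
FRAME_SAMPLES = SAMPLE_RATE // 100   # 10 ms of audio per frame

@dataclass
class Frame:
    samples: list[float]   # one 10 ms slice of the waveform
    speaker: str           # e.g. "patient" or "clinician", from a diarizer

def frames(waveform: list[float],
           diarize: Callable[[list[float]], str]) -> Iterator[Frame]:
    """Cut a mono waveform into 10 ms slices and label each by speaker."""
    for start in range(0, len(waveform) - FRAME_SAMPLES + 1, FRAME_SAMPLES):
        chunk = waveform[start:start + FRAME_SAMPLES]
        yield Frame(samples=chunk, speaker=diarize(chunk))

def stream_score(waveform: list[float],
                 diarize: Callable[[list[float]], str],
                 score: Callable[[list[Frame]], float],
                 min_patient_seconds: float = 40.0) -> Optional[float]:
    """Accumulate the patient's frames and return a score once there is enough speech."""
    patient_frames: list[Frame] = []
    for frame in frames(waveform, diarize):
        if frame.speaker == "patient":
            patient_frames.append(frame)
        if len(patient_frames) * 0.010 >= min_patient_seconds:
            return score(patient_frames)   # returned to the clinician's device
    return None   # not enough patient speech yet
```

In a real deployment, `diarize` would be the model that separates the patient's voice from the clinician's, and `score` would be the call that returns the result to the clinical team's device within milliseconds.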
Henry O’Connell:
Canary Speech began about eight years ago. Jeff Adams and I—who’ve been friends for about 40 years—were in a bagel shop down the street here in Provo, Utah. He was speaking at Brigham Young University, where he had graduated years earlier, and we figured we’d get together for lunch.
We ended up spending about eight hours there. Totally unplanned.
Jeff and I originally met when I was working at the National Institutes of Health, doing research on neurological diseases. Jeff was at the NSA, building mathematical models to decrypt spy messages during the Cold War. That was his dream job. I used to joke that he’d still be there, handcuffed to the desk, just doing it.
Maxwell Murray:
That’s amazing.
Henry O’Connell:
Yeah. We were both in those jobs for about five years. I moved into a business career, and Jeff went on to work with Ray Kurzweil. Ray wanted to develop what’s now called NLP—natural language processing. The ability to listen to speech and turn it into words.
Jeff had the skillset to do that. He built the first commercially available NLP product. That naturally evolved into Dragon NaturallySpeaking, which became the most popular transcription product for doctors.
Maxwell Murray:
I remember that. A lot of clinicians still rely on Dragon.
Henry O’Connell:
Exactly. Jeff also helped launch legal dictation tools and diarization—the separation of voices in an audio stream. He worked across a whole range of speech technologies.
Then, about 14 years ago, Amazon was looking for a team to build what would become the Amazon Echo. They had a great idea, but they didn’t know how to do it. So they bought the company Jeff was with. That gave them Jeff, 17 speech and language scientists, and Jeff O’Neill, a patent lawyer.
Jeff went on to lead the team, and three years later, they launched the Amazon Echo.
Maxwell Murray:
That’s a pretty serious résumé.
Henry O’Connell:
Yeah. He had this incredible arc—from medical transcription to consumer voice technology at scale. And by the time we were sitting in that bagel shop, he was past his severance obligations and free to do something new.
I asked him, “Jeff, you could do anything right now. What do you want to do?”
He said, “I’ve always wanted to apply speech analysis to the human condition and disease. To make a difference.”
So I said, “Why don’t we do that?”
Henry O’Connell:
That’s how Canary Speech got started.
One of my daughters actually came up with the name. She thought of the idea of a canary in a coal mine—an early warning system. We liked it. We could get a trademark for it. And the little bird in our logo? She drew that.
When we got the trademark back from the patent office, it was in her name. She assigned it to the company. She was still pretty young at the time, but it was a cool moment for her—and for all of us.
Maxwell Murray:
That’s such a good story. I love that this came from a long-term relationship—two friends asking what they can build together. And the fact that it’s rooted in something bigger than just the tech.
Henry O’Connell:
Canary Speech was really founded on the principle that healthcare should be accessible to people no matter where they live. Smartphones are everywhere. Voice is everywhere.
People can connect with medical teams remotely, but those teams might not know much about their condition. Voice bridges that gap.
Speech is the most complex motor function the body produces, and as a data set it’s second only to the genome. We analyze over 15 million data elements in a single minute of speech.
If any disease affects the systems involved in speech—vocal cords, tongue, cheeks, surrounding musculature—it shows up. With something like Parkinson’s, those muscles begin to deteriorate early on. That leads to what’s known as the Parkinson’s draw. And we can detect it from just 40 seconds of speech.
Maxwell Murray:
That’s incredible. So it’s not just mental health—it’s neurological, physical, everything that touches the voice?
Henry O’Connell:
Exactly. We can also identify where someone’s anxiety or stress levels are. And we can distinguish between mild cognitive impairment and Alzheimer's disease. And it doesn’t matter what the person is saying—we’re working at a level below language.
We focus on how the central nervous system generates speech. Not the content of the words, but the mechanics of how they’re formed. That’s where the subtle patterns emerge—and those are the patterns that machine learning can pick up on.
Our goal was to build something non-invasive, easy to use, and deeply informative. A tool that could live in a physician’s toolbox and work whether they’re in a clinic, doing a telemedicine call, or holding a tablet in a remote location.
Henry O’Connell:
Years ago, when Jeff built NLP, a lot of very smart people tried to use it for diagnostics. They’d extract the words, create a transcript, and then analyze it for indicators like “anxious” or “depressed.”
That works a little bit—but it has a lot of limitations. There are too many variables you can’t control.
You and I might be equally educated, but we use different language. Even within the same language, we use different words to express emotion. What might be a red flag phrase for me may never show up in your vocabulary—even if you’re feeling the same thing.
So those models never became practical diagnostic tools.
Maxwell Murray:
That makes a lot of sense. There are cultural layers to language, personality layers, tone—it all varies. The same words don’t mean the same thing across people.
Henry O’Connell:
Exactly. That’s why we don’t analyze what people say. We look at how they say it.
Maxwell Murray:
So let me check my understanding. It’s not that someone says, “I’m fine,” and you’re analyzing those exact words. It’s the tone, rhythm, and energy in how they say it—that’s what gives you the insight.
Henry O’Connell:
Yes. We analyze 2,548 data elements every 10 milliseconds. Over 40 seconds, that adds up to 12.5 million data points.
Machine learning loves large datasets. If we were just using words, we might get 100 words in 40 seconds. Maybe 200 syllables. Even if we count pauses or filler sounds, we might get 800 data elements. That’s nothing.
But when we look at the full range of motor and acoustic characteristics in speech, we get millions of data points. That’s a dataset machine learning can actually work with.
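As rough arithmetic only, assuming fixed, non-overlapping 10-millisecond frames (a simplification; the exact windowing isn't described here), the feature counts scale roughly like this:

```python
# Illustrative arithmetic only; frame length, overlap, and how "data elements"
# are counted are assumptions, not a description of Canary's published pipeline.

FEATURES_PER_FRAME = 2_548   # per-frame figure quoted above
FRAME_MS = 10                # one frame every 10 milliseconds

def total_features(duration_seconds: float) -> int:
    n_frames = int(duration_seconds * 1_000 / FRAME_MS)   # non-overlapping frames
    return n_frames * FEATURES_PER_FRAME

print(f"{total_features(60):,}")   # ~15.3 million per minute, in line with the figure above
print(f"{total_features(40):,}")   # ~10 million over 40 seconds under these assumptions;
                                   # overlapping windows would push the totals higher
```

By contrast, the word-level count in the same 40 seconds, a few hundred tokens at most, is orders of magnitude smaller, which is the point being made above.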
Henry O’Connell:
Take something like vocal cords. They’re controlled by the central nervous system. When I get excited, my pace naturally picks up. My tone might rise. I’m not thinking about it—but my body’s reacting. That’s neurological.
What’s interesting is that same vocal change could be a parent getting excited about their kid scoring a goal on the soccer field—or a screech from a pedestrian trying to stop someone from stepping into traffic. Those sounds aren’t words, but they carry meaning.
I told Jeff, “I know the difference between those two screams. I can feel it. But how would a machine learning tool know that?”
That was one of the core challenges we wanted to solve: could we teach a system to detect those distinctions and translate them into meaningful health signals?
Maxwell Murray:
That’s such a powerful frame. You’re not trying to understand the words—you’re trying to understand the intention or signal beneath them.
Henry O’Connell:
Exactly. And if we could build a tool that gave clinicians that kind of information—across borders, across accents, across contexts—it could change how healthcare is delivered.
We’re doing work in rural Australia right now, for example. Farmers who go months without seeing a doctor. The government provides care, but it’s spread thin.
We’ve also worked with the Traveller community in rural Northern Ireland, an underserved group where cultural and geographic barriers make access really hard.
This gives them a tool. One they can use wherever they are. Something that helps generate objective, data-driven decisions.
Henry O’Connell:
I first really noticed the emotional signal in voice with my daughter Caitlin, when she was playing soccer. She’d come home after school, walk across the room, and I’d just know from the way she moved—good day or bad day.
I’d ask, “How was your day?” And she’d say “fine,” but I already knew it wasn’t. So I’d say, “Come talk to me—was it the coach? Chemistry class?”
We’d sit, she’d talk, and her voice would tell me everything. Even now, years later—she’s married, has a kid—I’ll call her on the phone. No physical cues, no body language, no stride across the room.
But within seconds, I know. I can tell if it’s a good day or a hard day.
Maxwell Murray:
Oh yeah. My mom’s the same way. I’ll give her a “yeah, I’m good,” and she’ll drill in: “No, no—what’s going on?”
Henry O’Connell:
Exactly. Same with babies. Parents can tell the difference between a hungry cry and a tired cry. It’s not instinct. It’s learned pattern recognition.
What we think of as intuition is actually the brain learning to read micro-signals. Our machine learning system is trying to replicate that—not by understanding language, but by identifying the patterns that precede it.
Henry O’Connell:
I started wondering—is this just something between me and my daughter? But the truth is, we all do this with each other. Even with people we don’t know that well.
I always tell people about the elevator moment. You get in, and you just feel that someone’s not having a great day. So I’ll say, “How’s your day going?” and they’ll reply, “It’s fine.”
But it doesn’t sound fine. So I’ll say, “You want to talk about it? We’re about to go our separate ways, you’ll never see me again.”
That’s what we wanted to capture with Canary—those signals. And I remember sitting in that bagel shop with Jeff, and I asked, “How am I doing that? How do I know all of this just from hearing someone’s voice?”
And Jeff said, “You know, I don’t know.”
Now, remember—this is the guy who built NLP. Who created Dragon. Who helped launch the Amazon Echo. When someone like that says “I don’t know,” it means we’re asking a question that hasn’t been answered yet.
That was the moment we realized it’s not about words. It’s not about reading body language. It’s something deeper. And if it’s measurable, and if it’s accurate—we can use it to help people.
Maxwell Murray:
And you’re applying that same intuition to healthcare. Instead of a parent reading their kid’s tone, it’s a clinician reading a patient’s nervous system—without relying on language or subjective observation.
Henry O’Connell:
Exactly. And Jeff is the kind of person who, when everyone says, “It can’t be done,” he’s the guy who says, “Let’s do it anyway.”
That’s what happened with Canary. We realized, if we can do this, it’s going to change how people receive healthcare. It’s going to help people across the world. So we committed to building it—starting right there in that conversation.
Maxwell Murray:
So you’ve gone from that bagel shop conversation to building something clinicians actually use. What were the first signs that this tool could work in a medical setting?
Henry O’Connell:
At first, we built the core technology. But we knew we needed audio tied to real diagnoses—data connected to clinical expertise.
Some organizations already had audio paired with diagnoses. We partnered with them and got to work. Within the first year, we were able to demonstrate Alzheimer’s detection. We built models that could identify diagnosed Alzheimer’s patients with 94–96% accuracy in conversation alone.
From there, we started working with clinical teams—Harvard Beth Israel, Hackensack Meridian, Intermountain Healthcare, Tallaght University Hospital in Ireland, and the National Institutes of Health in Japan.
They had the diagnostic rigor, and we had the technology. We captured the audio, built models, and then came back to validate the correlations under peer review and clinical settings.
Everything we’ve built has gone through that process—IRB-reviewed, designed with sensitivity in mind, aligned with AI for good. That’s always been our focus.
Henry O’Connell:
The patients we worked with were incredibly engaged. When a doctor says, “We’re going to analyze your voice and it might help us better understand your disease,” they’re intrigued. They’ll say, “Really?”
One of the earliest studies we did was with the neurology team at Harvard Beth Israel, focusing on Huntington’s disease. When we returned with the model, it had a 98% accuracy rate.
Everyone was kind of stunned.
So we laid all the data out—on screens, in spreadsheets—and worked through it together. One concern in speech AI is overfitting. That’s when a model gets too good at the training data and fails to generalize.
There are four or five different ways to test for that, and we ran them all. We weren’t overfitting.
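For readers wondering what such checks can look like, here is a generic, hedged sketch of one common safeguard, speaker-disjoint cross-validation, which ensures no participant's recordings appear in both the training and test folds. This is a standard technique, not a description of the specific tests Canary ran.

```python
# Generic overfitting check: keep each speaker's recordings entirely in either
# the training fold or the test fold, so a model can't simply memorize voices.
# Placeholder data throughout; this is not Canary's protocol or feature set.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GroupKFold, cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 32))            # placeholder voice-feature vectors
y = rng.integers(0, 2, size=200)          # placeholder diagnosis labels
speakers = rng.integers(0, 40, size=200)  # which participant each sample came from

# GroupKFold guarantees speaker-disjoint train/test splits.
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y,
                         groups=speakers, cv=GroupKFold(n_splits=5))

# A large gap between training accuracy and this held-out accuracy is a red flag.
print(scores.mean())
```

Other standard safeguards include held-out data collected at different sites and permutation tests with shuffled labels; the conversation doesn't detail which specific tests were used here.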
What we found instead was something deeper. When you’re dealing with progressive neurological disease, it affects the central nervous system and all the systems involved in creating speech.
There are a half dozen different parts of your body that have to coordinate like instruments in an orchestra to produce speech correctly. When there’s a deficit—like in Parkinson’s or Huntington’s—your body starts compensating.
The brain might reroute control to other muscles. A person might subconsciously push harder with their vocal cords to be heard. These compensations create subtle, but consistent patterns.
And those patterns are what we detect.
Maxwell Murray:
It’s powerful. Someone close to me was recently diagnosed with Parkinson’s, and it took a while to get there. There was a lot of uncertainty. So to hear that there’s a tool like this, capable of spotting those patterns early—that’s huge.
And it sounds like what you’re doing isn’t just theoretical. It’s peer-reviewed, clinically tested.
Henry O’Connell:
That’s right. We see this as a companion tool for neurologists. These are highly trained people, with years of experience. They’ve seen hundreds of patients across various conditions.
But even then, they might not be sure what they’re looking at—especially in early stages. So they refer to a specialist. That specialist runs subjective tests, maybe gets an MRI. But we’re offering something objective.
What we analyze isn’t consciously controlled. It’s not something a person can fake. These changes happen at the subconscious level, governed by the central nervous system, in milliseconds.
It’s just another tool in the toolbox—something to guide the next question, the next decision, the right referral.
Henry O’Connell:
And it’s not just Parkinson’s. We can detect MS, Huntington’s, and even preclinical stages of Alzheimer’s.
We’re able to tell the difference between someone with normal cognitive function and someone with mild cognitive impairment. That might be two to five years before they receive a diagnosis of Alzheimer’s.
We can see the journey a person is taking—before symptoms are obvious. That gives clinicians time to intervene, plan, and support that person in a meaningful way.
In Huntington’s, we can identify the transition from pre-manifest to manifest disease with 98% accuracy. That’s when symptoms first start showing up—and knowing when that’s happening matters. It shapes how care is delivered.
Henry O’Connell:
While we’re analyzing for neurological signals, we’re also detecting anxiety and depression in real time. It’s the same voice sample—we’re just looking at multiple layers of data.
A neurologist may not be trained to spot those emotional signals. But if we can surface them, that clinician might decide to refer the patient to a psychiatrist or therapist.
Because sometimes the person knows what they’re facing. They’re aware they’re on the verge of a life-altering diagnosis, and they’re struggling. That emotional layer can’t be ignored.
Maxwell Murray:
It’s amazing to think about that dual insight—neurological and emotional—coming from the same voice input. And it makes sense. Mental health always comes up in these conversations, no matter what the original condition is.
I’ve been learning recently that for Parkinson’s, trauma or anxiety can actually accelerate the disease. Emotional stress literally affects disease progression.
Henry O’Connell:
That’s right. And it’s not just Parkinson’s. It’s true across a range of conditions. Physical and mental health are deeply intertwined.
Even a broken arm can lead to depression, which then influences recovery and long-term outcomes. That’s why we see this tool as holistic—mind and body.
Maxwell Murray:
And if you can surface that information in a way that’s objective, measurable, and actionable—it changes how clinicians can show up for their patients.
Henry O’Connell:
Exactly. What we’re building is technically referred to as a Clinical Decision Support Tool. Something that helps doctors make more informed decisions.
They already pull together a huge amount of information: symptoms, history, labs, imaging, intuition. This is just another input—one that’s passive, objective, and non-invasive.
We’re not asking the patient to fill out a form or describe how they feel. We’re measuring things they’re not even aware of.
Henry O’Connell:
Let me give you an example.
There’s a clinic we work with that requires annual checkups for employment. One of their team members—also a clinician—came in for their checkup. They opted out of the standard paper assessments like the GAD-7 or PHQ-9, which are self-reported.
But they did the Canary voice assessment.
When their colleague reviewed the results, he saw that their depression level was as high as it gets. He asked, “What’s going on?”
And the person said, “I’m fine. The medication I was on wasn’t working, so I stopped it a few weeks ago. I think I’m managing okay.”
But of course, if you’ve been on a medication for years and suddenly stop, there’s a withdrawal period. Your system reacts.
The clinician said, “Let’s talk about this. Doctor-to-doctor. Friend-to-friend.” And they worked through it. There was no intent to hide anything. It was more like, “I’ve got this, I can manage it myself.”
But that’s not always the best approach. And Canary gave them a signal that said, “Actually, something’s going on here.”