
Paolina White
Senior Director, Speechmatics.
One AI speech technology provider is building transcription engines that capture every voice, highlighting how diversity pushes the industry to build better systems.
Speechmatics Senior Director Paolina White discusses the importance of ensuring voice AI is designed to handle a wide range of speakers, accents and environments.
What’s possible when voice AI works well, and what’s needed to make it happen?
When voice AI is designed to recognise and accurately capture a wide range of speakers, accents and environments, admin time can be cut by two to three hours a day and patients receive more sympathetic consultations. But solutions must be built to perform under pressure: models may perform well in controlled labs, yet in real conditions they must contend with noise, accents, interruptions and overlapping conversations between doctors and patients.
Can voice be seen as a “stress test” for NHS AI readiness, and what issues could AI encounter in practice?
Absolutely. When the NHS paused unsafe transcription deployments this year, it highlighted how complex the challenge is. If the speech engine being used to handle AI note-taking or dictation is unable to handle background noise, distinguish near-field from far-field speech or manage fast-switching speakers, the workflow can collapse, leading to frustration and manual intervention. For this reason, we actively engage with leading AI scribe companies to evaluate and refine performance in real-world environments.
Do accents, dialects and multilingual conversations affect performance, and why should AI be able to distinguish multiple voices?
It’s widely recognised that accents and dialects can trip up speech engines, leading to mistranscription of medical terminology and exposing poor technological foundations. At Speechmatics we train voice AI on diverse datasets, ensuring the software can accurately separate and tag every voice in the room. Diarisation errors can misattribute statements, and in healthcare even a single misattributed symptom or allergy can create real clinical risk.
What should NHS trusts consider prior to deployment?
NHS trusts must ask for proof of live deployment in real NHS conditions, including noisy wards, diverse accents and multiple speakers, because compliance accreditation isn’t always reflective of real-world use. Diversity pushes the industry to build more inclusive, globally relevant systems. Getting voice technology right means building AI that works for everyone, not creating new layers of inequality.
