Does ChatGPT think you’re Welsh too?


Welsh is the oldest living language in Great Britain — a branch of ancient Celtic that has survived its English neighbour’s best efforts to kill it off. Despite their proximity, the two languages are pretty much mutually unintelligible. And yet for much of the past year OpenAI’s chatbot has insisted on talking to me in Welsh.

If you are a ChatGPT user who hasn’t encountered this problem, it may be because you haven’t made the switch from typing prompts to saying them out loud. Once you do, it’s hard to go back. AI transcription is generally excellent. It’s just that every so often mine would start spooling out Welsh. “I am not speaking Welsh,” I would say. “Dwi ddim yn siarad Cymraeg,” it would write.

OpenAI’s explanation for why this happened was that Whisper, its speech-to-text model, sometimes got confused. But who mistakes English, the world’s most widely used language, for Welsh? Adding to the mystery was the fact that it wasn’t mishearing homophonic words but translating them. A blog for developers suggested a conspiracy (“yet another example of the socialist Welsh government pushing its ideology”). But that theory was scuppered by users who reported the same thing happening in Malay and Icelandic.
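
Whisper is open source, so the failure mode is easy to reproduce at home. The model guesses a clip’s language before transcribing it, and a wrong guess tips the entire output into that language; passing an explicit language hint bypasses the guess. A minimal sketch using the open-source openai-whisper Python package, with a placeholder audio file name:

```python
# Minimal sketch: reproducing (and avoiding) Whisper's language misdetection.
# Requires the open-source package: pip install openai-whisper
# "voice_note.mp3" is a placeholder for your own recording.
import whisper

model = whisper.load_model("base")  # small multilingual checkpoint

# Left to itself, Whisper first detects the language of the audio.
# A wrong detection ("cy" is the code for Welsh) flips the whole transcript.
auto = model.transcribe("voice_note.mp3")
print(auto["language"], auto["text"])

# Pinning the language skips detection entirely, so English stays English.
forced = model.transcribe("voice_note.mp3", language="en")
print(forced["text"])
```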

OpenAI has known about this issue for over a year, according to FT reporting, which is proof of how difficult it is to develop conversational voice agents. If you remember fruitlessly yelling “Alexa, VOLUME UP” when smart speakers were released a decade ago, you will already know how temperamental voice interfaces can be. Anything less than perfect conditions — background noise, an accent, overlapping speech or an unusual request — raises the chance of error.

Adding to the problem is the fact that high-quality recorded datasets are harder to come by than text ones, and processing takes longer. That leads to another issue: our acute sensitivity to mis-steps in speech. An extra few hundred milliseconds of silence between one person speaking and another replying is all it takes to make us feel uncomfortable.

Sweeping those concerns aside, the tech sector is certain that voice is the next frontier in AI — the “de facto interface”. No more looking down and tap, tap, tapping on your phone; the future is hands-free. Smart speakers, smart glasses, smart pins, smart rings — anything and everything can become an interface for natural language conversations. It’s what iPhone designer Jony Ive is betting on with his mystery OpenAI device, something we should get to see later this year.

OpenAI co-founder Sam Altman has described the vibe of this device as “sitting in the most beautiful cabin by a lake and in the mountains”. What does that mean? No one knows. It could turn out to take the form of a lamp, a pebble, a clip or a pair of earphones, depending on which rumour you listen to. But whatever it looks like, it’s expected to be directed by voice rather than touchscreen.

You can see the focus on voice in other deals taking place in the sector. Last summer, Meta bought Play AI, a start-up that specialises in conversational voice models. Google recently hired the founder of start-up Hume, known for its work analysing vocal emotions. Apple has bought Q.ai, an Israeli start-up that tracks facial muscles when you speak, meaning it can understand what you say even if you can’t be heard. Everywhere we go — at the office, in factories, cars, hospitals, schools and homes — we are being manoeuvred into having spoken conversations with software. 

Of course, this is the point at which random translations become alarming. If I’m sitting on my sofa and a chatbot switches to a language I can’t understand, then it’s annoying. If I’m trying to communicate with a robotic surgeon or an autonomous car barrelling down the road at 70 miles per hour, it’s terrifying.

But speech-to-text software is improving at a dramatic pace. Researchers measure accuracy by word error rate: the percentage of words a model transcribes incorrectly, where 0 per cent is perfect. On the open-source automatic speech recognition leaderboard, OpenAI’s Whisper has a rate of 7.44 per cent, down from over 8 per cent a few months ago. At the top is Nvidia’s Canary-Qwen-2.5B, with a score of 5.63 per cent.
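
That metric is simple enough to compute yourself: take the word-level edit distance (substitutions, insertions and deletions) between a reference transcript and the model’s output, then divide by the number of reference words. A minimal sketch, with invented example sentences:

```python
# Minimal sketch of word error rate (WER), the metric behind the
# leaderboard scores above. The example sentences are invented.

def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # Classic dynamic-programming edit distance over words:
    # d[i][j] = edits needed to turn the first i reference words
    # into the first j hypothesis words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deleting i words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # inserting j words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution
    return d[len(ref)][len(hyp)] / len(ref)

# One substituted word out of eight gives 12.5 per cent:
print(wer("i am not speaking welsh please reply in english",
          "i am not speaking welsh please reply yn english"))  # 0.125
```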

OpenAI also says the Welsh problem should have been fixed with its latest model update. The issue was mislabelled data — a weak link in a sophisticated chain that has taken some time to untangle.

Talking is a more natural way to communicate than typing. And the more the tech is employed, the more quickly its flaws can be identified and fixed. We’ve been hunched over smartphones for nearly two decades now. Voice mode is a step forward — so long as it’s in the right language.

elaine.moore@ft.com
